0.003: beware the blogger: urban folklore at work
The trouble with tribbles (er) blogging is that you just never know who (or what) to believe. Too bad there isn't an automated BS Detector that could tell you in advance when something simply isn't true.
Usually, these misrepresentations result in little measurable or lasting damage. But occasionally they take on a life of their own. Left unchecked, they get repeated so often and with such conviction that people actually start to believe them.
In many (hopefully most) cases, such untruths may be unintentional and accidental. These usually get caught early, and before they are unwittingly repeated far and wide. I'm sure that sooner or later we all get caught in a misunderstanding or even the context-changing typo here and there.
But all too often, these untruths have every appearance of being intentional. There are those who stop at nothing to make a point, and who have no qualms in adjusting the facts to support their position. And others will resort to the age-old marketing tactic of comparing their TODAY product to the competition's YESTERDAY product in their quest to establish the perception of radical superiority.
And some people are so durn good at twisting truths that their assertions become part of the urban folklore sometimes faster than facts and realities do.
Who ya gonna call?
Forewarning: I'm the consummate Symmetrix bigot, and for today's episode, I've taken the advice to "write about what you know" quite literally. My apologies in advance.
Disclaimer: The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.
myth busters?
Despite the intensity of the pervasive urban folklore, today's Symmetrix is radically different than the one that EMC was built upon back in the early 90's. But it seems many of the so-called market experts are mired in how Symmetrix USED to work, how it WAS implemented, how expensive it was BACK THEN, the arrogance of the sales force. And on, and on, and on.
Truth be told, there's simply no way Symmetrix could have maintained its 11-plus year market share leadership if it didn't change and adapt to customer and market needs. And the Symmetrix that Hitachi and IBM compete with today bears little resemblance to the ones they started out against in the late 1990's.
And I should know - I've been part of the team leading the change for more than half a decade.
myth-teries and myth-perceptions
For example, I'm sure you've heard about the notorious Symmetrix "BIN" file. Hu Yoshida likes to bring the BIN file up any time he's attacking the complexity of Symmetrix - usually in a sentence that includes one or more other derogatory terms like "static cache architecture," as in this recent entry in his blog:
"When Mark talks about arrays and their limitation in scalability, he is speaking from his frame of reference, the DMX, which has a static cache architecture that requires BIN file changes whenever the storage configuration is changed."
There are several myths in this brief, and unsubstantiated, assertion:
- Symmetrix DMX cache architecture is anything BUT static - in fact, with the introduction of Tag-Based cache (in Enginuity 5670), cache is inherently a dynamically managed and assigned resource. Any piece of data can go in any cache slot, and cache slots are intentionally striped across memory boards to spread out the I/O workload (and in fact, each memory board can simultaneously handle multiple concurrent I/O streams, debunking yet another Hitachi-fueled myth).
With DMX-3, to improve availability, each memory board is indeed mated to a specific partner to form a mirrored pair. But this clearly isn't what Hu is referencing, since he's been asserting this mistruth since before the DMX-3 was introduced.
Across the entire DMX family, the allocation of data blocks to global memory is truly dynamic - so much so that you can add more cache to a running array and it will be automatically utilized. You can even remove a cache pair from a running array (with a little advance warning to the software beforehand so that it can relocate pending writes).
Fact is, the Symmetrix does use cache differently than our competition. And the architecture is designed to leverage MORE cache than our competitors can even install today. More memory helps improve cache hit rates, and allows the DMX to deliver higher performance with lower latency than our competitors' arrays. We think that justifies the implementation.
- While there indeed exists a so-called BIN file to hold the system configuration for each Symmetrix, it is no longer typically manipulated directly - a Symmetrix DMX can be installed and operated for a lifetime with no-one ever typing or clicking on "<something>.BIN". Today customers can reconfigure virtually everything about their array - create/allocate/assign LUNs and volumes, set port flags, establish and split TimeFinder pairs, modify zoning - all without ever directly accessing the BIN file.
And more importantly, today the BIN file is more dynamically updated, supports multiple concurrent operations against it, and can be reloaded faster than "your fathers' Symmetrix" of the pre-DMX days.
I'm reminded of the announcement for Windows NT (I think it was), when a chagrined Jim Allchin was chastised by the audience as the demo system rebooted live on stage, spewing AUTOEXEC.BAT, CONFIG.SYS and SYSTEM.INI gobbledygook across the huge projection screen. A few short years later, when Windows 2000 was launched, the entire boot process had been hidden behind a splash screen. And it has been hidden ever since, although it wasn't until Vista that the boot process was totally overhauled.
The BIN file is in the midst of a similar transition, but it is already not the problem today that Hu would have you believe.
- The biggest myth in this short passage is this: Mark's so-called "frame of reference" is hardly the Symmetrix - Symm isn't really his area of focus. Oh, Mark understands Symmetrix quite well, I'm sure. But I can almost guarantee that he's never actually modified a Symmetrix configuration, much less seen a BIN file - it's just not in the domain of his "need to know."
- Finally, I seriously doubt that Hu has had the opportunity to operate and manage a Symmetrix DMX in the past few years, either. Yet reading his blog, you'd almost get the impression he's an expert on the inner workings of a Symmetrix somehow.
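For readers who want the striping point from the first myth made concrete, here is a toy Python sketch. It is not Enginuity code and makes no claim about how Symmetrix actually places slots: the function names `board_for_slot` and `load_per_board`, the board count of 8, and the round-robin rule are all assumptions chosen for illustration. It only shows why striping consecutive cache slots across memory boards spreads a run of adjacent slots evenly instead of piling them onto one board.

```python
from collections import Counter


def board_for_slot(slot_index: int, num_boards: int) -> int:
    """Hypothetical round-robin striping: consecutive slots land on different boards."""
    if num_boards <= 0:
        raise ValueError("num_boards must be positive")
    return slot_index % num_boards


def load_per_board(slot_indices, num_boards: int) -> Counter:
    """Count how many of the given slots fall on each (assumed) memory board."""
    return Counter(board_for_slot(s, num_boards) for s in slot_indices)


if __name__ == "__main__":
    # 64 consecutive slots over 8 assumed boards: every board gets the same share.
    print(load_per_board(range(64), 8))
```

With any placement rule of this kind, a burst of I/O against neighboring slots touches every board rather than one, which is the property the striping claim relies on.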
Who ya gonna call?
ghost busters?
It's not just Hu or even Hitachi that resort to these questionable marketing tactics. Our pals over at IBM assert many such myth-truths in their competitive marketing materials and sales pitches. Often, they use the tactic of appealing to your analytical instincts to make a point, rather than present real-world measurements and examples. If you believe it to be true, why bother presenting the facts?
For example, a frequent assertion from IBM is that the Symmetrix wastes cache and I/O bandwidth because we use a very large logical block size as our increment of I/O - the "64K byte track." IBM has even created fictitious benchmarks designed to artificially exploit this track size in their attempt to demonstrate the inferiority of Symmetrix vs. the DS8000. And it almost makes sense - if we're reading 64K off the disk while IBM is reading only 8K, they should be 8x faster. Right?
Two myths for the price of one:
- Fact is, the DS8000 is nowhere near 8x faster than a DMX (and I mean any DMX, not just the DMX-3).
Symmetrix DMX doesn't actually read 32KByte tracks (pre-DMX3) or 64KByte tracks (DMX3 and onward). Instead, the code actually reads off the disk everything we can UP TO the track size, based on where the head is located within the track when we're ready to read. Might be the whole track, or it might be a few Kbytes at the beginning or end of what we really needed to fulfill the I/O request. We call these "free bytes," since we effectively can read them without any overhead - they're already under the read head anyway, so we may as well bring them into cache.
And if things are really, really busy, we don't try to read "free bytes" at all.
- Despite what seems intuitive, we've found that even in databases where you'd imagine there was little or no so-called "locality of reference", there is in fact a LOT. And in the real world, by pre-reading these "free bytes," Symmetrix cache hit ratios are consistently higher than the competition, and response times are correspondingly lower.
As with many things, it doesn't seem to make sense on paper, but sometimes things just don't work the way your logic says they do.
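The "free bytes" idea can be sketched in a few lines of Python. This is a toy model under loudly stated assumptions, not the actual Symmetrix logic: it treats a track as a ring of bytes passing under the head, assumes the read starts wherever the head happens to be and stops once the requested range has fully passed, and uses a simple `busy` flag to stand in for "really, really busy." The names `bytes_transferred` and `free_bytes` are hypothetical.

```python
TRACK_SIZE = 64 * 1024  # DMX-3 track size quoted in the text


def bytes_transferred(head: int, start: int, length: int,
                      track_size: int = TRACK_SIZE, busy: bool = False) -> int:
    """Bytes read for a request [start, start+length) when the head is at `head`.

    Assumed toy model: the track rotates under the head, and reading runs from
    the head position until the whole requested range has passed under it.
    """
    end = start + length
    if not (0 <= head < track_size and start >= 0 and length > 0 and end <= track_size):
        raise ValueError("head and request must lie within one track")
    if busy:
        return length               # busy: read only what was asked for
    if head <= start:
        return end - head           # extra bytes in front of the request
    if head < end:
        return track_size           # head landed mid-request: one full rotation
    return (track_size - head) + end  # head past the request: wrap around to it


def free_bytes(head: int, start: int, length: int,
               track_size: int = TRACK_SIZE, busy: bool = False) -> int:
    """Bytes brought into cache beyond the request itself."""
    return bytes_transferred(head, start, length, track_size, busy) - length


if __name__ == "__main__":
    # An 8 KB request at offset 16 KB, with the head at a few assumed positions.
    for head in (8 * 1024, 20000, 40 * 1024):
        print(head, free_bytes(head, 16 * 1024, 8 * 1024))
```

In this model the transfer never exceeds one track and often falls well short of it, so the "64K versus 8K means 8x slower" arithmetic does not follow: the extra bytes are ones the head was passing over anyway.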
I won't belabor the point - but the fact is it can be hard to recognize dis-information - all too often, it appears logical, factual and it's presented in such a way that you forget that the motivations of the author might not be 100% aligned with the truth.
As I said in my introductory blog, I encourage you to join with the storage anarchist and the rest of the skeptics in the industry to keep us all honest and true to the facts.
They're Here to Save the World!

