
January 14, 2008

0.059: bold, fast and green

No, I'm not talking about Kermit the Frog or a souped-up Kawasaki.

Nope, I'm talking about EMC's introduction today of Flash Solid-State Drives for Symmetrix DMX-4 - the first-and-only enterprise-class application of Flash technology.

Now, if you've already read Chuck's blog post (The Enterprise Strikes Back) and Mark's early-morning coverage (Enterprise Flash for DMX-4), you should have a pretty good understanding of who needs these things and why, as well as of the technology itself. No need for me to rehash that ground. And even Stephen-the-Packrat has noticed that What's Old is New Again, reinforcing the significant differences between these enterprise-flash drives and the stuff that Apple slaps into its iProducts. Oh, and here's the obligatory link to the original WSJ "scoop" on today's news.

Since I have had a front-row seat to the accelerated evolution of this technology into what today is a truly enterprise-ready solid-state storage solution, I thought I'd share a little about the journey that has brought us to this point.

Sorry for the delayed posting. I've been technical reference support for the SSD part of today's launch, which kept me pretty busy all day. Judging by the nature of the questions and the early coverage, this Flash thing seems pretty hot (pardon the pun). I'll cover the rest of today's Symmetrix announcements in a separate post.

Oh, and if you haven't stopped by The New EMC.com, you definitely should - it's a whole new experience. Today's announcement landing page is an excellent example of how the new technologies behind EMC.com provide a richer and more engaging approach to the company's web presence.

flash didn't happen overnight

You may be surprised to learn that this all started long before there was much public discussion of using Flash Memory-based solid-state drives, and predates even the emergence of commercial NAND Flash drives from M-Systems (now SanDisk), Samsung, STEC Inc. and others back in 2005-2006. I won't go so far as to imply that EMC was planning for this from the very beginning, as both Mark and Stephen have suggested, but I will say that the underlying architecture of Symmetrix has proven instrumental to the rapid adoption, integration and efficiency of this new technology.

Fact is, EMC has been preparing for the inevitable emergence of practical and cost-effective solid-state devices since before the initial Symmetrix DMX even shipped. Back then, nobody could predict when Flash SSDs might be cheap enough to replace the spinning rust the storage industry is currently built upon. But the engineers and architects designing Symmetrix knew that it would happen one day, and that the Symmetrix DMX generation needed to be ready for it when it did.

And there were quite a few things that could be predicted about persistent solid-state storage devices, even back then. For example, it was pretty clear we wouldn't be faced with the performance and reliability limitations of USB thumb drives or the CF flash cards that were then the de-facto standard. And although we did imagine a "box-o-flash" made out of hundreds of those CF cards, we all pretty much expected that the most practical packaging (at least initially) would be in the form factor of a disk drive, complete with a standard drive interface (be it Fibre Channel, SATA or SAS). Even though higher densities and perhaps even greater performance could be achieved in a more customized form factor, there's an entire industry that already knows how to develop software for, qualify and support "disk drives." So, like in the early days of RAID, the engineers planned on integrating a "standard" device rather than hand-crafting a new one.

Back then most people didn't think much about efficient energy use within the data center, but it was obvious that an SSD that didn't need any energy to maintain persistence would have an advantage over both SDRAM-based solutions and electro-mechanical hard drives. The fact that energy efficiency is now Topping the CIO's Most Wanted List is an added bonus of this whole endeavor.

Of course everyone knew that Flash SSDs would be fast - significantly faster than hard drives, although still slower than SDRAM. Back then, write performance was still expected to be slow (as is true today for the laptop SSDs that have been getting most of the attention of late), so it was expected that the write caching of Symmetrix would be of benefit (it is, but with the performance of the STEC drives, write caching actually plays a different role, as I'll explain later).

But with sub-millisecond response times from the SSDs themselves, the role of Symmetrix read cache and pre-fetch algorithms would undoubtedly change. Algorithms designed to pre-fetch sequential reads, leverage "free reads" and take advantage of the referential locality that occurs in most "random" workloads would be practically unnecessary with drives that can respond to ANY I/O request without the latency of head movement or rotational positioning.

And it was pretty clear that the initial enterprise-class SSDs would be very expensive and lower in capacity per device as compared to hard drives, meaning that early adopters probably wouldn't be able to even consider an all-Flash storage array - Symmetrix was going to have to run Flash alongside normal hard drives in the same array, at least in the beginning.

Each of these predictions directly influenced the evolution of Symmetrix DMX and Enginuity (the Symmetrix operating system software, aka microcode), and both have been optimized for the eventuality of today's announcements over the past several years.

a key component of in-the-box tiered storage

In fact, Flash has been a key component of the Symmetrix in-the-box strategy for tiered storage from the start, although until today it probably appeared this strategy was focused only on the lower SATA-end of tiered storage.

Not so, grasshopper.

No, in fact, virtually every one of the key enhancements we've added to Symmetrix DMX since 2003 has been in preparation for Flash SSDs. Oh sure, they've had immediate benefit for spinning-rust disk drives as well, but they're all really prerequisites for bringing Flash SSD technology to the massively-consolidated, performance-intensive, yet cost-conservative enterprise storage market.

Let's quickly review some of the externally-visible things that have been added to Symmetrix and Enginuity since 2003 and how they apply to Flash SSDs:

  • RAID 5: while "mirror everything" used to be the way-of-Symmetrix, you just can't justify the cost for every application any more, and it's probably overkill for enterprise Flash SSDs anyway. (So is RAID 6, but that was added pretty much just for the fat-and-slow SATA drives.)
     
  • TimeFinder/Snaps: Space-saving snapshots. With the cost of SSD, you don't want to make any more copies of your data than you need to. The recent Asynchronous Copy on First Write enhancements in 5772 ensure that the Snaps have minimal impact on the response times of the primary volumes on the Flash SSD. (There's a minimal sketch of the copy-on-first-write idea right after this list.)
     
  • Modular Packaging: Symmetrix DMX-3 and DMX-4 are "enterprise-modular" arrays, allowing for almost unlimited flexibility of configuration - you can have one "quadrant" supporting as many as 600 drives for maximum capacity, or you can have a quadrant optimized for performance with as few as 32 drives. This approach now lets you dedicate a quadrant to Flash SSDs to maximize their performance (you'll still need the 32 regular disk drives in that quadrant to support DMX's PowerVault, but you can use the space on those drives for other things as well).
     
  • Cache Partitioning: With Flash SSDs, you don't really need a lot of cache for reads, but you do want to have a modicum of cache for pending writes (I'll explain why in a moment). In an interesting twist, you might actually want to decrease cache to a bare minimum for read-intensive applications. Dynamic Cache Partitioning helps to ensure that your memory is used where it's needed most, even as the system dynamically reallocates based on actual workloads.
     
  • Symmetrix Priority Controls: Similarly, you want to be sure that the Flash drives receive the appropriate relative priority to everything else in the system, and internally, Enginuity uses the underlying mechanisms to protect "normal" disk drives from starvation caused by the hyper-responsive SSDs.
     
  • Virtual Provisioning: This one's probably obvious, but with the cost of SSDs, you really want to buy as little of it as possible, so thin provisioning is almost imperative to maximize utilization. Over-provisioning allows for future growth with a minimum of hassle - just add another group of SSDs to the pool before expanding your databases. (A thin-provisioning sketch follows this list as well.)
     
  • Switched Infrastructure: In addition to the inherent fault-isolation and reliability improvements afforded by the new point-to-point DMX-4 back-end, it also serves to minimize the latency overhead for the Flash SSDs. While the overhead of an arbitrated loop is minimal and practically undetectable for a regular hard drive, even a little latency is noticeable with SSDs. And if/when future enterprise-class SSDs hit the market with a SATA interface instead of Fibre Channel, the DMX-4 will be ready.
     
  • Asynchronous Replication: while clearly justifiable on the merits of being able to replicate data a significantly longer distance than is possible with synchronous replication, asynchronous replication is expected to be the preferred method of protecting data stored on Flash SSDs, for a very simple reason: after you've paid to attain minimal response times, the last thing you're going to want to do is add another millisecond or two of transmission time to your writes.
     
  • SRDF/S Response Time improvements in 5772: But if your application DOES require synchronous replication, you'll want the fastest possible response times, so the enhancements made in 5772 (also in the upcoming 5773) could well make a lot of difference for Flash SSDs.
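
As promised in the TimeFinder/Snaps item above, here's a minimal sketch of the copy-on-first-write idea. To be clear: the class and method names below are my own illustration, not Enginuity internals, and the 5772 "asynchronous" enhancement essentially moves the preservation copy out of the host write's response path - this sketch shows only the basic, synchronous form of the algorithm.

```python
# Hypothetical copy-on-first-write (CoFW) snapshot sketch - names and
# structures are illustrative, not Enginuity internals. The original
# data of a chunk is preserved only on the FIRST write after the snap;
# later writes to the same chunk skip the copy entirely.

class CofwSnap:
    def __init__(self, source):
        self.source = source            # the live volume (list of chunks)
        self.saved = {}                 # chunk index -> preserved original

    def host_write(self, chunk, data):
        if chunk not in self.saved:     # first write since the snap?
            self.saved[chunk] = self.source[chunk]  # preserve the original
        self.source[chunk] = data       # then let the write proceed

    def snap_read(self, chunk):
        # The snap image: the preserved copy if one exists, else the source.
        return self.saved.get(chunk, self.source[chunk])

volume = ["a", "b", "c"]
snap = CofwSnap(volume)
snap.host_write(1, "B")                 # triggers the one-time copy
print(snap.snap_read(1), volume[1])     # -> b B
```

Note that only the chunks you actually overwrite consume extra space - which is exactly why space-saving snaps matter so much when the primary copy lives on expensive flash.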
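And here's the companion sketch for the Virtual Provisioning item - again purely illustrative (the class, the pool structure and the extent size are all invented for the example): the device advertises a large logical capacity, but physical extents are drawn from a shared pool only when a region is first written, which is exactly why it pairs so well with expensive SSDs.

```python
# Hypothetical thin-provisioning sketch. The extent size and all names
# are illustrative assumptions, not actual Enginuity values.

EXTENT = 768 * 1024                         # illustrative extent size (bytes)

class ThinDevice:
    def __init__(self, logical_bytes, pool):
        self.logical_bytes = logical_bytes  # the capacity the host sees
        self.pool = pool                    # shared list of free extents
        self.map = {}                       # logical extent -> physical extent

    def write(self, offset, length):
        first = offset // EXTENT
        last = (offset + length - 1) // EXTENT
        for ext in range(first, last + 1):
            if ext not in self.map:         # first touch: allocate from pool
                self.map[ext] = self.pool.pop()
        # ... data would be written to the mapped physical extents ...

pool = list(range(10_000))                  # e.g. extents carved from SSDs
dev = ThinDevice(logical_bytes=2 * 10**12, pool=pool)   # a "2TB" device
dev.write(0, 4096)                          # consumes exactly one extent
print(len(dev.map), "extent(s) allocated for a 2TB thin device")
```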

the real magic is inside

The real key to the "enterprise-ness" of the STEC Flash SSDs that EMC announced today is pretty much equally divided between the drive itself and the optimizations that have gone into Enginuity in preparation for them.

inside the drives

As you've probably read by now, the STEC ZeusIOPS drives themselves are in fact optimized for random AND sequential I/O patterns, unlike the lower-cost flash drives aimed at the laptop market. They use a generously sized SDRAM cache to improve sequential read performance and to delay and coalesce writes. They implement a massively parallel internal infrastructure that reads (or writes) small amounts of data across a large number of Flash chips concurrently to overcome the inherent Flash latencies. Every write is remapped to a different bank of Flash as part of the wear leveling, and they employ a few other tricks that I've been told I can't disclose to maximize write performance. They employ multi-bit EDC (Error Detection) and ECC (Error Correction) and bad-block remapping into the reserved capacity of the drives. And yes, they have sufficient internal backup power to destage pending writes (and the mapping tables) to persistent storage in the event of a total power failure.

(Of course, in a Symmetrix, they'll be powered by the integral standby power through any momentary power outages, or through an orderly shutdown in a total power-fail situation. But it's nice to know there's a backup to the backup.)
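
For those curious how that write remapping actually spreads the wear, here's a toy sketch of one common approach - my illustration of the general technique, emphatically NOT STEC's actual design: each write is steered to the least-worn free block, so even a write-hot logical block ends up wearing the whole device evenly.

```python
# Toy wear-leveling sketch - illustrative only, not STEC's design.
# Every write lands on the least-worn free flash block, and the
# logical-to-physical map is updated so wear spreads across the device.

import heapq

class WearLeveler:
    def __init__(self, nblocks):
        # min-heap of (erase_count, physical block) for free blocks
        self.free = [(0, pb) for pb in range(nblocks)]
        heapq.heapify(self.free)
        self.map = {}                       # logical -> (wear, physical)

    def write(self, logical, data):
        wear, phys = heapq.heappop(self.free)   # least-worn free block
        if logical in self.map:                 # recycle the old block,
            old_wear, old_phys = self.map[logical]
            heapq.heappush(self.free, (old_wear + 1, old_phys))  # +1 erase
        self.map[logical] = (wear, phys)
        # ... program `data` into physical block `phys` ...

wl = WearLeveler(8)
for _ in range(100):
    wl.write(0, b"hot")                     # 100 writes to ONE logical block
print(sorted(w for w, _ in wl.free))        # erases spread across all blocks
```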

Perhaps the most oft-repeated questions have been about Flash wear-out. To my knowledge, there isn't a drive-level endurance rating for these drives (yet). But even at the rated minimum of 100,000 writes per cell, we know that they'll endure for several years in all practical use cases, except perhaps a pathological 100% pure random write stress test (which would probably kill a hard drive sooner than an SSD). And experience shows that SLC flash will handle significantly more writes than the rated minimum, and there are SLC NAND flash parts coming soon that are even more resilient.
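
If you want to check the math yourself, here's the back-of-envelope version. Every number below is an illustrative assumption of mine (not an EMC or STEC figure), and it optimistically assumes perfect wear leveling - but it shows why even a heavy sustained write load takes years to exhaust a 73GB SLC drive:

```python
# Back-of-envelope wear-out estimate. All inputs are illustrative
# assumptions, not EMC or STEC figures.

capacity = 73 * 10**9                # 73GB drive
cycles = 100_000                     # rated minimum writes per cell

# Assume a punishing sustained load: 5,000 random 4KB writes/sec, 24x7,
# and (optimistically) perfect wear leveling across every cell.
write_rate = 5_000 * 4 * 1024        # bytes written per second

total_writable = capacity * cycles   # bytes before cells hit the rating
seconds = total_writable / write_rate
print(round(seconds / (365 * 24 * 3600), 1), "years")   # -> ~11.3 years
```

Push the write rate up to the drive's full bandwidth and the answer shrinks to a couple of years - that's the pathological stress test mentioned above; real-world workloads sit far below that.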

Bottom line: It makes no sense for EMC to sell a drive that would cost them time, money or reputation, so I'm pretty much convinced by the mathematicians who tell me not to lose any sleep over this.

inside enginuity

On the Enginuity side, several optimizations have been made to maximize performance and extend the life of the drives.

One good example is the aforementioned write caching. With effective write performance that pretty much matches read latencies, there's not a lot to be gained - performance-wise, that is - by caching writes to the disk. BUT, buffering writes can help reduce the wear and tear on the drive. See, the longer Enginuity can delay sending writes to the drive, the higher the probability that a subsequent write supersedes an earlier one. This "write folding" is a key foundation of reducing the amount of data SRDF/A has to transmit, and it will have a similar effect on reducing the number of "writes" a Flash SSD has to deal with.
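
Here's a minimal sketch of what write folding looks like in code - again, my illustration of the general idea, not actual Enginuity logic. Because pending writes are keyed by their target block, a later write simply replaces an earlier one in the buffer, and only the final version is ever destaged:

```python
# Hypothetical write-folding buffer: later writes to the same block
# replace earlier pending ones, so only one physical write is destaged.

class WriteBuffer:
    def __init__(self):
        self.pending = {}                # block address -> latest data

    def host_write(self, block, data):
        self.pending[block] = data       # supersedes any pending write

    def destage(self, drive):
        for block, data in self.pending.items():
            drive[block] = data          # one flash write per block, not per I/O
        self.pending.clear()

buf = WriteBuffer()
for i in range(100):                     # 100 host writes to one block...
    buf.host_write(42, f"version {i}")
print(len(buf.pending), "pending flash write(s)")   # ...fold into just 1
```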

Other enhancements minimize code paths when the source of a read is an SSD, skipping any logic to determine whether pre-fetch algorithms should be engaged. The aforementioned "free reads" are skipped outright, knowing that the drive itself has already fetched the "rest of the track" into its SDRAM buffer should it be needed. No need for re-ordering I/Os, either, since there's no rotational latency or seek time to optimize. And the kid gloves are removed for the rare drive rebuild as well, effectively rebuilding ALL the hypers at once instead of sequentially, since there's no real performance difference or overhead between totally random and sequential requests.
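
Expressed as a (purely hypothetical) sketch, that shortened read path is really just an early branch on device type - the structure below is my guess at the general shape, not actual Enginuity code:

```python
# Hypothetical sketch of the shortened SSD read path: SSD-backed reads
# go straight to the drive, while only spinning disks get the
# sequential-detect / prefetch / reordering machinery.

def service_read(block, device, cache):
    if block in cache:
        return cache[block]              # cache hit either way

    if device["is_ssd"]:
        return device["read"](block)     # sub-ms: skip everything else

    # HDD path: heuristics that hide seek and rotational latency.
    consider_prefetch(block, device, cache)
    return device["read"](block)

def consider_prefetch(block, device, cache):
    # Placeholder for sequential-detect and pre-fetch scheduling -
    # exactly the logic that is pure overhead for an SSD.
    pass

ssd = {"is_ssd": True, "read": lambda b: f"data@{b}"}
print(service_read(7, ssd, cache={}))    # -> data@7, no prefetch logic run
```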

And then there's the logic to ensure that "lesser drives" aren't starved in the never-ending quest for performance, leveraging the underpinnings of the Priority Controls feature. And on the "imagine that" front, it turns out that these drives can actually complete an I/O request before the requesting code is ready to accept a response. Code written to handle the tortoise-like drives of the 90s would stand no chance with these drives; fortunately this was expected, and the code was easily optimized around this new operational model.

And there's more, but I'll leave the rest for another day...I've delayed this post long enough, and the details can wait.

a new day has dawned

So, as this First Day of the New Era of Storage winds down, I have to say that the positive responses and near universal support for the significance of today's announcement are appreciated and welcomed. And though it sounds self-serving and pompous, I personally believe that Flash SSD technology will become ubiquitous in the not-so-distant future as prices decline and performance improves (thanks to Moore's Law) at rates unheard of in the entire history of the hard disk drive.

The simple fact is that nobody expected this announcement to happen until 2009 at the earliest. The competition has been caught flat-footed, and EMC has a head start in addressing a very real (and previously unmet) customer requirement. That can't be popular with the anti-EMC crowd, but then, they don't buy much EMC storage anyway. What *IS* important is that there are more customers than the competition wants to admit who need the response times these drives can deliver, and they need it NOW! And while I don't know for sure, I am guessing that EMC's direct, channel and partner sales folks around the globe have met more of those interested customers today than even they thought existed.

(And for those of you who carry a bag for EMC, I second DaveD's advice that you should get out and talk to every single one of your customers about this technology TODAY. I'm betting you'll find more interest than you can imagine. And if not, heck, it gives you something new to talk to your customers about. So get out there!)

Inevitably, customer demand will drive today's nay-sayers and sideline-sitters to follow the path EMC is trailblazing, just like they did over 17 years ago when EMC introduced the first Intelligent Cached Disk Array (ICDA) and created a whole new market for external storage where none existed before.

And trust me, they will follow, no matter what FUD and misdirection they may spout while they're trying to catch up.

If I may be so bold.
 

PostScript

Thanks for understanding my silence on this subject up until today.

There are many of you who read and participate in my blog who have been scratching around the edges of today's announcement on multiple occasions, both here and in other storage blogs, and I can only offer my heartfelt apologies that I was unable to engage or respond on this subject until now.  As you can now probably deduce, everyone within EMC who knew anything about this was sworn to absolute silence (even 'Zilla, who sniffed this out early last summer). And as surprised as many are that we actually pulled this "cone of silence" thing off, I believe that silence served to focus people's efforts and actually helped to accelerate the project.

But now that it has been announced, I am hopeful that we can engage in productive discussion about the technology, its future and its practical applications. Collaboration will only help to accelerate the economics and the adoption of this exciting new technology, and I sincerely look forward to the dialog. For my part, I'll endeavor to answer any questions that I can on the subject.



Comments


Dave Vellante

I woke up this morning to a ton of fresh powder and a solid state delight (SSD). My good friend Fred Moore, who introduced me to the screaming-fast world of solid state disk, sent me a note with a fun fact that I'd like to share:
http://wikibon.org/A_brief_history_of_solid_state_disks

The first SSD came on the market in 1978 at $8.8M/GB!

Barry Whyte

Congrats, but a couple of questions.

1) How many of these are you actually able to support in a DMX? I don't mean how many will you sell to customers, but how many can you actually fully utilise at 4K reads and writes? And still leave some MIPs for normal HDDs?

2) Is the 10% price figure per SSD vs HDD??? Surely not - I'm guessing it's 10% over and above the cost of a fully configured DMX? Or are you 'giving' the hardware away and selling 'licenses' for their use?

3) What does the $/GB price work out at when including the DMX frame maxed out with just SSDs?


Let's assume a complete DMX can handle 10 of these drives (15 at a push?). That's hardly green when you need a 19" rack full of power-hungry processors, memory, PSUs, ports etc. to saturate 10 to 15 drives. Not the most efficient packaging solution.

Anyway, as I've already played with one or two of these, I know what they are capable of, and I understand that for now it's a case of making use of what we can - but as I commented back then on my blog, these really do start to turn the industry-standard 'controller requirements' on their head.

the storage anarchist

Dave - thanks for the history. Good news is the price has come down since then. A little, anyways ;)

But you may want to double-check the Flash SSD read/write latencies that Fred claims. STEC claims Flash SSD device-level reads around .250ms and writes around .400ms. Maybe it's just the difference between component-level and device-level measurements...but I don't think I've seen anyone claim that NAND Flash reads are the same speed as SDRAM, nor writes as slow as a 15K HDD.

That's not what we're seeing in the labs, to be sure.

the storage anarchist

BarryW - Good questions, I'll answer as best I can.

1) Flash SSDs don't increase the number of IOPS or MB/s that the array can handle, they just get to that mark with fewer drives. So while there is no technical limit on the number of SSDs you can put into a Symmetrix, max performance/throughput is achieved with relatively few drives. The 30 15Ks-to-1 SSD ratio is a pretty reasonable comparison. So if your application would max out a full DMX-4 4500 with 960 73GB 15K drives (which is not necessarily unreasonable), you could get about the same performance with only 32 SSDs. I think the qualified max drives at GA is 128, but that's more due to the sheer cost of getting enough drives to do the qualification - if customers want more, they'll find a way to qualify them :)

2) I think the 10% figure was misinterpreted somewhere...the price figure I was given was that a 73GB FSSD would deliver the IOPS of ~30 73GB 15Ks, and the unit price of that FSSD was ~30x that of a 73GB 15K HDD.

Admittedly expensive on a $/GB basis, but price-neutral on a $/IOPS basis, and about 10x better response time than is achievable even with HDDs in a cached disk array. That's a HUGE benefit, especially if you also need your "Tier 0" app to share the rest of the Symmetrix functionality (replication, enterprise-wide consistency groups, virtual provisioning, etc.) and have a common management interface, and you'd like it all in one box, supported by a single vendor (no finger pointing).

3) I can't provide actual pricing, directly or indirectly, sorry.

The "Green" claim is quite defensible - you just have to consider Watts/IOP instead of Watts/GB.

If your application needs the IOPS, you really can reduce your energy usage (for the storage component of the array, which is the majority of the total energy usage) by 98%. At a 30-to-1 reduction in spindles in a DMX-4, each group of 8 SSDs eliminates the need for 30 disk drives PLUS at least 1/2 of a DMX storage bay, which includes 8ea 15-drive cages plus the power supplies and battery backup they would require.
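
To put some rough numbers behind that 98% - and to be clear, the per-drive wattages and IOPS below are my own order-of-magnitude assumptions for illustration, not published EMC or STEC specs:

```python
# Rough Watts-per-IOPS comparison. Wattages and per-drive IOPS are
# illustrative order-of-magnitude assumptions, not published figures.

hdd_watts, ssd_watts = 15, 9         # per-drive power, roughly
hdd_iops, ssd_iops = 180, 5400       # the ~30:1 per-drive IOPS ratio

workload = 54_000                    # IOPS the application actually needs
hdds = workload / hdd_iops           # -> 300 spindles
ssds = workload / ssd_iops           # -> 10 SSDs

print(hdds * hdd_watts, "Watts on 15K HDDs")    # 4500W
print(ssds * ssd_watts, "Watts on Flash SSDs")  # 90W -> ~98% less
```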

But you touch on an interesting point - how do you measure the "greenness" of performance or functionality? I mean, on a Watts/GB basis, JBOD will always win - so how do we factor in the value of all the stuff we wrap around JBOD to make it practical for the intended application? I'd love to hear your thoughts.

Barry Whyte

Thanks for the answers, Barry. My pricing questions were kind of rhetorical. Glad to see that you agree the array itself is the limiting factor. What I've been looking at is abstracting the array limits away and really getting the most bang for the buck from these drives.

Interesting too that although DMX supports 4Gbit FCAL, these drives are only available at 2Gbit thus far from STEC, so on throughput tests it only takes about 2 drives to max out the FCAL loop. Still, as you say, it's more about IO/s and latency than anything else.

As for the green issue, I would assess the 'greenness' of the entire solution and not just the components. So even if a vendor came out with a 0W/H storage device but then had to wrap a monster around it, it's still a monster. Again this comes back to the points about how best to make use of new emerging technologies that today's big-box monolithic controllers weren't designed for. I have been thinking about this a lot, but obviously, like you, I am not at liberty to discuss.

All the best with this anyway.

the storage anarchist

The challenge with optimizing for "greenness" at the solution level is assigning a value to the "monster" - if the solution requires the features that only the monster provides, then the monster is an acceptable price to pay, both in dollars and in kilowatts.

I've observed that there seems to be a correlation between the amount of hardware in a solution and the amount of energy it consumes - the more components, the more Watts to run them. So given that virtually every corner of the storage market has price competition, Darwin's theory ensures that the "monsters" are being built at the lowest possible cost, with a corollary benefit on power. Or they die (insert obligatory DS8000 lifecycle FUD here).

Sure, Flash could well be a disruptive technology, with the potential for eliminating the need for large caches and complex I/O scheduling algorithms. But emerging requirements like thin provisioning, data compression, encryption, de-dupe and even MAID will all require more CPU cycles and memory, not less, IMHO.

All this may indeed drive changes away from the "Big Box" approach. Or maybe the Big Box will morph into something else - I won't try to predict. But at least we know one thing - the future's going to be interesting :)

Shibin Zhang

I followed the EMC technology VP's blog link and found this article. I have one more comment.

If you think deeper, you might see many more use cases for enterprise flash disks, even in the near future. I think EMC must also promote those use cases in order to promote this new technology. I have seen at least one (I can't talk about the details because I am looking for a little seed funding to prove it) which requires not only high performance but also consistent performance, and I think enterprise flash disk is a good fit.

mark sanford

Barry, this is a fantastic technology and EMC is really on to something here. One of the biggest factors that I have not seen marketed by EMC is software cost. As licenses for database applications are sold by the number of CPUs they run on, being able to get 2 to 3x the performance on Oracle, SQL or others could save the companies using this technology more than the cost of the DMX4.
When do you expect mass adoption of EFDs in DMX4?
As the price of EFDs can be a big inhibitor for many applications, when do you expect to have a second source for the EFDs used on the DMX4 (I think EMC uses STEC as a single source for now) - Pliant, Sandforce, Intel, Samsung, Sandisk etc... - to be able to substantially lower prices?
If each EFD replaces 30 units of 15K RPM drives, when do you think the number of 15K RPM drives will drop to the point where it no longer makes sense for Seagate, Hitachi and Fujitsu to make them?
For a market of 30M drives/year, it would take less than 1M EFD SSDs to wipe out the industry and devastate many of the HDD manufacturers, especially Seagate!!!

