
May 12, 2009

2.002: meh – ibm really, really doesn't get flash

Someone sent me this today:

[Image: Blogger at a Bar]

And I have been trying so hard not to be The Storage Antagonist ;-}


Word to the wise, though – if you don't understand something, don't blog about it as if you do.

I've tried to get IBM's Tony Pearson to understand this repeatedly over the years, and he just keeps making the same mistakes. Probably has him despising me as much as that other blogger with the same first name, because every time he slips up, I'm usually there to correct him before his misinformation gets any traction.

This week TonyP is trying to wax intelligent on Flash Drives for the DS8K, but in his attempts to discredit my previous post, he removes any lingering doubt that IBM doesn't "get" flash.

Be sure to take the time to read the comments, and you'll see that TonyP clearly didn't take the time to understand the STEC ZeusIOPS drive or its wear-leveling algorithms. As a result, he pretty much embarrasses himself and his employer (not to mention the IBM Distinguished Engineers he throws under the bus) in the process.

At least he didn't try to drag Master Scientist BarryW down with him!

So, knowing that TonyP wouldn't dare to actually do the math for his readers, I will…
 

hey tony! here are the answers to the quiz!

Using the architectural definitions and modeling tools for the STEC ZeusIOPS wear-leveling algorithms, and assuming that the SLC NAND flash will tolerate exactly 100,000 Program/Erase (P/E) cycles, the math says this: the latest version of the 256GB (raw) STEC ZeusIOPS drive, exposed to a 100% 8KB write workload with 0% internal cache hit at a constant arrival rate of 5000 IOs per second, will wear out below its rated usable capacity in 4.92 years when configured at 200GB, and in 8.91 years when configured at 146GB (yeah, I was off by .08 years).

Unfortunately, I cannot share the actual data or spreadsheet used to compute these numbers because they contain STEC proprietary information about their architecture and wear-leveling algorithms. So you'll have to trust me on this, and trust that IBM and EMC are in fact using the same STEC drives with the identical wear-leveling algorithms, just formatted at different capacities.
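For a feel of the scale involved, here is a naive ceiling calculation. It is emphatically not the STEC model: it assumes perfect wear leveling, a write amplification of exactly 1, decimal gigabytes for the 256GB raw capacity, and 8KB = 8192 bytes. The function name and those unit choices are assumptions made for this sketch only.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def ideal_life_years(raw_bytes, pe_cycles, write_iops, io_bytes):
    """Years until every raw byte has been programmed pe_cycles times.

    Assumes perfect wear leveling and no write amplification, so this is
    an upper bound, not a prediction.
    """
    write_budget_bytes = raw_bytes * pe_cycles
    bytes_written_per_year = write_iops * io_bytes * SECONDS_PER_YEAR
    return write_budget_bytes / bytes_written_per_year


# 256GB raw (assumed decimal), 100,000 P/E cycles, 5000 x 8KB writes/sec
ceiling = ideal_life_years(256e9, 100_000, 5000, 8 * 1024)
print(f"idealized ceiling: {ceiling:.1f} years")
```

The quoted 4.92 and 8.91 years sit well below this ceiling. The difference is whatever the proprietary model accounts for (wear-leveling overhead, the formatted capacity, the "below rated usable capacity" criterion) and this sketch does not.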

At a mix of 50/50 read/write, the projected life of the drive is 9.84 years @ 200GB, and 17.8 years @ 146GB. And for what TonyP asserts is the "traditional business workload" (70% read / 30% write) the projected life expectancy is a healthy 16 years @ 200GB and 30 years @ 146GB.
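The mixed-workload figures follow from the 100% write figures by simple scaling, if one assumes the total arrival rate stays at 5000 IOs per second and that reads cause no wear. A minimal sketch under those assumptions (the function name is made up for illustration):

```python
def life_at_mix(life_at_full_write_years, write_fraction):
    """Scale a 100%-write lifetime to a workload with fewer writes.

    Assumes wear depends only on the number of writes, so halving the
    write share doubles the projected life.
    """
    if not 0 < write_fraction <= 1:
        raise ValueError("write_fraction must be in (0, 1]")
    return life_at_full_write_years / write_fraction


# 100% write lifetimes quoted above, keyed by configured capacity in GB
FULL_WRITE_LIFE = {200: 4.92, 146: 8.91}

for capacity_gb, base_years in FULL_WRITE_LIFE.items():
    for write_fraction in (0.5, 0.3):
        years = life_at_mix(base_years, write_fraction)
        print(f"{capacity_gb}GB, {write_fraction:.0%} write: {years:.2f} years")
```

This gives 9.84 and 17.82 years for the 50/50 mix, and 16.40 and 29.70 years for the 70/30 mix, which round to the 16 and 30 years quoted.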

Now, that's long enough for the drives to be downright ancient - more likely they will have been replaced with newer/faster technology long before they are even halfway through their P/E life expectancy under those conditions.

So in the Real World that we all actually live in, nothing is ever 100% write – even database logs (which are not recommended for Flash drives) will not typically generate a 100% constant write workload at max drive IOPS. And the current generation of SLC NAND has been observed to easily exceed 100,000 P/E cycles, so even the above numbers are extremely conservative.

No, the truth is, the difference between the projected life at 146GB and 200GB on a 256GB (raw) ZeusIOPS is truly insignificant...and your data is no more at risk for the expected life of the drive either way.

Unless, of course, your array can't adequately buffer writes, or frequently writes blocks smaller than 8KB, which will drive up the write amplification factor...two issues I suspect the DS8K in fact suffers from. Which, of course, would explain why IBM's Distinguished Engineers wouldn't want to take the risk with the DS8K. They don't get to be DEs by leaving things to chance, to be sure.
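To see why small or unaligned writes hurt, consider a simplified model in which every host write forces a full program of each flash page it touches, with no coalescing in cache. The 8192-byte page size below is a placeholder assumption, not a published ZeusIOPS figure, and the function names are hypothetical.

```python
def pages_touched(offset, length, page_size):
    """Number of flash pages overlapped by a write at the given byte offset."""
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


def write_amplification(offset, length, page_size):
    """Bytes programmed to flash per byte written by the host.

    Simplified: each touched page is rewritten in full, nothing is buffered.
    """
    return pages_touched(offset, length, page_size) * page_size / length


PAGE = 8192  # assumed page size, for illustration only

print(write_amplification(0, 8192, PAGE))     # aligned full-page write
print(write_amplification(4096, 8192, PAGE))  # same size, misaligned
print(write_amplification(0, 512, PAGE))      # small write
```

Under this model an aligned 8KB write has an amplification of 1, the same write misaligned by half a page doubles it, and a lone 512-byte write costs 16 times its size in flash wear. A real drive's internal cache and a large array write cache exist precisely to avoid the worst of this.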

Symmetrix, on the other hand, isn't subject to these risk factors. Writes are more deeply buffered and delayed by the larger write cache of Symmetrix (DS8K is limited to 4GB or 8GB of non-volatile write cache vs. 80% of 256GB on DMX4 and 80% of 512GB on V-Max). Symmetrix writes are always aligned to the ZeusIOPS' logical page size to minimize write amplification, and the P/E cycles experienced by the NAND in the drive are proactively monitored to enable pre-emptive replacement should a drive exhibit premature or runaway wear-out.

Not so the DS8K, apparently…hence the conservative approach.

But don't be fooled – the deficiencies of the DS8K mean you will pay more Dollars per Usable GB of SSD on a DS8K than for Symmetrix DMX4 or V-Max EFDs.

 

Oh – and for the record TonyP, I don't think I ever said EMC was using newer or different EFDs than IBM. I just asserted that EMC knows more than IBM about these EFDs and how they actually work in a storage array under real-world workloads. Thus, EMC is able to ship drives configured with more usable capacity per device without increasing the risks to customer data.

See, while IBM was playing catch-up,

EMC DID THE MATH!


This post is from the storage anarchist. 



Comments


Barry Whyte

Maybe DMX is only capable of an "arrival rate of 5000" - but double that and what happens.........

Since you've given SOME of the numbers... care to share with your readers what happens when you do 4KB or even 512 byte random writes for same periods of time with a higher arrival rate........

Especially as the "arrival rate" will be potentially much larger at smaller blocks, say how about even closer to the maximum the device can sustain?

Or would you rather avoid that discussion again...

And it is EVEN MORE relevant with your 'FAST' stuff, since you will be using it for all the slow disks, and so should be aiming to sustain every single IOP (arrival rate) you can from the devices - but yet again, maybe DMX can't manage that?

TAKE ALL THAT INTO ACCOUNT and maybe the truth is IBM understands more than you can imagine, and of course suggests that your latest attack on Tony and IBM is unjustified and in fact requires an apology.

the storage anarchist

Are you suggesting that the ZeusIOPS can accept 8KB WRITES at an arrival rate of 10,000 per second when under a real world, mixed Read/Write workload?

Personally, I think 5000 8KB write IOPS is a reasonable expectation for today's ZeusIOPS.

But just how many 512 byte writes per second do you want to claim the ZeusIOPS can do given a 70/30 read/write mix? Certainly not enough to saturate the max MB/s bandwidth of the drive?

And tell us, oh wise one, when you give the drive a totally random, unaligned 512 byte workload with 0% cache hit rate inside the drive's SDRAM, what is the EFFECTIVE write size within the drive? 4KB? 8KB? 16KB?

And lo, if my suspicions are correct, the DS8K cannot protect the flash drives from pathological writes, and does not even ATTEMPT to monitor the amount of wear, hence the conservative approach.

If you too want to claim superior knowledge, sir Master Scientist - please, do show us the math. All this CAPITALIZED BLATHER and insinuation without any actual data makes you no more credible than TonyP.

And don't assume to know anything about the actual FAST implementation - you only know what's been made public, and that has intentionally been restricted.

But please - do show us the math...

calypso

Guys, why is everyone so concerned about how long an EFD drive will last? My customers never buy less than 36 months of maintenance and warranty for EMC boxes. CLARiiON is 36 months by default, and I believe that DMX4 and V-Max are too. Simply, if a drive fails, it's returned to the vendor and you get a new one.

Anyway, in 3 years, the price of EFD drives will be much lower, and customers using EFD right now will benefit from it very much during the next 3 years. Technology will evolve in these 3 years too.

Why bother? :)


disclaimer

I am unabashedly an employee of EMC, but the opinions expressed here are entirely my own. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.
