2.002: meh – ibm really, really doesn't get flash
Someone sent me this today:
And I have been trying so hard not to be The Storage Antagonist ;-}
Word to the wise, though – if you don't understand something, don't blog about it as if you do.
I've tried to get IBM's Tony Pearson to understand this repeatedly over the years, and he just keeps making the same mistakes. That probably has him despising me as much as that other blogger with the same first name, because every time he slips up, I'm usually there to correct him before his misinformation gets any traction.
This week TonyP is trying to wax intelligent on Flash Drives for the DS8K, but in his attempts to discredit my previous post, he removes any lingering doubt that IBM doesn't "get" flash.
Be sure to take the time to read the comments, and you'll see that TonyP clearly didn't take the time to understand the STEC ZeusIOPS drive or its wear-leveling algorithms. As a result, he pretty much embarrasses himself and his employer (not to mention the IBM Distinguished Engineers he throws under the bus) in the process.
At least he didn't try to drag Master Scientist BarryW down with him!
So, knowing that TonyP wouldn't dare to actually do the math for his readers, I will…
hey tony! here are the answers to the quiz!
Using the architectural definitions and modeling tools for the STEC ZeusIOPS wear-leveling algorithms, and assuming that the SLC NAND flash will tolerate exactly 100,000 Program/Erase (P/E) cycles, the math says that the latest version of the 256GB (raw) STEC ZeusIOPS drive, when exposed to a 100% 8KB write workload with 0% internal cache hit at a constant arrival rate of 5000 IOs per second, will wear out below its rated usable capacity in 4.92 years when configured at 200GB, and in 8.91 years when configured at 146GB (yeah, I was off by .08 years).
Unfortunately, I cannot share the actual data or spreadsheet used to compute these numbers because they contain STEC proprietary information about their architecture and wear-leveling algorithms. So you'll have to trust me on this, and trust that IBM and EMC are in fact using the same STEC drives with identical wear-leveling algorithms, just formatted at different capacities.
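For anyone who wants to see the general shape of this kind of estimate, here is a back-of-envelope sketch. It is not the proprietary STEC model and cannot reproduce it. It simply assumes lifetime = (raw capacity × P/E cycles) / (write IOPS × write size × write amplification), with decimal gigabytes, 8KB = 8192 bytes, a 365-day year and a single flat write-amplification factor, all of which are assumptions. Run backwards, it shows what write amplification this naive model would need in order to match the 4.92- and 8.91-year figures.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s; leap days ignored

def lifetime_years(raw_bytes, pe_cycles, write_iops, write_bytes, write_amp):
    """Years until the NAND P/E budget is used up, under a naive linear model."""
    nand_budget = raw_bytes * pe_cycles               # total bytes the NAND can absorb
    nand_rate = write_iops * write_bytes * write_amp  # NAND bytes programmed per second
    return nand_budget / nand_rate / SECONDS_PER_YEAR

def implied_write_amp(raw_bytes, pe_cycles, write_iops, write_bytes, years):
    """Write amplification this naive model needs to match a quoted lifetime."""
    # Lifetime is inversely proportional to write amplification in this model.
    return lifetime_years(raw_bytes, pe_cycles, write_iops, write_bytes, 1.0) / years

RAW = 256 * 10**9   # assumption: 256GB raw, in decimal bytes
PE = 100_000        # P/E cycles per cell, as assumed in the post
IOPS = 5000         # constant 100% write arrival rate
SIZE = 8 * 1024     # 8KB writes

ideal = lifetime_years(RAW, PE, IOPS, SIZE, 1.0)
print(f"write amplification 1.0: {ideal:.1f} years")
for label, years in (("200GB", 4.92), ("146GB", 8.91)):
    wa = implied_write_amp(RAW, PE, IOPS, SIZE, years)
    print(f"{label}: implied write amplification ~{wa:.2f}")
```

Under those assumptions the ideal lifetime (write amplification of 1) comes out near 19.8 years, and matching the quoted figures takes a write amplification of roughly 4.0 at 200GB and 2.2 at 146GB. That fits the common intuition that more spare area eases the wear-leveler's job, but the real algorithm is not modeled here.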
At a mix of 50/50 read/write, the projected life of the drive is 9.84 years @ 200GB and 17.8 years @ 146GB. And for what TonyP asserts is the "traditional business workload" (70% read / 30% write), the projected life expectancy is a healthy 16 years @ 200GB and 30 years @ 146GB.
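The mixed-workload figures appear to follow a simple rule: at a constant 5000 total IOPS, lifetime scales inversely with the write share (4.92 / 0.5 = 9.84, 8.91 / 0.5 ≈ 17.8, 4.92 / 0.3 ≈ 16.4, 8.91 / 0.3 ≈ 29.7). The sketch below assumes that reads cause no wear and that write amplification does not change with the mix; neither is guaranteed for a real drive.

```python
def mixed_lifetime(pure_write_years, write_fraction):
    """Scale a 100%-write lifetime to a read/write mix at the same total IOPS.

    Assumes reads cause no NAND wear and write amplification is unchanged,
    so lifetime is inversely proportional to the share of IOs that are writes.
    """
    if not 0 < write_fraction <= 1:
        raise ValueError("write_fraction must be in (0, 1]")
    return pure_write_years / write_fraction

for capacity, pure_write in (("200GB", 4.92), ("146GB", 8.91)):
    for mix, write_fraction in (("50/50", 0.5), ("70/30", 0.3)):
        years = mixed_lifetime(pure_write, write_fraction)
        print(f"{capacity} at {mix} read/write: {years:.1f} years")
```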
Now, those lifetimes are long enough for the drives to be downright ancient; more likely, they will have been replaced with newer, faster technology long before the drive is even halfway through its P/E life expectancy under those conditions.
So in the Real World that we all actually live in, nothing is ever 100% write – even database logs (which are not recommended for Flash drives) will not typically generate a 100% constant write workload at max drive IOPS. And the current generation of SLC NAND has been observed to easily exceed 100,000 P/E cycles, so even the above numbers are extremely conservative.
No, the truth is that the difference between the projected life at 146GB and at 200GB on a 256GB (raw) ZeusIOPS is truly insignificant, and your data is no more at risk over the expected life of the drive either way.
Unless, of course, your array can't adequately buffer writes, or frequently issues writes smaller than 8KB, which drives up the write amplification factor. Those are two issues I suspect the DS8K in fact suffers from, which would, of course, explain why IBM's Distinguished Engineers wouldn't want to take the risk with the DS8K. They don't get to be DEs by leaving things to chance, to be sure.
Symmetrix, on the other hand, isn't subject to these risk factors. Writes are more deeply buffered and delayed by the larger write cache of Symmetrix (DS8K is limited to 4GB or 8GB of non-volatile write cache vs. 80% of 256GB on DMX4 and 80% of 512GB on V-Max). Symmetrix writes are always aligned to the ZeusIOPS' logical page size to minimize write amplification, and the P/E cycles experienced by the NAND in the drive are proactively monitored to enable pre-emptive replacement should a drive exhibit premature or runaway wear-out.
Not so the DS8K, apparently…hence the conservative approach.
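The alignment point can be illustrated with a simple page-granularity model. This is a hypothetical sketch, not the ZeusIOPS firmware: it assumes every logical page a host write touches is programmed in full (read-modify-write) and ignores any coalescing in the drive's own cache. The 4KB and 8KB page sizes are placeholders, since the actual logical page size is not stated here.

```python
def pages_touched(offset, length, page_size):
    """How many logical pages a host write of `length` bytes at `offset` spans."""
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1

def page_write_amp(offset, length, page_size):
    """Bytes programmed per host byte, if every touched page is rewritten whole."""
    return pages_touched(offset, length, page_size) * page_size / length

# Hypothetical logical page sizes; not taken from any ZeusIOPS specification.
for page in (4096, 8192):
    for desc, off in (("aligned", 0), ("shifted 512B", 512)):
        print(f"{page // 1024}KB page, 8KB write {desc}: "
              f"{pages_touched(off, 8192, page)} pages, "
              f"{page_write_amp(off, 8192, page):.1f}x")
```

Under this model an 8KB write shifted by just 512 bytes spills into one extra page, so the bytes programmed grow by 1.5x with 4KB pages or 2x with 8KB pages, before any garbage-collection overhead is added on top.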
But don't be fooled – the deficiencies of the DS8K mean you will pay more Dollars per Usable GB of SSD on a DS8K than for Symmetrix DMX4 or V-Max EFDs.
Oh – and for the record, TonyP, I don't think I ever said EMC was using newer or different EFDs than IBM. I just asserted that EMC knows more than IBM about these EFDs and how they actually work in a storage array under real-world workloads. Thus, EMC is able to ship drives configured with more usable capacity per device without increasing the risks to customer data.
See, while IBM was playing catch-up,
EMC DID THE MATH!
Maybe DMX is only capable of an "arrival rate of 5000", but double that and what happens?
Since you've given SOME of the numbers, care to share with your readers what happens when you do 4KB or even 512-byte random writes for the same periods of time at a higher arrival rate?
Especially as the "arrival rate" will potentially be much higher with smaller blocks. How about something even closer to the maximum the device can sustain?
Or would you rather avoid that discussion again...
And it is EVEN MORE relevant with your 'FAST' stuff, since you will be using it for all the slow disks, and so should be aiming to sustain every single IOP (arrival rate) you can from the devices. But yet again, maybe DMX can't manage that?
TAKE ALL THAT INTO ACCOUNT, and maybe the truth is that IBM understands more than you can imagine, which of course suggests that your latest attack on Tony and IBM is unjustified and in fact requires an apology.
Posted by: Barry Whyte | May 13, 2009 at 05:35 PM
Are you suggesting that the ZeusIOPS can accept 8KB WRITES at an arrival rate of 10,000 per second when under a real world, mixed Read/Write workload?
Personally, I think 5000 8KB write IOPS is a reasonable expectation for today's ZeusIOPS.
But just how many 512 byte writes per second do you want to claim the ZeusIOPS can do given a 70/30 read/write mix? Certainly not enough to saturate the max MB/s bandwidth of the drive?
And tell us, oh wise one, when you give the drive a totally random, unaligned 512-byte workload with a 0% cache hit rate inside the drive's SDRAM, what is the EFFECTIVE write size within the drive? 4KB? 8KB? 16KB?
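A worst-case sketch of the arithmetic behind that question, assuming each small write is programmed as one full logical page with no coalescing in the drive's cache. The page sizes below are just the three candidates the question names, not a statement about the real drive.

```python
def subpage_write_amp(host_bytes, page_bytes):
    """Worst-case bytes programmed per host byte when a sub-page write
    forces a full-page program (no write coalescing assumed)."""
    if not 0 < host_bytes <= page_bytes:
        raise ValueError("expects a write no larger than one page")
    return page_bytes / host_bytes

HOST_WRITE = 512  # bytes: a sector-sized random write

for page_kb in (4, 8, 16):  # hypothetical logical page sizes
    amp = subpage_write_amp(HOST_WRITE, page_kb * 1024)
    print(f"{page_kb}KB page: {amp:.0f}x the host bytes reach the NAND")
```

If the drive's SDRAM does merge neighboring small writes, the real figure would be lower; this only covers the page-program cost in the 0% cache-hit case the question describes, and garbage collection would add to it.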
And lo, if my suspicions are correct, the DS8K cannot protect the flash drives from pathological writes, and does not even ATTEMPT to monitor the amount of wear, hence the conservative approach.
If you too want to claim superior knowledge, sir Master Scientist - please, do show us the math. All this CAPITALIZED BLATHER and insinuation without any actual data makes you no more credible than TonyP.
And don't assume to know anything about the actual FAST implementation - you only know what's been made public, and that has intentionally been restricted.
But please - do show us the math...
Posted by: the storage anarchist | May 13, 2009 at 06:00 PM
Guys, why is everyone so concerned about how long an EFD drive will last? My customers never buy less than 36 months of maintenance and warranty for EMC boxes. CLARiiON is 36 months by default, and I believe DMX4 and V-Max are too. Simply put, if a drive fails, it's returned to the vendor and you get a new one.
Anyway, in 3 years the price of EFD drives will be much lower, and customers using EFDs right now will benefit greatly from them over the next 3 years. The technology will evolve in those 3 years too.
Why bother? :)
Posted by: calypso | May 18, 2009 at 03:54 PM