0.009.2: catch-22 part deux, redo
(OK, I blew it. In the original version of this entry, I somehow lost about a third of what I had written and edited. This is the entire entry, and I've deleted the original. Apologies to those who commented on the original version; feel free to chime in again. Thx - tsa)
In a recent blog entry, Hu Yoshida proclaims the benefits of thin provisioning with wide striping as if it were something new and innovative recently invented by Hitachi's corps of Japanese engineers.
Many of you (especially the OSG's) will recall that StorageTek Iceberg was the first practical implementation of thin provisioning, back in 1994. Although innovative and cutting edge (some would say daring), it never really gained traction, proving that first-to-market isn't always an assurance of success. Garbage collection was its Achilles' heel, although many believe the demise of Iceberg was due more to the fact that mainframes don't really need a lot of help to manage storage utilization effectively.
Since Iceberg, there have been more than a handful of companies who have tried to leverage thin provisioning into the magic sauce that cost-effectively improves utilization, simplifies allocation and reduces the demand for physical storage - all while maintaining predictably acceptable performance. But none of them have solved the Catch-22's of thin virtual provisioning that I discussed in my earlier blog entry.
Hitachi is the most recent (and exuberant) entrant in this long line of would-be storage alchemists, and frankly, there's nothing to indicate that they've done anything to differentiate their implementation from all the rest.
No, despite all the hype, this latest quest to deliver the virtual provisioning grail by the Hitachi Ltd. developers from the Land of the Rising Sun appears to do nothing to help customers break out of the virtual provisioning catch-22.
the very real challenges of imaginary storage
Now, truth be told, most of the issues that get in the way of virtual provisioning aren't the fault of the implementations. Marc Farley does a good job of explaining this from one perspective, but I thought I might try to expand upon his explanation a bit.
See, most of the challenges with virtual provisioning have to do with the intentional independence of the file systems and database engines in the open systems environment from the underlying storage infrastructure. Having evolved from computer systems designed and built around a single-core CPU surrounded by "dumb" I/O controllers, these low-level data managers were designed to handle storage allocation entirely on their own, with no external interface into what's really going on. In fact, a disk drive attached to an open systems host sees only I/O requests to "read block" or "write block"; virtually everything else is handled at a higher level. The disk drive (like any block-oriented open systems storage array) is literally blind to the operational intent of the file system. It simply has no way to know whether a block is in use, or whether it's part of a deleted file.
Contrast this with the design of the mainframe environment, where the central processing unit complex passes all I/O requests to the channel adapters for handling using a higher-level protocol. Instead of "read block, write block", the mainframe's low-level I/O commands are more like "create dataset, move dataset, get data, put data, sort data, release data, delete dataset." Granted, these semantics are rooted in the storage constructs of punch cards, spinning magnetic tapes and core memory. But these semantics allow the application/operating system to tell the channel device how the data is being used and when it is no longer needed. Iceberg was in fact able to leverage this additional awareness to provide a solution that was more robust, even back then, than virtually every block-oriented virtual provisioning alternative on the market today. For example, by monitoring the "scratch dataset" request (the mainframe equivalent of "delete file" for you YSG's), Iceberg could recover the space freed from deleted datasets and re-use that space to store other data.
In today's world, no block-oriented virtual provisioning solution can do this. But, some of the NAS-based solutions can. And this is for a similar reason - today's NAS file sharing protocols (including NFS and CIFS) actually include a verb for "delete file" and others that allow for sparse physical allocation (an inherent cornerstone of NetApp's WAFL). But since the SCSI command set is so low-level, there is (currently) no way for a file system to tell the block storage device (or array) that a block is no longer in use.
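To make the difference concrete, here is a toy model in Python. It is a minimal sketch under stated assumptions, not how Iceberg, WAFL or any shipping array actually works: `BlockThinPool` and `FileAwareThinPool` are hypothetical classes, the block-only pool sees nothing but block writes, and the host "deletes" a file simply by rewriting one of its own metadata blocks.

```python
"""Toy model (hypothetical classes): why a block-only thin pool cannot
reclaim space that a file-aware, NAS-style pool can."""


class BlockThinPool:
    """Sees only block writes, like a SCSI target with no 'delete' verb."""

    def __init__(self) -> None:
        self.allocated: set[int] = set()  # LBAs currently backed by real disk

    def write_block(self, lba: int) -> None:
        # First write to an LBA allocates physical space; rewrites reuse it.
        self.allocated.add(lba)

    def physical_blocks(self) -> int:
        return len(self.allocated)


class FileAwareThinPool(BlockThinPool):
    """Also receives a 'delete file' request, as a NAS server does."""

    def __init__(self) -> None:
        super().__init__()
        self.files: dict[str, set[int]] = {}

    def write_file(self, name: str, lbas: list[int]) -> None:
        self.files[name] = set(lbas)
        for lba in lbas:
            self.write_block(lba)

    def delete_file(self, name: str) -> None:
        # The pool knows exactly which blocks died and can free them.
        self.allocated -= self.files.pop(name)


# Block array: the host writes a 100-block file, then "deletes" it by
# updating its own metadata block. The array only sees one more write.
block_pool = BlockThinPool()
for lba in range(100):
    block_pool.write_block(lba)
block_pool.write_block(1000)  # hypothetical metadata block
print(block_pool.physical_blocks())  # dead file's blocks stay allocated

# File-aware pool: the delete request lets it hand the space back.
nas_pool = FileAwareThinPool()
nas_pool.write_file("report.dat", list(range(100)))
nas_pool.delete_file("report.dat")
print(nas_pool.physical_blocks())
```

The block pool ends up holding 101 blocks for data nobody wants, while the file-aware pool is back to zero.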
wide - or shallow?
But perhaps a more significant challenge for virtual provisioning is the one I pointed out in my prior blog entry - using fewer spindles to support the same amount of actual data poses a significant performance risk. Put simply, if your actual storage capacity utilization (in GBs) before implementing virtual provisioning is less than your storage performance utilization (in either IOPS or MB/s), then moving to virtual provisioning (on fewer spindles) will almost certainly have a measurable impact on performance.
Hey, maybe I'll make that the anarchist's first law of virtual provisioning!
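Here is that law as a pair of hypothetical Python helpers, along with the reasoning behind it. The assumption: if the pool is sized purely for capacity, the spindle count shrinks in proportion to capacity utilization while total I/O demand stays put, so each remaining spindle's load scales by performance utilization divided by capacity utilization. RAID overhead and cache effects are ignored.

```python
def per_spindle_load(capacity_util: float, performance_util: float,
                     target_fill: float = 1.0) -> float:
    """Load on each remaining spindle, as a fraction of its maximum, after
    consolidating onto a pool sized only for capacity at target_fill.
    Utilizations are fractions (0..1) of today's provisioned spindles."""
    if capacity_util <= 0:
        raise ValueError("capacity_util must be positive")
    spindle_fraction = capacity_util / target_fill  # spindles kept vs. today
    return performance_util / spindle_fraction      # > 1.0 means overloaded


def vp_performance_risk(capacity_util: float, performance_util: float) -> bool:
    """The 'first law': with the pool filled to 100%, per-spindle load
    exceeds 1.0 exactly when capacity utilization < performance utilization."""
    return per_spindle_load(capacity_util, performance_util) > 1.0


print(vp_performance_risk(0.30, 0.50))
print(per_spindle_load(0.30, 0.50, target_fill=0.80))
```

With 30% capacity and 50% performance utilization, a pool filled to 80% leaves each spindle at roughly 133% of its ceiling, the same ratio as the 3200 IOPS demand against 2400 IOPS of supply in the worked example further down.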
It's simple physics. A mechanical spinning-rust disk drive can do only so many IOPS, and no more. If you use fewer of them to support the same workload, each drive will have to work harder. And if you use too few to meet the total I/O demand of all the virtual devices that are sharing the spindles, performance goes to heck. Given the nature of virtual provisioning, the performance impact isn't limited to just one LUN or CKD volume - no, once you exceed the limits of one or more drives, every "virtual" device in the pool that has data on those drives will suffer.
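The "every virtual device suffers" effect can be sketched in a few lines. This is a hypothetical model: each tuple says that a chunk of a virtual LUN lives on a given drive and generates a given IOPS load, and the 100 IOPS ceiling per drive is the same round figure the post uses in its example further down.

```python
from collections import defaultdict

DRIVE_MAX_IOPS = 100  # assumed per-spindle ceiling


def affected_luns(extents):
    """extents: list of (lun, drive, iops) tuples, one per chunk of a
    virtual LUN placed on a drive. Returns (overloaded drives, LUNs hit)."""
    load = defaultdict(int)
    for _lun, drive, iops in extents:
        load[drive] += iops
    hot = {drive for drive, iops in load.items() if iops > DRIVE_MAX_IOPS}
    # Any LUN with even one chunk on an overloaded drive sees the slowdown.
    victims = sorted({lun for lun, drive, _iops in extents if drive in hot})
    return hot, victims


pool = [
    ("lunA", 0, 60), ("lunA", 1, 20),
    ("lunB", 0, 50), ("lunB", 2, 30),
    ("lunC", 1, 30), ("lunC", 2, 40),
]
hot, victims = affected_luns(pool)
print(hot, victims)
```

Drive 0 carries 110 IOPS, so both lunA and lunB slow down, even though neither LUN would overload a drive on its own; lunC happens to be spared only because none of its chunks landed there.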
Now, the Hitachi Data Systems marketing machine in Santa Clara would have you believe that "wide striping" solves this problem (it doesn't). And reading the HDS corporate bloggers, you might be led to believe that they actually invented the notion of "wide striping" (they didn't).
The idea behind striping data across multiple spindles is far from new - in fact, since the performance capabilities of mechanical disk drives have been nearly constant while capacities have been doubling every 18-24 months, striping has been the only practical way to meet the performance requirements of I/O-intensive applications for decades. Volume striping has been implemented in most host-based volume managers since the 1980s, and EMC implemented array-based striped Metas within Symmetrix back in 1997. Today's Symmetrix DMX-3 can support stripes across more than 1800 drives.
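For readers who haven't looked inside a stripe set, the core of striping is just address arithmetic. Below is a minimal sketch of plain RAID-0-style striping; it ignores parity and is not the layout of any particular product (Symmetrix Metas, USP pools or host volume managers). The parameter names `stripe_width` (member drives) and `stripe_depth` (blocks per chunk) are illustrative.

```python
def stripe_location(lba: int, stripe_width: int,
                    stripe_depth: int) -> tuple[int, int]:
    """Map a logical block address to (member drive, block offset on it)
    for simple striping with no parity."""
    chunk = lba // stripe_depth      # which chunk of the volume
    drive = chunk % stripe_width     # chunks rotate round-robin over drives
    row = chunk // stripe_width      # full stripes already laid down
    offset = row * stripe_depth + lba % stripe_depth
    return drive, offset


# A 4096-block sequential run over 8 drives with 128-block chunks.
touched = {stripe_location(lba, 8, 128)[0] for lba in range(4096)}
print(sorted(touched))
```

Striping spreads the work evenly, but it does not raise the ceiling: a stripe across N drives still tops out at roughly N times the per-drive IOPS.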
But can wide striping solve the spindle-limited performance challenges of virtual provisioning, as Hu asserts?
I say no. At least not totally - higher capacity utilization will frequently correlate with higher access density.
Let's do the math. Take 8 separate "fat" LUNs, each running on a 7+1 RAID 5 group, each utilizing only 30% of the capacity of the 7 (effective) data spindles. For the math, we'll say these are 300GB drives, so each LUN is using only 30% of 7*300GB, or 630GB, and the 8 LUNs require 5.04TB of real storage. Let's also imagine each LUN is using 50% of the IOPS of the 8 total spindles; let's say each spindle can do 100 IOPS max - so each LUN is running at 400 IOPS, for a total of 3200 IOPS required.
Now, take these same 8 LUNs, and spread them across a virtual provisioning pool large enough to get to 80% utilization, or 24 300GB disk drives configured as 7+1 RAID 5. That gives us 6.3TB of usable capacity, but only 2400 total IOPS. To support the 3200 IOPS required by the applications, we'll need to add another 7+1 RAID group. But the additional space will effectively be unused and wasted - utilization will drop from 80% to 60%. While this is better than the original 30% utilization, and uses only half the number of drives, there's no headroom for any growth in IOPS demand. And in fact, if the I/O patterns randomly converge onto only a few of the drives (and chaos theory asserts that it in fact will), performance of one or more of the applications will suffer. But adding drives to meet the IOPS requirements also reduces utilization.
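The arithmetic in the two paragraphs above can be checked with a short script. It makes the same simplifications as the text (100 IOPS per spindle counted across all 8 drives of each 7+1 group, no RAID 5 write penalty, no cache), and `size_pool` is a hypothetical helper that picks enough RAID groups to satisfy both the capacity target and the IOPS demand. Percentages are kept as integers so the rounding up is exact.

```python
DRIVE_GB, DRIVE_IOPS = 300, 100  # per-drive figures used in the text
DATA, WIDTH = 7, 8               # 7+1 RAID 5: 7 data drives, 8 spindles
LUNS = 8


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding surprises)."""
    return -(-a // b)


# "Fat" layout: each LUN alone on a 7+1 group, 30% full and 50% busy.
fat_drives = LUNS * WIDTH                          # 64 drives
need_gb = LUNS * DATA * DRIVE_GB * 30 // 100       # 5040 GB
need_iops = LUNS * WIDTH * DRIVE_IOPS * 50 // 100  # 3200 IOPS

# Pool sized for capacity alone at 80% fill: 3 groups = 24 drives.
cap_groups = ceil_div(need_gb * 100, 80 * DATA * DRIVE_GB)
cap_only_iops = cap_groups * WIDTH * DRIVE_IOPS    # 2400 IOPS: too few


def size_pool(need_gb: int, need_iops: int, target_fill_pct: int) -> dict:
    """Enough 7+1 groups to meet both the fill target and the IOPS demand."""
    groups = max(ceil_div(need_gb * 100, target_fill_pct * DATA * DRIVE_GB),
                 ceil_div(need_iops, WIDTH * DRIVE_IOPS))
    return {
        "drives": groups * WIDTH,
        "fill": need_gb / (groups * DATA * DRIVE_GB),
        "iops_headroom": groups * WIDTH * DRIVE_IOPS - need_iops,
    }


pool = size_pool(need_gb, need_iops, 80)
print(fat_drives, need_gb, need_iops, cap_only_iops, pool)
```

`pool` comes out at 32 drives (half of the original 64), 60% fill and zero IOPS headroom, matching the figures above.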
And if the IOPS demands of the old "fat" allocation are higher than 50%, or the actual utilization is lower than 30% (or worse, if BOTH are true), the problem just gets worse. And, on the other hand, if performance utilization is low or if attaining 60% utilization is sufficient ROI, well, then all this probably won't matter. And in fact, we all know that there are lots of LUNs out there that are vastly over-provisioned and under-utilized.
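To see how quickly it gets worse, the same model can be swept over different starting points. Again this is a sketch with the text's simplifications; `pool_fill` is a hypothetical helper returning the fill (in percent) a thin pool ends up at when it must still carry the fat LUNs' IOPS, with the 80% capacity target as an assumed default.

```python
DRIVE_GB, DRIVE_IOPS, DATA, WIDTH, LUNS = 300, 100, 7, 8, 8


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def pool_fill(cap_pct: int, iops_pct: int, target_fill_pct: int = 80) -> float:
    """Fill (%) of a thin pool built from 7+1 groups that must hold the data
    of LUNS fat LUNs (cap_pct full) and carry their load (iops_pct busy)."""
    gb_x100 = LUNS * DATA * DRIVE_GB * cap_pct        # real GB, scaled by 100
    iops_x100 = LUNS * WIDTH * DRIVE_IOPS * iops_pct  # real IOPS, scaled by 100
    groups = max(ceil_div(gb_x100, target_fill_pct * DATA * DRIVE_GB),
                 ceil_div(iops_x100, 100 * WIDTH * DRIVE_IOPS))
    return gb_x100 / (groups * DATA * DRIVE_GB)       # scaled GB / GB = percent


for cap_pct in (30, 20):
    for iops_pct in (25, 50, 75):
        print(f"fat LUNs {cap_pct}% full, {iops_pct}% busy -> "
              f"thin pool {pool_fill(cap_pct, iops_pct):.0f}% full")
```

The 30%-full, 50%-busy case reproduces the 60% figure above; push the fat LUNs to 75% busy and the pool can only be filled to 40%.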
Problem is, do we know which ones they are BEFORE they are deployed?
reality check
About here is where I'll be accused of spreading FUD (Fear, Uncertainty and Doubt).
And you know what? I won't deny it. In fact I am purposely (and perhaps proudly) trying to raise awareness of the realities of virtual provisioning. Realities that the vendors, industry analysts and press seem to ignore or gloss over when they discuss and report on the feature.
But I'm not aiming at a particular vendor, implementation or analyst - I'm challenging the industry as a whole. We (collectively) aren't helping our customers and prospects to understand the pros and cons of virtual provisioning. Worse, by dumbing down the discussions to the benefits only, we are misleading the peripheral audiences of users, investors and (perhaps most importantly) those in control of IT budgets into believing that virtual provisioning can cut their storage requirements (and costs) by more than half.
I'm not kidding - I know of customers whose management have already mandated broad deployment of virtual provisioning with a target of attaining more than 85-90% storage capacity utilization. And while I can't disagree with the motivation (cut costs), these mandates apparently include no consideration of the performance implications, the risks associated with unexpectedly running out of physical storage, or even an assurance of rapid approval for additional physical storage. All seem to be based on a lack of understanding of the implications and risks - issues I'm sure the storage administrators understand all too well, but are probably unable to escalate.
And these mandates will inevitably lead customers into the catch-22 trap, where "bad things will happen," to quote Hu and Claus from the recent USP-V announcement.
As I stated earlier, I don't think these challenges are necessarily the fault of the implementations. I do believe that NAS-based virtual provisioning can handle some things better, especially the reclamation and reuse of space from deleted files. But none of the implementations brought to market so far have delivered any magic to get around the physical limitations of mechanical rotating media storage.
And I'm not claiming that EMC has any secret sauce answer to this either, nor do I mean to imply that we/they don't (as a blogger who works for EMC, and not an EMC blogger, I can't discuss such stuff in this forum - at least not without risking my job). But I do intend to expand on this subject in another future blog entry. I'll try to explore such potential challenges as:
- calculating the performance requirements and characteristics in advance of a move to virtual provisioning
- actually effecting the move / migration
- setting and monitoring appropriate utilization targets and thresholds
- handling alerts and alarms
- preparing for the inevitable "bad things" that can happen
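As a hint of what the thresholds and alerting items might involve, here is a minimal sketch, not a recommendation: `PoolSample` and `check_pool` are hypothetical, growth is assumed to be linear, and the 70%/85% thresholds and 30-day procurement lead time are placeholder values. The lead-time check reflects the risk mentioned earlier of running out of physical storage before additional capacity can be approved and installed.

```python
from dataclasses import dataclass


@dataclass
class PoolSample:
    physical_gb: float        # usable physical capacity behind the pool
    allocated_gb: float       # physical capacity actually consumed
    growth_gb_per_day: float  # recent consumption rate (assumed linear)


def check_pool(s: PoolSample, warn_pct: float = 70.0, crit_pct: float = 85.0,
               lead_time_days: float = 30.0) -> tuple[str, float]:
    """Return (alert level, estimated days until the pool is full).
    All threshold defaults are hypothetical placeholders."""
    used_pct = 100.0 * s.allocated_gb / s.physical_gb
    free_gb = s.physical_gb - s.allocated_gb
    days_left = (free_gb / s.growth_gb_per_day
                 if s.growth_gb_per_day > 0 else float("inf"))
    # Running out before new disks can arrive is as bad as crossing the
    # critical utilization threshold.
    if used_pct >= crit_pct or days_left < lead_time_days:
        return "CRITICAL", days_left
    if used_pct >= warn_pct:
        return "WARNING", days_left
    return "OK", days_left


print(check_pool(PoolSample(8400, 5040, 20)))   # 60% used, slow growth
print(check_pool(PoolSample(8400, 5040, 200)))  # 60% used, fast growth
```

The second pool is only 60% full, yet it is flagged critical because at that growth rate it would fill up well inside the assumed procurement lead time.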
I also encourage my readers and fellow bloggers to share their own perspectives and best practices recommendations for virtual provisioning. I think we can all safely say this technology will soon become a mainstay in virtually everyone's storage portfolio. Collectively we need to highlight the realities, if only to challenge our vendors to address the issues that will otherwise limit the practical deployment and help avoid the catch-22's.
Oh, and also to help ensure that management is making well-informed decisions and mandates.
Drop me a comment and let me know what you think.
(And my apologies to Chris and Marc for deleting your prior comments - I've incorporated some of the discussion in this re-write, and I'll happily include any updated responses you may have).

http://storagefoo.blogspot.com/2007/06/snapdrive-for-windows-50-thin.html
So SnapDrive does something interesting, even if you have to run it yourself or schedule it to run.
Do you think OS vendors are responsible for trying to make storage technology as dumb as they possibly can? It's like we have to do all this work in spite of them, not with them.
Posted by: Storagezilla | June 07, 2007 at 08:34 PM
I'm not sure that the OS vendors (or file system vendors, for that matter) have done anything malicious. It's just been easier to treat storage as "dumb" in a world of Open Standard (English translation: Lowest Common Denominator Technology).
Interesting solution, just hope it doesn't break with the next version of NTFS. Microsoft twists a bit or two here and there, and voila! Useless drive in a Snap!!!
Better answer will be to get the vendors to integrate support for thin provisioning in their file systems and database engines. Like THAT'S gonna happen overnight...
Posted by: tsa | June 07, 2007 at 08:43 PM
Nigel over at RupturedMonkey adds some additional perspective to this discussion: http://blogs.rupturedmonkey.com/?p=107
Be sure to read the comments!
Posted by: the storage anarchist | July 13, 2007 at 07:00 AM
Your take on this is exactly right... for technologies that still bind storage allocation to physical devices. Despite their claims of "virtualization", EMC and HDS still have you worrying about stripe sets, physical spindles, hypers and metas, LUSEs and LDEVs. In an architecture where the storage pool is truly virtualized this is much less problematic. Take an array like the 3PAR InServs, where volumes are comprised of 256MB blocks allocated in a round-robin fashion from a maximum stripe width of over 150 spindles per logical device, with multiple LDs used to create a RAID volume. Even thinly provisioned, new capacity is allocated by creating new LD sets, and the I/O load is spread very evenly across all spindles.
I think the FUD arises when you take 15-year-old architectures, originally derived from designs optimized for the mainframe world, and "enhance" them for Open Systems implementations. The Symms and Lightnings (even with a Tagma/USP in front of them) really need to be consigned to the dinosaur pens and new product introduced that's engineered from the ground up to offer true virtual pools where allocation isn't spindle-based.
Posted by: RobAtSGH | August 17, 2007 at 09:10 AM
Well, I'll grant you that it is easier to allocate storage using a "newer" architecture system like 3par.
But I wouldn't necessarily rush to put the "more mature" products into the pens just yet. Given their relatively immense installed base of customers, their movements may have to be more methodical and calculated than a spry young startup. But that doesn't mean that "older architectures" can't adapt and change.
Fact is, somebody has to do all the physical storage gyrations no matter what. With 3par it gets done at the factory; on today's USP and DMX, it gets done at the customer site. But with thin provisioning on the enterprise arrays, not only is the physical task going to be significantly simplified, it could also just as easily be done at the factory. There's nothing magical about 3par or thin provisioning that prevents either USP or DMX from providing equal or better capabilities.
So you have to ask yourself: can the dinosaurs adapt to meet the demand for thin provisioning faster than the startups will be able to build a significant enough installed base?
I think so, but then - that's just me.
But the fact that 3par has announced its IPO intent within months of both EMC and Hitachi announcing plans to deliver Thin Provisioning may well indicate that 3par realizes that they need to monetize before their uniqueness is minimized.
That thumping sound you hear just might be the dinosaurs stomping their way into new territory :*)
Posted by: the storage anarchist | August 17, 2007 at 01:02 PM