0.078: lions and tigers and bears!
Driving in to work today, I heard a news report about the Hollywood Grizzly Bear that killed his trainer yesterday.
When I got to the office, I listened to Joe field questions during EMC's earnings call (19 consecutive quarters of double-digit year-over-year revenue growth). Several of the participating financial analysts inquired about the potential impact that the newly-delivered virtual provisioning for Symmetrix might have on future capacity demands. From the tone of the questions, you could easily imagine a pride of lions circling their prey.
And sure enough, by noon Beth Pariseau had her coverage posted on SearchStorage, under the headline "EMC's Tucci: Thin provisioning mandatory but overrated."
Shortly after the earnings call, a colleague forwarded me the link to a Byte and Switch article by Mary Jander entitled "Your Storage Arrays May Be Dangerous." In this article, Mary argues that people need to "rethink the environmental impact of storage gear," because the EPA has reported that electricity use for storage is growing faster than the energy used for the data center overall. Not surprisingly, the EPA called for "storage virtualization, data deduplication, storage tiering, and movement of archival data to storage devices that can be powered down when not in use" as strategies for avoiding environmental damage, if not disaster.
In fulfilling my promise to get back to blogging about technology, I thought I'd devote today's post to providing a slightly less sensationalist perspective on thin provisioning, storage capacity and energy efficiency.
And all I have to say about the bear is: remember, these are wild animals, and they're driven by instinct and not logic or trust.
Any resemblance between wild animals and industry experts is purely coincidental!
dangerous storage? i think not.
Quoting from Mary's article:
The EPA reported in August 2007 that storage devices accounted for just 5 percent of the total data center electricity used in 2006. But storage gear was estimated to have the highest compound annual growth rate in electricity use from 2000 through 2006 -- 20 percent, compared to the overall CAGR of 14 for electrical use by all data center end-components in the same period.
To imply that storage is the root of all data center energy problems overlooks the fact that storage is possibly the most efficient component in the data center for the job it does. In fact, in today's Digital Universe, where the amount of digital information that is being stored is growing at a compound annual growth rate of almost 60%, an energy CAGR of only 20% is truly remarkable.
In fact, while the amount of power used by storage in the data center has increased by 200% from the year 2000 to the year 2006 (as reported by the EPA), the amount of digital information being stored has increased almost 1600%.
Said another way: with roughly 3 times the power storing roughly 17 times the data, it required only about 1/6th as much electricity to store a kilobyte of data in 2006 as it did back in 2000.
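A minimal sketch of that arithmetic, assuming both growth rates compound annually over the same six years (2000 through 2006) and rounding "almost 60%" up to 60%. The function and variable names are purely illustrative:

```python
# Compound-growth check for the 2000-2006 figures quoted above.
# Assumption: both CAGRs are applied over the same six-year span.

def growth_factor(cagr: float, years: int) -> float:
    """Total growth multiple after compounding `cagr` for `years` years."""
    return (1.0 + cagr) ** years

YEARS = 6            # 2000 through 2006
POWER_CAGR = 0.20    # EPA figure for storage electricity use
DATA_CAGR = 0.60     # "almost 60%" information growth, rounded up here

power = growth_factor(POWER_CAGR, YEARS)
data = growth_factor(DATA_CAGR, YEARS)

# Electricity needed per unit of stored data in 2006, relative to 2000.
energy_per_unit = power / data

print(f"power grew {power:.2f}x, a {100 * (power - 1):.0f}% increase")
print(f"data grew {data:.2f}x, a {100 * (data - 1):.0f}% increase")
print(f"electricity per unit stored: {energy_per_unit:.2f} of the 2000 level")
```

Under these assumptions power grows to about 3 times its 2000 level while stored data grows to about 17 times, so the electricity per unit stored falls to somewhere between a fifth and a sixth of what it was. The exact fraction shifts with how the growth rates are rounded.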
The fact is, we are storing more and more data every year, and disk drive technology alone isn't keeping pace with that growth. Thus, it requires more disks and more power to keep up with the growth. And the storage vendors (all of them, not just EMC) should be heralded instead of demonized, because we have been able to accommodate the growth without linearly increasing power (or cost).
Throughout its lifetime, Symmetrix has exemplified, if not outright defined, the industry's commitment to storage efficiency. In the year 2000, the largest Symmetrix housed a maximum of 384 disk drives in a single array; today, the smallest configuration of the DMX-4 scales up to 360 drives, and the larger scales to 2,400 drives. In 2000, you could use any RAID protection you wanted with Symmetrix, so long as you wanted RAID 1 (mirrored) protection. Today, all DMXs support RAID 1 and RAID 5, and the DMX-3 & 4 support RAID 6. And back at the turn of the millennium, you could buy 10K rpm or 15K rpm drives for your Symmetrix; today you can also buy 7200 rpm SATA drives for bulk storage and even power-efficient/ultra-performance Enterprise Flash Drives for your DMX. Then you could only create full copies of your important data volumes with local or remote replication; today Snapshot technology requires incremental capacity for only the data that changes.
Simply put, the DMX of today supports somewhere between 1 and 2 orders of magnitude more usable capacity, with more performance/spindle, while requiring fewer Watts/GB than it (or any of its peers) did back in the Y2K days.
And more importantly, without these enhancements, the EPA report would have been telling a much bleaker story. Imagine if the power CAGR for storage were the same as the information CAGR...not many enterprises could readily afford to be paying 6-8 times as much for their electricity bills these days.
Sure - there's lots more that can, must and will be done in the storage industry to keep up with information growth. Storage tiering, virtual provisioning, de-duplication, and MAID each will play a role in reducing the cost, footprint and environmental impact of information storage in the years to come. But let's be real - if the auto industry was keeping up with the storage industry, we all really wouldn't care that gasoline prices are going through the roof.
virtual provisioning and capacity
I also think that the concerns over virtual (thin) provisioning's impact on capacity demands are frivolous, if not outright fear-mongering. First off, as evidenced by the prior discussion (which is why I put it first in this post), information storage demands are growing faster than storage capacity alone can accommodate (again, I'll refer you to IDC's 2007 Expanding Digital Universe report for backup to this assertion).
As noted above, EMC has introduced numerous product features and enhancements that help customers reduce the amount of capacity they need (and the cost of that capacity), while still growing its own revenues and installed base. RAID 5 reduced the amount of storage a customer required on a Symmetrix by almost half - with RAID 5 7+1 you can store on 80 disk drives what it would take 140 RAID 1 drives to contain. Yet when RAID 5 on DMX was introduced in 2004, there was no visible impact to Symmetrix revenues or capacity shipped.
Two simple reasons (IMHO):
- RAID 5 wasn't (and still isn't) appropriate for all information - RAID 1 is inherently more available and delivers noticeably better performance. And Symmetrix customers buy Symmetrix because (you guessed it) Availability and Performance are important to them. Plus there's always an adoption curve, especially in risk-averse markets such as those serviced by Symmetrix. Large scale deployments were delayed until customers gained sufficient experience about where they could afford the trade-offs of RAID 5, and where they couldn't.
- Customers' information growth rates were (and still are) faster than merely switching to RAID 5 could offset, even when coupled with larger disk drives. Oh, many employed RAID 5 everywhere they could, as fast as they could, but it only helped keep their heads above water...at best, it bought them a few months' delay in acquiring new capacity.
These, combined with the size of the Symmetrix installed base, served to smooth out any imagined impact on capacity demand at the macro level. Just as many had predicted, I might add - for Joe faced similar questions from analysts back then, and his answer was in fact quite similar to today's - it will only delay, but not eliminate, capacity acquisitions.
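The RAID 5 versus RAID 1 drive counts quoted earlier (80 drives versus 140) can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes equal-sized drives and ignores hot spares and any other overhead.

```python
import math

def raid1_drives(data_drives: int) -> int:
    """RAID 1 mirrors every drive, so the raw drive count doubles."""
    return 2 * data_drives

def raid5_drives(data_drives: int, data_per_group: int = 7) -> int:
    """RAID 5 (7+1): one drive's worth of parity for every 7 data drives."""
    groups = math.ceil(data_drives / data_per_group)
    return groups * (data_per_group + 1)

usable = 70  # drives' worth of usable capacity to protect

r1 = raid1_drives(usable)
r5 = raid5_drives(usable)
savings = 1 - r5 / r1  # fraction of raw drives saved by moving to RAID 5

print(f"RAID 1: {r1} drives, RAID 5 7+1: {r5} drives, saving {savings:.0%}")
```

For 70 drives' worth of usable capacity, this gives 140 drives under RAID 1 and 80 under RAID 5 7+1, which is the "almost half" reduction described above.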
Thus I think that Virtual Provisioning can indeed be expected to follow a similar adoption pattern, and it will be employed by customers in a similar manner to offset their incessant demands for more storage. In a very real sense, it should help customers "do more with less" for the short term, but (as Joe rightly pointed out), once they improve their utilization everywhere they can, they will once again have to buy more capacity.
And given today's economy, the introduction of Virtual Provisioning might be just the thing EMC's customers need to bridge their capacity / budget gap.
But just as with RAID 5, Virtual Provisioning won't be deployed for everything in the data center - particularly in the markets served by Symmetrix. Databases don't really benefit all that much from thin provisioning, as they like to consume all the storage allocated to them and manage its utilization themselves. Similarly, the mainframe storage market won't be using thin provisioning to improve utilization - in fact, mainframes have historically operated at storage utilization levels literally unheard of in the open systems market. And wherever possible, snapshots are already "thin", so there's not a lot of opportunity there either.
don't worry. be happy.
So while the predictions of a looming storage doomsday may be great fodder for industry press and financial analysts, I'm pretty sure that the practical reality is far less frightening. The energy crisis is important, but storage already has a demonstrable track record of improving efficiency faster than demand grows. And rather than a self-inflicted deadly wound, virtual provisioning is more appropriately considered yet another component of an industry that is adept at incorporating technologies to address customer requirements.
In fact, instead of worrying about whether or not Virtual Provisioning is going to reduce capacity demand for Symmetrix, methinks we should be asking "what about those products that don't even offer thin provisioning, or huge SATA drives, or super-fast Flash drives, or more than 1024 disk drives in a single array?" Why should they be getting a pass on these blatant gaps in their enterprise storage products just because they're spending money on Israeli companies?
You know who I'm talking about.