January 03, 2008

0.055: obligatory "ibm buys xiv" post

Well, I thought I'd wait a day and let the dust settle on this before I made any comments.

Turns out I saved myself a lot of redundant typing. Chuck Hollis covered much of what I would have said (albeit a bit more elegantly). I share his notion that IBM may be using a Web 2.0 smoke-screen to hide their real intent to use Nextra as either A): a response to DELL+EqualLogic and/or 3PAR; or B): as a replacement for the woefully under-funded (and near-dead) DS8000.

I also think there's a potential C): merge Nextra with SVC to solve SVC's emerging Rube-Goldberg scalability problems and get FlashCopy/Global Mirror compatibility onto a truly scalable platform. I guess that the lack of end-to-end data integrity protection is starting to tarnish the SVC image, with wholesale replacements and exorcisms being held on both sides of the pond (or so I've been told). But that's just me being me, I guess (and perhaps in a manner that's a bit more argumentative than Chuck would have written - I'm sure I'll be hearing from BarryW soon on that one).

Steve Duplessie and Mark Peters over at ESG did a good job of explaining what the Nextra is all about, and they lend some credibility to the idea that this really might be all about Web 2.0 after all, given IBM's need to find a viable replacement for the now aging and somewhat archaic DR500 (tape is dead, haven't you guys heard yet?). But I don't think you really know what Nextra can do until you actually hear what the current customers are doing with it, and it seems that all of them have lost their tongues for some odd reason. And for the record, I also think Steve's comments that this is probably at least as much about Moshe as it is about Nextra are right on.

At the very least, on his reputation alone Moshe will probably get IBM an audience with a few of those Wall Street IT shops that have banished Big Blue storage from their data centers because of all the incompatible product churn they've incurred since the days of RAMAC, Iceberg, Sharks and now the dead-end can-you-say-downtime DS8000's.

Not to be outdone, Fellow Blogger Tony Pearson took his own shot at explaining what he thinks is the revolutionary neat new technology in Nextra. Unfortunately, he doesn't have much understanding of the Centera architecture, so he mistakenly thinks this is all new. But heck, even though back before Christmas he was joining forces with TwoEgos in a premature wake for Centera, I'll give him a pass on the fact that Centera's been doing this exact type of blob striping and protection since day one (back at the beginning of 2002).

I'm feeling oddly benevolent to start this New Year for some reason...

been there, done that

In fact, Centera's hardware architecture is quite similar to Nextra, and they've been doing blob-level data protection long enough to have learned something that Moshe vehemently resisted - mirrored storage is Way Too Expensive for today's cost-conscious world, even if you're using ATA drives. (Chuck was right - a lot has changed since Moshe was there and EMC was a one-product company).

To hit the CAS market's required $/GB price points, a couple of years ago Centera implemented Content Parity Protection, effectively RAID 5 at the blob level. Instead of writing each blob twice like Nextra, you calculate the parity protection of N blobs (each on a different drive) and write a parity blob onto yet another different drive. There's a small write performance hit, and it does lengthen the rebuild time a little, but this more efficient approach doesn't force customers to carry the expense and overhead of what is effectively (under-utilized) mirrored storage in the Nextra.
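For anyone who wants to see the idea rather than take my word for it, here is a minimal Python sketch of RAID 5-style parity at the blob level. To be clear, this is generic XOR parity and not Centera's actual code: the function names (`xor_blobs`, `make_parity`, `rebuild`, `raw_per_usable`) and the assumption that all blobs are the same length are mine, purely for illustration.

```python
from functools import reduce

def xor_blobs(blobs):
    """XOR a list of equal-length byte strings together, byte by byte."""
    length = len(blobs[0])
    if any(len(b) != length for b in blobs):
        raise ValueError("all blobs must be the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blobs))

def make_parity(data_blobs):
    """Parity blob for N data blobs (each assumed to live on a different drive)."""
    return xor_blobs(data_blobs)

def rebuild(surviving_blobs, parity):
    """Recover the one missing data blob from the survivors plus the parity blob."""
    return xor_blobs(list(surviving_blobs) + [parity])

def raw_per_usable(n_data, n_parity=1):
    """Raw capacity consumed per unit of usable capacity."""
    return (n_data + n_parity) / n_data

# Three hypothetical 8-byte blobs stand in for real objects.
data = [b"blob-one", b"blob-two", b"blob-333"]
parity = make_parity(data)

# Pretend the drive holding data[1] died: rebuild it from the rest.
recovered = rebuild([data[0], data[2]], parity)
assert recovered == data[1]

# Mirroring is the N=1 case (one copy per blob); compare it with N=6.
print("mirror:", raw_per_usable(1), "6+1 parity:", round(raw_per_usable(6), 3))
```

The last line is the whole economic argument in one number: a mirror burns two units of raw disk for every usable unit, while a hypothetical 6+1 parity group burns about 1.17. The rebuild has to read every surviving blob in the group, which is where the longer rebuild time comes from.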

There's a lot that Centera has learned out in real-world use for the past 6 years; I'd venture there are at least a few things that Nextra still has to learn before it's really the practical new storage platform that IBM intends it to be. Only time will tell.

losing data is never "ok"

Back to Tony, I do have one nagging gripe, though...the implication that it's OK to lose 1 to 2 percent of unstructured data just doesn't sit well with me, even in a Web 2.0 world. Especially if that 1 to 2% happens to include *MY* MRI, the one that the doctor needs to do my surgery in 15 minutes or so. I guess losing data is fine, just as long as it's not yours that gets lost.

But what both Tony and ESG have totally missed is that we're not talking about losing a specific set of objects here. Unlike Centera (which stores objects, like MRIs and photographs and document images), Nextra is a BLOCK-LEVEL storage device. By their very definition, block storage devices have absolutely no clue what the content of each of those 1MB blobs might be (this is in fact why EMC created Centera in the first place!).

In a block storage device, only the host file system or database engine "knows" what's actually stored in there. So in the Nextra case that Tony has described, if even only 7,500-15,000 of the 750,000 total 1MB blobs stored on a single 750GB drive (that's "only" 1 to 2%) suddenly become inaccessible because the drive that held the backup copy also failed, the impact on a file system could be devastating. That 1MB might be in the middle of a 13MB photograph (rendering the entire photo unusable). Or it might contain dozens of little files, now vanished without a trace. Or worse yet, it could actually contain the file system metadata, which describes the names and locations of all the rest of the files in the file system. Each 1MB lost to a double drive failure could mean the loss of an enormous percentage of the files in a file system.
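The arithmetic above is easy to check, and a toy model shows why "only" 1% of blobs is not the same as 1% of files. The Python sketch below is a back-of-the-envelope illustration under assumptions that are entirely mine: the drive is filled with nothing but 13MB files laid out back to back, and the lost 1MB chunks are spread evenly (every 100th chunk). Real file systems and real failures will look different; the names `files_damaged`, `FILE_MB` and so on are hypothetical.

```python
DRIVE_MB = 750_000   # 750GB drive, decimal units as in the example above
CHUNK_MB = 1         # 1MB blobs
FILE_MB = 13         # the 13MB photograph from the example

total_chunks = DRIVE_MB // CHUNK_MB
one_percent = total_chunks // 100
two_percent = total_chunks * 2 // 100

def files_damaged(lost_chunks, chunks_per_file, total_chunks):
    """Count whole files that contain at least one lost chunk.

    Assumes files sit contiguously, one after another, from chunk 0.
    """
    whole_files = total_chunks // chunks_per_file
    hit = {c // chunks_per_file for c in lost_chunks}
    return len({f for f in hit if f < whole_files})

# Assumption: the 1% of lost chunks is evenly spaced across the drive.
lost = range(0, total_chunks, 100)
chunks_per_file = FILE_MB // CHUNK_MB
whole_files = total_chunks // chunks_per_file
damaged = files_damaged(lost, chunks_per_file, total_chunks)

print("chunks on drive:", total_chunks)
print("1% to 2% of chunks:", one_percent, "to", two_percent)
print("files damaged:", damaged, "of", whole_files)
print("fraction of files damaged:", round(damaged / whole_files, 3))
```

In this toy layout every lost chunk lands in a different photograph, so losing 1% of the chunks damages roughly 13% of the files. And that is the friendly case, where none of the lost chunks held file system metadata.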

And in fact, with Nextra, the impact will be across not just one, but more likely several dozens or even hundreds of file systems.

Worse still, the Nextra can't do anything to help recover the lost files. It's not going to be as easy as Tony implies - you're not going to be able to just start up your backup utility and start recovering files. No, the only way to even find out what's missing is to run an fsck or chkdsk or whatever equivalent integrity checker your file system requires. Depending upon your platform, that could take minutes or hours to run - for each potentially affected file system on the array.

Imagine the call from your storage admin to each and every one of your server and database admins who use the array:

"Hi, Joe here, from Storage Administration. How's it going? Yeah thanks - Happy New Year to you, too.

Hey, listen. Sorry to bother you, but we've just had an unexpected double drive failure on Nextra X01B23 up here in the data center, and we've lost some data. Now, we don't know exactly WHAT we've lost, nor even WHICH LUNs might be impacted, so you're going to have to stop all of your applications and run a file system integrity check on every LUN you have mounted on Nextra X01B23. When you're done, submit your restore requests for any lost or damaged files, and we'll see what we have from last night's backups.

Oh, and Have a Nice Day!"

OK. Maybe that won't ever happen. Or maybe it will. I dunno.

But sitting here, I'm having a tough time imagining ANY application, Web 2.0 or not, that is ever going to tolerate that sort of outage, data corruption and/or potential for total loss of data.

Definitely not in any of the enterprises I'm familiar with, anyway - no matter WHAT the application. Heck, even Google has had to address the lost gMail issue, and I really don't think these web-based Office or Photoshop replacements are going to get away with losing your work-in-progress for very long in the real world. Methinks "Web 2.0" is NOT synonymous with "Acceptable Data Loss."

For the moment, I'll admit to being bewildered, but if you have any examples - please, drop me a comment below.

a word of thanks

All that said (and I'm sure we'll explore it more in comments and future posts), I'd like to close by saying Thanks! to IBM for this acquisition.

Although they probably didn't intend to do so, their announcement has reinforced the lasting superiority of Symmetrix in the marketplace.

See, every time Moshe Yanai gets mentioned in a press release or in industry coverage, it's in the context of admiration and respect for designing and building Symmetrix. Moshe is world renowned as the father of the game-changing technology that single-handedly sparked the external storage market. And although he left EMC well before Symmetrix DMX redefined that market and was joined by equally innovative products like Celerra, CLARiiON, Centera, EMC Disk Libraries (and even more still to come), Moshe's name will forever be synonymous with Symmetrix and the industry it created (at the early expense of IBM, I might add).

So maybe it's just me, but I'm really enjoying the fact that virtually everything you read about IBM's acquisition of XIV includes a prominent and respectful reference to one (and only one) high-end storage platform, and it ain't IBM's.

Having IBM advertise your product for you?





Without furthering the idea that I'm actually the hybrid of a Cylon Basestar and not a real person: from everything I've read, I'm thinking that a project is underway to retrofit some object-level intelligence into that system.

Nextra was never going to beat out the giants in Tier 1 on its own. No way, no how.

There's a thread of currently unreleased product running through some of their press statements. But then we all know what pre-acquisition roadmaps look like after you've forked over the cash.

"Here's the picture on the box. A hell of a lot of assembly required. See you in 18 months."


So ... the man who destroyed IBM in storage with a simple 'mirror' concept is now rewarded with $300M for a very similar solution. I think someone at IBM needs a lot of help.


I always wonder about these solutions dubbed "Web 2.0" when they don't seem to be related to any internet companies or Web 2.0 techniques.

Even then, I agree that one copy of the data isn't going to cut it. If we look at a real Web 2.0 implementation--Yahoo's Hadoop (http://wiki.apache.org/lucene-hadoop-data/attachments/HadoopPresentations/attachments/radlab-hadoop.pdf) --they keep at least 3 copies and use it for data mining not direct user transactions. That and it's a cluster file system so it knows what files are lost when a disk goes down.

One last question on XIV--if it is only block level and puts LUN pieces all over the place, doesn't that thrash the network? This thing looks like a network scalability nightmare.

the storage anarchist

Barry Burke


I am unabashedly an employee of EMC, but the opinions expressed here are entirely my own. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.
