
May 10, 2010

3.003: to boldly go

Space... the Final Frontier. These are the voyages of the starship Enterprise. Her ongoing mission: to explore strange new worlds, to seek out new life forms and new civilizations, to boldly go where no one has gone before.

And so begins our journey

Today, EMC announces the introduction of another category-creating product: EMC VPLEX. Built upon unique market-proven technology and hardened for the rigors of enterprise IT, VPLEX brings forth a revolutionary new platform for building and deploying distributed virtual data centers. And whether we call it Distributed Federation, a component of Virtual Storage, the On Ramp to the Private Cloud, an application Teleporter, a Time Machine for Data, the foundational component that extends the Virtual Mainframe beyond the walls of a single data center, or even simply distributed storage virtualization, VPLEX clearly will change the way we build and deploy applications.

By tearing down the barriers of space and time, VPLEX will allow us to rethink when and where we run applications. Coupled with virtual server technology, VPLEX can allow applications to be relocated not only to another server cluster, but to one in a totally different location. And with its unique approach to distributed cache management, VPLEX can enable applications to begin running at their new destination before all of the application’s data has been relocated. In fact, the Access Anywhere caching technology can even present data at a remote site without any data storage at the site! This capability of VPLEX Metro portends the future where VPLEX will both extend the distance between VPLEX clusters and expand the number of clusters VPLEX supports. As the VPLEX partner and development communities expand the use cases and integration beyond the hypervisors into database and application integration with the VPLEX distributed cache capabilities, we will see the emergence of new computing models.

We are at the beginning of a new journey.


in the beginning…

I have had the good fortune to be directly involved in several of EMC’s innovative game changers over the past going-on-10 years, from Centera (the pioneer of CAS, or Content Addressed Storage), to DMX 1-4 (the transition from monolithic to modular enterprise storage), to VMAX (the first Intel-based enterprise array purpose-built for the virtual data center). But without a doubt, VPLEX has been the most exciting project of them all.

From the very beginning of the process that led to the acquisition of the technology assets from Yotta Yotta (and the subsequent hiring of nearly the entire development and delivery teams), we knew we had a unique opportunity. The technology not only worked, it was being used in some very, very interesting applications (US military, so I can’t really discuss them, even today). True, it was then still the product of a startup, and there were some rough edges. But many of us recognized the core technology as the missing link in our journey to the private cloud – the ability to present block storage devices (LUNs) at multiple sites, with each site seeing the device as fully read and write enabled (active-active, in storage parlance), leveraging concepts like virtual synchrony and utilizing cache to minimize the effects of latency and bandwidth limitations. With the technology repackaged and optimized for the enterprise, we set out to change the world.

It may surprise many, but the acquisition was led and effected by the Symmetrix Product Group, which was then renamed to the Symmetrix and Virtualization Product Group – SVPG. (On a related note, I was promoted last year to the position of Chief Strategy Officer for SVPG, a title I chose not to broadcast at the time lest I suffer the Wrath of Khan Barry Whyte and other bloggers – a wrath I'm sure I have only delayed, not avoided.)

From the outset, the new virtualization development team was indoctrinated into the best practices and near-maniacal focus on quality that is the foundation of Symmetrix engineering. Over the course of 18 or so months, the technology was replatformed from its original custom real-time operating system to EMC’s storage-optimized version of Linux, and from its original custom hardware to an optimized configuration of EMC’s most powerful and highest availability Intel multiprocessor storage engine. It was integrated with ESRS, the EMC Secure Remote Support gateway, affording customer service the use of the same technology used for Symmetrix and CLARiiON remote support. Documentation was written, best practices defined and documented, beta tests deployed at customer sites around the globe, presales and implementation training developed and deployed, partners engaged and references established – all leveraging the same tried and true practices that have underpinned every new Symmetrix product delivery for the past decade to bring you the VPLEX you see today.

big bang!

EMC VPLEX is no longer a fledgling product from a small startup. In fact, I think many will be surprised at the breadth and scope of this week’s VPLEX announcement. For perhaps the first time in my career in technology, it seemed that not only every potential partner who saw the product wanted to be included in the launch, but nearly every customer as well! As a result, we have VPLEX launch support not only from the obvious partners in VMware and Cisco and Intel, but also from Brocade and Microsoft and Oracle.

Unlike a certain storage virtualization appliance that spotlighted a sheep farmer in Norway as its first public reference customer (no offense intended to sheep farmers in any country), VPLEX comes to market with two mainstream customer references: AOL and Melbourne IT (who will be replacing their sheep farmer-endorsed product with the more applicable VPLEX).

Add to that a plethora of supporting collateral, like the AOL Case Study and the ESG Lab Report, and you have the makings of something big.

Take all that, and do the public launch at EMC World, EMC’s largest public event of the year, and clearly you see the volume has been amped up. Over the course of the event, there will be an effective crescendo of information on VPLEX presented to EMC’s customers, industry press and analysts and yes, even the blogosphere: you can follow the goings-on from the VPLEX home page on EMC.com. There you’ll find videos from Joe Tucci and Pat Gelsinger’s Monday keynotes and Brian Gallagher’s keynote on Tuesday – each will provide a deeper view of the technology and its use cases, building upon the prior speaker’s content. From there Beth Phalen (VP of VPLEX development) and I will provide deeper insights, and several members of the VPLEX development and product management teams will be presenting implementation, configuration and product internals in sessions at EMC World. And visitors to the Solutions Pavilion will not only see live demonstrations of VMware, HyperV and OVM application migrations, they’ll actually be able to participate in a hands-on challenge to deploy VPLEX Metro shared LUNs in the shortest time possible.

This is no science project: it is very real, and it has been shipping to customers since April 15, 2010.

fast, sideways

Distributed cache coherence is nothing new – there are many implementations, although they are usually found in geographically distributed clustered servers (aka geoclusters) and distributed file systems (think CXFS, GPFS, Lustre, etc.). Even Oracle RAC can be considered an implementation of distributed coherency. However, no one has yet found a distributed solution that truly allows any block of data to be written by any member of the cluster at any location at any time (I call it the any-any squared problem, or any-any^2). This is because protecting against the inevitable collision requires internode coordination, and the speed of light gets in the way by adding significant latency. As a result, there are few commercially available any-any^2 cluster solutions in the market, Oracle RAC being perhaps the most significant exception, and even it places very strict limits on inter-node communication latency.
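To put rough numbers on the speed-of-light problem, here is a back-of-the-envelope sketch. It assumes light moves through optical fiber at about 200,000 km/s (roughly two-thirds of its speed in a vacuum) and ignores switching, protocol and queuing delays, so real links will be slower. The function names are made up for illustration and have nothing to do with VPLEX itself.

```python
# Assumption: propagation speed in optical fiber is roughly 200,000 km/s.
# Real links add switching, protocol and queuing delay on top of this.
FIBER_KM_PER_SEC = 200_000.0


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay over fiber, in milliseconds."""
    return 2 * distance_km / FIBER_KM_PER_SEC * 1000.0


def coordinated_write_ms(distance_km: float, round_trips: int = 1) -> float:
    """Latency added to one write that needs `round_trips` inter-site exchanges."""
    return round_trips * round_trip_ms(distance_km)


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km:>5} km: {round_trip_ms(km):.2f} ms per round trip")
```

Every write that has to ask a remote node for permission pays at least one of those round trips, which is why coordinating arbitrary writers across sites gets expensive quickly as distance grows.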

The innovation of VPLEX is one of those slap-your-forehead kind of inventions – the kind that looks obvious the moment it’s explained to you. Essentially what the architects did was to implement a fully active-active design based entirely on cache that technically reproduces the EXACT behavior of connecting multiple hosts to the same LUN – except VPLEX does it over distance. In fact, any host at any location can write to a VPLEX LUN at any time (any-any^2), but you’ll get garbage out unless you have something coordinating the writes across the hosts.

The innovation? Simple: the team focused on use cases that are adjacent to the any-any^2 problem – use cases where you know there will only ever be one writer to a specific range of LBAs, and use cases where you want to geographically relocate the writer for some reason. And you know what? Virtually every hypervisor, including HyperV, OVM and VMware, manages its guest VMs such that every VM has its own distinct LBA range that it “owns” and that nobody else will ever write to.

The practical use cases for VPLEX might be as simple as moving data off of a set of servers and storage to effect a tech refresh or to service the power and cooling in the data center. Or maybe there’s a storm coming and you want to move your applications to your DR site. Or perhaps you need some additional compute power and you decide to temporarily relocate some workloads to your DR site – or perhaps even to a Service Provider – say, perhaps a Cloud Service Provider.

By focusing on these adjacent use cases, the team has optimized not around protecting multiple writers, but on ensuring the fast and efficient relocation of the writer. Within its foundation, GeoSynchrony (the VPLEX operating software) maintains referential locality for any writers to any LBA range, positioning the locks for a given LBA range in the cluster member that is receiving the writes. In this manner, a single LUN becomes a collection of LBA ranges, each defined by where the writes are most frequently occurring; the other cluster nodes need not maintain the locks nor even a copy of the data – they need only to be able to find the owner quickly and efficiently should an I/O arrive somewhere else in the cluster for the same blocks.
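The bookkeeping described above can be sketched in a few lines. This is a toy model only, not GeoSynchrony's actual data structures or algorithm: the fixed range size, the class name and the "ownership moves on the first write from another cluster" rule are all assumptions made to keep the example short (the real system places ownership where writes occur most frequently).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

RANGE_BLOCKS = 1024  # hypothetical ownership granularity, in blocks


@dataclass
class OwnershipDirectory:
    """Toy directory mapping each LBA range to the cluster that owns it."""
    owners: Dict[int, str] = field(default_factory=dict)  # range index -> cluster
    relocations: int = 0  # how many times an owned range changed hands

    def range_of(self, lba: int) -> int:
        """Index of the fixed-size range that contains this LBA."""
        return lba // RANGE_BLOCKS

    def owner_of(self, lba: int) -> Optional[str]:
        """Cluster currently holding the lock for this LBA, if any."""
        return self.owners.get(self.range_of(lba))

    def write(self, cluster: str, lba: int) -> int:
        """Record a write; ownership follows the writer (simplified rule)."""
        idx = self.range_of(lba)
        current = self.owners.get(idx)
        if current != cluster:
            if current is not None:
                self.relocations += 1  # the writer moved to another cluster
            self.owners[idx] = cluster
        return idx


if __name__ == "__main__":
    d = OwnershipDirectory()
    d.write("site_a", 10)    # site_a takes ownership of range 0
    d.write("site_a", 500)   # same range, same owner: nothing to coordinate
    d.write("site_b", 20)    # the writer relocated: range 0 moves to site_b
    print(d.owner_of(0), d.relocations)
```

The point of the sketch is that the common case (the same cluster writing to the same range again) touches only local state, while the rare case (the writer has moved) costs one ownership transfer.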

And that’s what makes VPLEX tick – the efficient approach of keeping track of who owns what and the ability to relocate the owner should the I/Os move to another VPLEX Metro cluster.

And in a very real way, VPLEX cache is similar to the concepts behind EMC’s Fully Automated Storage Tiering (FAST), only turned on its side. The ultimate objective of VPLEX is to maintain the most frequently accessed data as near to the consumer as possible – even moving that subset of data to the remote site first so that the application can “land” with minimal delay. We’ll discuss this topic more as we get closer to delivering VPLEX Geo and its asynchronous distance support, but for now, you can imagine that VPLEX Access Anywhere is doing what FAST does: promoting the data that benefits most from the performance of Flash and demoting the unused data to less expensive SATA – only VPLEX will promote the application’s working set of data to the destination cache so that the application won’t have to wait to move nor suffer the latencies of long-distance reads.
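The "move the working set first" idea can be illustrated with a simple ordering function. This is a sketch under stated assumptions, not how VPLEX decides what to move: the extent numbering, the access log and the `transfer_order` name are hypothetical, and raw access count stands in for whatever heat metric a real system would track.

```python
from collections import Counter
from typing import Iterable, List


def transfer_order(access_log: Iterable[int], all_extents: Iterable[int]) -> List[int]:
    """Order extents for relocation: most frequently accessed first.

    Extents that never appear in the log sort last, in ascending order.
    """
    heat = Counter(access_log)  # extent id -> number of recent accesses
    return sorted(all_extents, key=lambda extent: (-heat[extent], extent))


if __name__ == "__main__":
    recent_accesses = [3, 3, 1, 3, 1, 7]  # hypothetical recent I/O, by extent id
    print(transfer_order(recent_accesses, range(8)))
```

With the hot extents shipped first, an application that restarts at the destination finds most of what it is about to read already nearby, and the cold remainder can follow in the background.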

flight of the wanna-bees

As with any revolutionary innovation, today’s announcement will undoubtedly face challenges from critics, cynics, antagonists and wanna-bees.

Over the past several weeks we have seen concerted efforts by both IBM and HDS bloggers to discredit VPLEX even before it was unveiled. Looking back at those attacks now, you’ll recognize that none of them had a clue what VPLEX is really all about. I have to give Barry Whyte credit for one thing, though: his assertion that VPLEX’s support matrix would never compare with that of his beloved SAN Volume Controller drove the VPLEX team to do the unthinkable – they qualified VPLEX support for SVC as a storage platform over the past several weeks. So, VPLEX’s support matrix is indeed a superset of SVC’s, in that SVC hasn’t qualified SVC as a supported platform. And with VPLEX Metro, even SVC customers can now get truly active/active distributed storage, something many have asked for. And something that IBM has been unable to deliver.

NetApp, on the other hand, has already tried to present MetroCluster as a copy-cat alternative to VPLEX. The differences will be made clear, but to put it simply: splitting a single 2 node cluster into two locations is hardly an alternative to VPLEX Metro, which federates two separate HA clusters, each with up to EIGHT HA nodes, to provide truly scalable, reliable and efficient active-active Access Anywhere.

Almost immediately, I’m sure that the competitors will try to minimize the impact of VPLEX by focusing their attention on the basic storage virtualization capabilities of VPLEX Local. They’ll point out any and every missing feature as they struggle to maintain relevance in the new world that VPLEX Metro enables. And they’ll all overlook an important point: VPLEX Local doesn’t obsolete customer investment in their existing storage by relegating it to the role of RBOD as IBM, Hitachi and NetApp’s virtualization products do.

VPLEX adds value to existing storage.

And in the future, we’ll see VPLEX evolve into an intelligent services layer, both providing Virtual Storage Services (like snaps, clones, thin and FAST) to the application servers it supports and intelligently leveraging the storage services of the storage arrays it is deployed with. This will emerge first and best with EMC arrays, of course, but it is and will be standards-based to ensure the broadest integration possible.

the final frontier?

I hardly think so. But VPLEX does pioneer many new possibilities. As an Application Teleport (credit Chad Sakac with that one), it changes when, where and how we deploy applications, affording new opportunities for resiliency and distributed computing. And the technology will not be limited to what we see today in VPLEX – I can assure you that the features and technology will find their way across EMC’s entire product portfolio. And perhaps more significantly, we are already hearing from developers and service providers about their desires to integrate more closely with the technology. From the core distributed cache to the APIs and protocols leveraged by the product, VPLEX represents perhaps The Next Generation, if not the end of the challenges of latency and bandwidth.

I see a very exciting future for VPLEX, but then, that’s what I get paid to do. This future is made all the more exciting, however, by the massive amount of interest and support we are seeing. This is both a small first step, and a giant leap.

Beam me up, Scotty!




Comments


marc farley

Every time EMC trots out the Future Vision Machine, EMC bloggers take a big detour down the alley of "what will our competitors say?"

Hmmmm. How about focusing on the missed Star Trek tie-in in your blog, which is that this is a Storage Federation announcement? The original Star Trek series ran for three years and got the boot. Twenty years later, the Star Trek sequels started up and were much more successful.

I suspect this is what will happen with VPLEX; there will be a Wow from the sci-fi fans out there and then years later other versions will become much more successful.

Barry Whyte

So it's half-baked virtualization, some data movement, but where's the actual value when you still need to buy another expensive license for backend copy services? You won't be able to use the cache if you are going to use backend copy services, certainly not on non-EMC storage - maybe if you have some more proprietary hooks into EMC storage, but otherwise a "point in time" at the host isn't a point in time at the storage.

Heard rumors that you are looking to use SVC to get storage interop, but without a SCORE/RPQ request, IBM won't support it as a host type without it passing our own interop testing and validation.

I was hoping for so much more, just like the marketing hype behind FAST, and we are still waiting for it to appear at the necessary sub-lun level...

Dikrek

This seems highly interesting. I do wonder though why you spell NetApp as "NotApp".

You can give us the simple courtesy of spelling the name right :)

What are the enterprise deployments this was successfully deployed at BTW? I heard some telco back when it was still YottaYotta.

Thx

D

the storage anarchist

Dikrek - apologies for the typo, it has been corrected.

The two reference sites I provided are indeed enterprise deployments. Visit http://www.emc.com/vplex for more...

the storage anarchist

BarryW -

Actually, we can indeed cache and leverage back end copy services - we have total control over when we push writes out and when we request the back-end to perform a service for us.

And I'm not too worried about IBM supporting SVC behind VPLEX - most customers will just retire their old SVCs anyway. The capability probably gets used more for migrations than for a permanent deployment.

Have your fun, but sub-LUN FAST continues on schedule. And with a granularity far smaller than the GB extents in DS8700 - granted, the DS8K is really a mid tier architecture and cannot track the metadata required by finer granularity, so the limitation is appropriate. But it hardly gives you bragging rights.

Stay tuned into EMC World - there's more to come that you'll be interested in.


anarchy cannot be moderated

about
the storage anarchist


Barry Burke

disclaimer

I am unabashedly an employee of EMC, but the opinions expressed here are entirely my own. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.
