1.056: inside the virtual matrix architecture
The cornerstone of today’s Overtake the future launch is of course the new EMC Virtual Matrix Architecture, the foundation upon which the virtual data center will scale and thrive henceforth.
Combining the market-proven functionality that has made Symmetrix the World’s Most Trusted storage platform with the latest in industry standard compute and I/O technologies, the Virtual Matrix Architecture liberates the power of Symmetrix from the physical barriers of backplane-based monolithic storage arrays and redefines ease-of-use for storage in today’s increasingly virtualized data centers.
But while this new architecture is inarguably revolutionary in the world of storage, the Virtual Matrix is in fact born of a Darwin-esque evolution of the same Symmetrix architecture that launched the external storage market over 18 years ago. The result is the first storage architecture to integrate the performance and efficiency of traditional scale-up architectures with the cost-effective flexibility of scale-out, blurring the distinction between modular and monolithic while redefining the scope of scalable enterprise storage.
In this post I will explain the path that has led EMC to The Virtual Matrix, and along the way I’ll highlight several of the key features of this revolutionary new architecture.
global memory – the foundation of symmetrix, both old and new
The single most distinguishing feature of Symmetrix for the past 18+ years has been its global memory. In every Symmetrix, memory is a central shared resource that is accessible by every single processor and I/O stream in the system.
Over the years, both the interconnect between memory and the I/O processors and the way those processors communicate with each other have changed, but the operational utility of global memory hasn't. Write requests received by front-end communications ports are stored in global memory for the back-end disk directors to deliver to disk, and host read requests are fulfilled by the disk directors placing payloads in global memory for the front-end directors to deliver back to the requestor.
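As a rough mental model of that data flow, here is a minimal sketch. To be clear, these class and method names (GlobalMemory, FrontEndDirector, DiskDirector, stage/destage) are illustrative inventions, not actual Enginuity structures:

```python
# Hypothetical sketch of the Symmetrix global-memory data path.
# All names are illustrative, not real Enginuity code.

class GlobalMemory:
    """Central cache shared by every director in the system."""
    def __init__(self):
        self.slots = {}                  # block address -> payload

    def put(self, addr, data):
        self.slots[addr] = data

    def get(self, addr):
        return self.slots.get(addr)

class DiskDirector:
    """Back end: moves data between global memory and disk."""
    def __init__(self, memory):
        self.memory = memory
        self.disk = {}                   # stand-in for physical drives

    def destage(self, addr):
        # Deliver a cached write from global memory down to disk.
        self.disk[addr] = self.memory.get(addr)

    def stage(self, addr):
        # Place a disk block into global memory to satisfy a read miss.
        self.memory.put(addr, self.disk[addr])

class FrontEndDirector:
    """Front end: services host I/O against global memory."""
    def __init__(self, memory, backend):
        self.memory = memory
        self.backend = backend

    def write(self, addr, data):
        self.memory.put(addr, data)      # write lands in global memory...
        self.backend.destage(addr)       # ...back end destages it to disk

    def read(self, addr):
        data = self.memory.get(addr)     # cache hit: served from memory
        if data is None:
            self.backend.stage(addr)     # cache miss: back end stages it
            data = self.memory.get(addr)
        return data
```

The point of the sketch is simply that both paths meet in the middle: neither the front end nor the back end ever hands data directly to the other.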
Early Symmetrix systems (before the DMX era) used a communications bus to transport data and inter-processor communications between processors and memory. The processing complexes themselves were separate and purpose-built – front-end SCSI, ESCON and later Fibre Channel directors connected to hosts, while first SCSI and later FC disk directors connected to the disks. Each of these directors presented different emulations, and over the years the directors evolved to be more like blade servers, with each “blade” supporting 2-4 independent CPU “slices,” each able to transfer data to and from central memory over the backplane.
In 2003, EMC introduced the first significant architectural change to Symmetrix since its birth – the Direct Matrix Architecture. While still maintaining the front <—> memory <—> back implementation, the bus interconnect of the prior generations was replaced with the Direct Matrix – dedicated I/O transports between each front- and back-end director and the global memory directors. These dedicated paths eliminated the contention bottlenecks of the previous bus architecture, and they allow the Symmetrix DMX series to deliver levels of performance and capacity scalability that have still not been equaled by any other high-end storage array. Today, no other high-end array supports as many disk drives, and only recently did one competitor match the amount of global memory supported by the Symmetrix DMX-4.
As you should expect, the Virtual Matrix Architecture is also structured around the central resource of global memory – but the implementation is radically different from prior generations. Before I explain how it differs, though, let’s discuss some of the reasons for changing things in the first place.
beyond the backplane
Over the past decade, Symmetrix has been portrayed by competitors as “monolithic,” especially those who sought to differentiate themselves from the fixed-frame disk complexes that Symmetrix employed up until the introduction of the DMX3 in 2005. Pre-DMX3, every Symmetrix came in a limited set of sizes – the 5th-generation Symmetrix 8000 series was available in a 96-drive and a 384-drive package, for example. DMX1 & DMX2 were slightly better in that there were actually 4 different sizes (DMX800, DMX1000, DMX2000 and DMX3000), but you still couldn’t grow from one size to the next as your needs changed.
The DMX3 and DMX4 changed all that. Customers could start with the 2-disk-director DMX4-1500 and grow all the way up to the full-blown 4-disk-director DMX4500, supporting up to 2400 drives. Back-end performance scaled up as disk directors were added, and customers’ investments were protected as their needs changed.
But in talking with customers, even now, one quickly comes to realize that incremental growth alone isn’t enough – customers also need more flexibility in how they deploy their storage across the data center.
In addition, many of EMC’s largest customers needed even larger configurations than the DMX4 could offer – in fact, many of them run multiple independent DMX4s to meet their performance SLAs where they would prefer to have only one.
Unfortunately, the laws of physics get in the way of stretching signals over a physical backplane, and the DMX4 is already at the practical limits of today’s technology. In order to scale Symmetrix even larger, it had to get beyond the limitations of a passive backplane architecture.
So the Virtual Matrix was borne out of two key customer requirements: to support ever-larger capacity storage arrays while removing the requirement that all the bays of the array must be physically adjacent.
And though switching to glass interconnects to the storage bays might solve the latter, this is an expensive alternative that unfortunately does nothing to raise the scale limits of the backplane.
And there was one more significant motivator behind the Virtual Matrix:
those costly dis-integrated directors
In the early days of Symmetrix, it was generally necessary for every computer vendor to build its own processor complexes. Though Intel was dominating the desktop/laptop market, these were the (end of the) glory days of DEC, Prime, Wang, Data General, HP, Sun, Honeywell, Bull, Apollo, Stratus (et al.), and nearly every one of them had its own proprietary processors. Those that didn’t used “standard” CPUs and surrounded them with custom logic designs.
Symmetrix was just a big I/O server, so it naturally followed suit, and Symmetrix arrays through today's DMX-4 were built with a healthy helping of custom hardware – most of it centered on accelerating data movement through the system and on performing continuous error checking to ensure data integrity. This logic was purpose-built for the task each director would support, and the design worked well: you could mix and match front-end directors to get the connectivity you needed, add memory directors to improve cache hit rates, and scale the back-end to deliver more cache-miss IOPS if that was what you needed.
One downside of this purpose-built approach was that you could only put so many director blades on the backplane. But the bigger challenge was that the design required unique and custom hardware for every generation: every director had to be built up from the chips, every interconnect invented and implemented from the ground up. And over recent years the gap between “custom” and “industry standard” closed, leaving little leverage in custom hardware.
Back before even the first DMX came to market (the DMX1000 and DMX2000 launched in February 2003), Symmetrix engineers were drafting proposals to integrate the functions of the front-end, back-end and memory controllers onto a single controller form factor. That idea evolved into the notion of switching processors from the now-niche PowerPC CPUs from IBM/Motorola, with their largely custom infrastructure, to the far more widely used Intel x86/IA64 processors and their lower-cost, industry-standard infrastructure. And by virtue of changing processor complexes, the new “unified director” could more readily integrate standard I/O cards, memory DIMMs and interconnects, becoming more cost-efficient and agile as new processors and components were introduced. The engineers just needed to figure out how to connect several of these new unified directors together.
Another benefit of putting memory local to the CPUs within the unified directors was that data transfers were no longer limited to the speed of the bus or the Direct Matrix – I/O could be received and sent with far lower latency and overhead than in prior Symmetrix generations. But embedding the memory in the unified director posed a new challenge: the memory was now “local” to the processors, yet Symmetrix and the Enginuity storage OS were built on the tried-and-true foundation of “global” memory.
enter the virtual matrix
Summarizing the requirements, the next architectural evolution of Symmetrix had some pretty tough targets:
- memory must be globally accessible
- get the maximum performance benefits of local memory access
- maintain/extend the incremental scalability of the DMX3/DMX4 (add directors to scale)
- leverage a unified director built on industry-standard components for simplicity and flexibility
- accommodate future scale to ever larger system images
- allow for distributing cabinets around the data center
- minimize the impact on Enginuity software
- deliver all this without compromising reliability or availability
Well, the solution in fact turned out to be rather straightforward.
In essence, the Virtual Matrix Architecture “virtualizes” processor access to memory. That is, the code treats access to remote memory exactly the same as it does to local memory. The trick is that a small layer of software, assisted by EMC custom hardware logic (in an ASIC on the Virtual Matrix Interface), presents any location in memory as “local” to the processor complexes. If the target memory locations are indeed local, then access to memory is direct, at memory bus speeds. And if not, the request is packetized and parallelized to be sent over the Virtual Matrix Interconnect fabric to the Virtual Matrix Interface on the director that owns the specified memory target.
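That local-vs-remote dispatch can be sketched in a few lines. Keep in mind this is purely illustrative – in the real system the work is done by the Virtual Matrix Interface ASIC and a thin software layer, and every name below (owner_of, read_global, the address-bit ownership scheme, DemoFabric) is a hypothetical stand-in:

```python
# Purely illustrative sketch of "virtualized" global-memory access in
# the Virtual Matrix. All names and the addressing scheme are
# hypothetical assumptions, not the actual implementation.

LOCAL_NODE = 0        # id of the director this code runs on (assumed)

def owner_of(addr):
    # Hypothetical scheme: the high-order bits of a global address
    # identify the director that owns that region of global memory.
    return addr >> 32

def read_global(addr, local_memory, fabric):
    """Return the data at a global address, whether local or remote."""
    if owner_of(addr) == LOCAL_NODE:
        # Local target: direct access at memory-bus speed.
        return local_memory[addr & 0xFFFFFFFF]
    # Remote target: packetize the request and send it across the
    # Virtual Matrix Interconnect to the owning director's interface.
    return fabric.send(owner_of(addr), {"op": "read", "addr": addr})

class DemoFabric:
    """Stand-in for the interconnect fabric, for illustration only."""
    def __init__(self, remote_memories):
        self.remote_memories = remote_memories   # node id -> memory dict

    def send(self, node, request):
        # The peer's Virtual Matrix Interface services the request
        # against its own local memory and returns the payload.
        return self.remote_memories[node][request["addr"] & 0xFFFFFFFF]
```

The essential property is that the caller of read_global never knows (or cares) which branch was taken – which is precisely what lets Enginuity keep treating memory as one global resource.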
For the curious, the first generation of the Symmetrix V-Max uses two active-active, non-blocking, serial RapidIO v1.3-compliant private networks as the inter-node Virtual Matrix Interconnect, which supports up to 2.5GB/s full-duplex data transfer per connection – each “director” has 2, and thus each “engine” has 4 connections in the first-gen V-Max.
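For scale, here is my own back-of-the-envelope arithmetic on the aggregate fabric bandwidth. The per-connection figures come from the paragraph above; the 8-engine maximum system size is my assumption for illustration:

```python
# Back-of-the-envelope aggregate bandwidth of the gen-1 Virtual Matrix
# Interconnect. Per-connection figures are from the text above; the
# 8-engine maximum configuration is an assumption for illustration.
gbps_per_connection = 2.5       # GB/s full duplex per RapidIO connection
connections_per_director = 2    # each director has 2 fabric connections
directors_per_engine = 2        # each "engine" holds 2 directors
engines = 8                     # assumed fully configured system

connections = engines * directors_per_engine * connections_per_director
aggregate_gbps = connections * gbps_per_connection
print(f"{connections} connections, {aggregate_gbps} GB/s aggregate")
```

Under those assumptions, a full system would have 32 fabric connections – and because every director also reaches its own memory directly at bus speed, the fabric only ever carries the remote fraction of the traffic.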
Why RapidIO, you ask? Primarily because of its non-blocking design, low latency, high bandwidth, parallelism and cost efficiency – RapidIO has been used in a broad range of embedded applications, from MRI systems to military fighter jets. You can learn more about RapidIO at www.RapidIO.org.
a matrix designed for the future
In order to keep up with the exponential growth projections of both storage and virtualized server complexes, the Virtual Matrix Architecture is designed to scale well beyond the limits of the first-generation Symmetrix V-Max. In fact, the gen-1 Virtual Matrix Interconnect could easily address well beyond 256 nodes. And the Virtual Matrix Architecture doesn’t limit the fabric to two RapidIO networks; it could be 4 or 8 RapidIO networks running in parallel, or it could be built on a different infrastructure altogether – InfiniBand, FCoE/DCE, or whatever comes along in the coming years.
More importantly, on top of the two dimensions of memory access that the Virtual Matrix Interface implements today (direct to local memory, and over the fabric to memory in peer nodes), the Architecture allows for a third dimension of interconnect – a connection between different V-Max systems. This interconnect would not necessarily expand to share memory across all the nodes in two (or more) separate V-Max arrays, but it would allow multiple V-Max arrays to perform high-speed data transfers and even redirect I/O requests between different Symmetrix V-Max “virtual partitions.” This capability of the Architecture will be leveraged in the future to “federate” different generations of V-Max arrays in order to scale to even greater capacities and performance, and will also be used to simplify technology refreshes. In the future, you’ll be able to “federate” a new V-Max with the one on your floor and non-disruptively relocate workloads, data and host I/O ports.
It ain’t magic…but it’s close.