4.010: when lightning strikes
There has been lots of discussion since EMC's announcement of VFCache, much of it about the implications of said announcement on the storage industry. I've seen all sorts of assertions made by analysts, competitors, wannabes and prognosticators from all backgrounds – some thoughtful, some diversionary and some that are just downright silly.
There are those that say EMC's entry into the server-side Flash market validates the market for the early entrants. While that may be true in some regards, I will point out that when considered within the entire scope of the announcement, VFCache actually offers significant differentiation from would-be competitors. It is yet to be seen if or how the "established" players in the server-side Flash market will respond to that differentiation. (More on this after the break.)
There were some who turned this argument around – because VFCache was implemented as a "cache", it couldn't compete with the "established" players in this space – this even though VFCache offers the traditional "Flash-as-DAS" for those that want it. So then they said VFCache was too small to be competitive, especially since some of the other players were talking about 10TB devices and such. I found all this humorous – not surprising, just funny. I always get a chuckle when the success of something revolutionary is measured using the yardstick of the "old" way. Like when EMC introduced the first Flash drives for an enterprise storage array back in January 2008. There were a lot of people (and even a certain competitor's CTO) who asserted Flash was too expensive to have any real utility, and that "nobody was asking for it." Today, barely 4 years later, it is hard to find any commercial mid-range or enterprise arrays that don't offer SSDs in one capacity or another (pun intended).
Then there are those that assert this movement to server-side (Flash) storage represents a full circle return from the 20+ year external storage "diversion," portending the impending doom of the disk drive and/or the external storage array altogether. I assert that for either of these to be true requires an unforeseen discontinuity of pricing: solid state has to get a LOT cheaper than any reasonable projection, or hard disk drives have to get a LOT more expensive. Short of that, there remains a niche opportunity for flash-only solutions, but the sheer economics of $/GB will ensure that the vast majority of the storage market will be dominated by spinning rust for a VERY long time – though increasingly complemented by solid-state persistent storage to deliver the performance required by the typically small subset of any dataset that is "hot" at any given time.
And finally there are those that have made claims that server-side Flash is the precursor to entirely new ways of developing applications, fueled by the heretofore unattainable I/O performance levels delivered by affordable server-side large-scale solid state storage. Some of these pundits go on to assert that server-side solid state technology will drive such a revolutionary overhaul of application development that external storage itself will cease to exist. I personally believe these are fool's forecasts, proffered by those who ignore the reality of history. In the high-tech industry, new technologies rarely supplant the old – neither overnight, nor even over decades. The IT landscape is littered with still-functioning dinosaurs that may well never be recoded or replaced: mainframes, tape, COBOL, SCSI, Ethernet, perl, etc. Switching and conversion costs are formidable barriers to overcome. In a world where more than 2/3 of the average IT budget is spent just keeping things running, and the other 1/3 is being invested in storing the growing flood of new information and perhaps in a token few NEW applications to leverage it all, there is little opportunity to invest in rewriting anything. If it ain't broke, don't fix it. The more probable reality is that server-side Flash (like ever-cheaper DRAM) will lead to new ways of building file systems, databases and applications – BUT these will not represent an overnight revolution. Instead, this new “new” will follow the same evolutionary path as have the new technologies that have come before.
With that expression of my humble opinion, I'll spend the 2nd half of this post exploring how I see VFCache fitting into this information-centric world we live in…
Transformational vs. Transitory
It is no surprise that the first server-side Flash solutions were solid-state drives. With simple packaging that fits into existing form factors and delivers immediate benefit for everything from boot times to application startup and switching, to accelerated application I/O, Flash drives relatively quickly earned their spot as the preferred choice for almost every modern laptop/netbook/tablet, as well as for most desktops and servers.
So when the first PCI-based flash cards emerged, it was logical that the first use case be to emulate the proven drive interface model – it affords immediate utility with no programming (but perhaps a minuscule bit of scripting) to deliver even lower latencies than the physical hard disk controller I/O path. Applications that really needed something approaching DRAM speeds, but over a capacity of data too large to be affordable (or addressable) in the current processor generations, took quickly to these PCI-based solid-state “disk” emulations, often employing a workload script along the lines of:
1. FOR N=1 to max_datasets
2. rm -rf /mount/flashdrive/*.*
3. rcp /remote/big_datasets/datasetN.in /mount/flashdrive/
4. ONERROR go to 2.
5. exec analyze_big_dataset /mount/flashdrive/datasetN.in >> datasetN.out
6. rcp datasetN.out /remote/results/datasetN.out
7. NEXT N
Starting back when I began meeting with customers in 2008, after EMC’s ground-breaking introduction of Enterprise Flash Drives, nearly every customer’s use case for server-side flash followed this basic model.
More importantly, nearly every one of them wanted to understand if they could get similar performance from array-based flash as they were getting from the (then SSD-based, and later PCI-based) embedded server flash approach. Surprised as I was at the time that there actually existed applications that could live with the occasional complete restart from scratch on an error, I was equally relieved to know that there are many more applications that customers want to accelerate that also require features that the Flash-as-DAS (direct-attached storage) solutions cannot deliver, namely:
- support for datasets (and databases) that are significantly larger than is practical to fit into local Flash (either in their entirety or through segmentation);
- concurrent access to datasets and databases that are shared/concurrently utilized by multiple applications and/or servers;
- automatic and immediate protection and multi-server access to updates and additions (writes);
- automatic disaster recovery, frequently with multi-dataset/server/array data consistency protection;
- centralized management, monitoring and optimization as a storage asset rather than as an extension of the independent servers.
While flash deployed within a Symmetrix can meet all of these requirements, especially with the introduction of sub-LUN Fully Automated Storage Tiering (FAST), array-based Flash performance is limited by the latencies of physical SCSI I/O over the SAN. We recognized that a hybrid approach was required to address these customer requirements, one where the majority of read I/O operations are serviced locally over the PCI bus, but where writes are delivered synchronously to the external array. Always seeing the writes as they happen, the array can then reliably protect the changes (through RAID plus local and/or remote replication), ensuring that the data has been reliably persisted outside of the server just as the application or database engine expects.
VFCache is born
The decision to utilize server-side Flash as a write-through read cache in front of the external array capacity was huge. Instead of requiring datasets to be loaded into the Flash before applications could begin, applications can begin immediately and VFCache will start warming up with whatever data is being requested and reused most frequently. No longer limited by the size of the local Flash, VFCache can be used against very large LUNs – in fact, VFCache can accelerate multiple different devices, for multiple different applications!
Depending upon the I/O demands of the application(s), and the size of their working set(s), a 300GB VFCache will generally take less than 45 minutes from cold boot to warm up and reach equilibrium. In order to provide the maximum benefits for the most challenging of workloads – small block random I/O – the current VFCache drivers will generally avoid trying to cache large-block sequential I/Os (64KB and larger), under the assumption that these could well be a backup operation that would otherwise flush the cache unnecessarily. But many of us remember the ReadyBoost option introduced by Windows Vista, and we can imagine that future VFCache drivers might set aside a small amount of the cache to accelerate application load times (which also typically utilize large-block sequential I/O).
But the most important feature of VFCache is that it is a write-through cache. While read hits are serviced with no impact on the array, writes are always forwarded to the storage device for persistence and protection. While this operation will encounter the added latency of traversing the host bus adapter (HBA), the storage area network (SAN) and the array interface, this is a small price to pay when data updates and additions are important. With a copy of the data stored safely on the external RAID-protected device, the risk and impact of a server or flash failure is minimized.
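To make the write-through idea concrete, here is a minimal Python sketch. It is an illustration only, not the VFCache driver: the class name, the dictionary standing in for the external array, and the choice to refresh the cached copy on every write are all my own assumptions.

```python
class WriteThroughReadCache:
    """Toy write-through read cache in front of a slower backing store.

    Hypothetical illustration: 'backing' stands in for the external array,
    'cache' for the server-side flash. Neither models real hardware.
    """

    def __init__(self, backing):
        self.backing = backing  # dict-like store that holds the durable copy
        self.cache = {}         # local copies of recently read/written blocks
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        # Read hit: served locally, the backing store is not touched.
        if block_id in self.cache:
            self.hits += 1
            return self.cache[block_id]
        # Read miss: fetch from the backing store and keep a local copy.
        self.misses += 1
        data = self.backing[block_id]
        self.cache[block_id] = data
        return data

    def write(self, block_id, data):
        # Write-through: the backing store always receives the write first,
        # so the durable copy never lags behind the cache.
        self.backing[block_id] = data
        # Assumption: refresh the local copy so later reads are not stale.
        self.cache[block_id] = data


array = {"blk0": b"old"}
vf = WriteThroughReadCache(array)
vf.read("blk0")          # miss, now cached
vf.read("blk0")          # hit, served locally
vf.write("blk0", b"new") # lands on the "array" immediately
print(array["blk0"], vf.hits, vf.misses)
```

The point of the sketch is the ordering inside `write`: if the server or its cache disappears, nothing is lost, because every update already lives in the backing store.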
And when the target is an intelligently cached disk array like Symmetrix VMAX with Fast Write capabilities (where writes are acknowledged back to the host as soon as they land in the array’s protected global memory), the total write latency can be just a fraction of a millisecond. Slower than a read hit to the PCI flash, but potentially faster than a write to a local disk drive within the server.
making lightning fast
Some of the early naysayers about VFCache made claims that this write overhead would severely limit the utility of the product in the real world; they seemed to be saying that Flash only had any value if 100% of the I/Os were to the flash device.
But customers had told us exactly the opposite of that – they said that they had lots more applications that required the same data protection they enjoyed from their array-based datasets and databases, and for which the Flash-as-DAS approach was a total non-starter.
What most people didn’t understand back in 2008 was what EMC’s engineers were learning about application working sets and workload skews – that for nearly all applications, a small portion of the total data will be the target of the vast majority of I/O operations. Frequently referenced as the “80-20 rule”, where 80% of the I/O lands on only 20% of the data, the reality is that it’s more like “95-5” – 95% of the I/O targets only 5% of the data. This target 5% may change over time – sometimes gradually, or sometimes dramatically, dependent upon the nature of the application.
EMC’s FAST VP leverages this knowledge to automate internal tiering across Flash, fast HDD and slow Nearline devices. Now proven with more than a year in production applications, being able to put the right 5% on the Flash tier enables that 95% of I/O operations to be responded to in a fraction of a millisecond. And even if the remaining 5% of I/Os take 10 or even 20 times longer, the average response time for all I/Os is drastically reduced.
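A quick back-of-the-envelope calculation shows why. The sketch below assumes a 0.5 ms response for the I/Os served from Flash, which is my stand-in for "a fraction of a millisecond"; the function name and the numbers are illustrative, not measurements.

```python
def average_response_ms(flash_fraction, flash_ms, slow_multiplier):
    """Weighted average response time across Flash-served and slower I/Os."""
    slow_ms = flash_ms * slow_multiplier
    return flash_fraction * flash_ms + (1.0 - flash_fraction) * slow_ms


# Assumed figures: 95% of I/Os are served from Flash at 0.5 ms each,
# and the remaining 5% take 10x or 20x longer.
for multiplier in (10, 20):
    avg = average_response_ms(0.95, 0.5, multiplier)
    print(f"remaining 5% at {multiplier}x slower: average {avg:.3f} ms")
```

Under those assumptions the average stays below 1 ms in both cases (about 0.73 ms and 0.98 ms), versus 5 ms or 10 ms if every I/O had to go to the slower tiers.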
VFCache leverages this same knowledge, although a bit differently than does VMAX FAST VP. On the VMAX, FAST tries to predict what will be required and then tries to get that into the Flash tier before it is accessed. The current VFCache driver, on the other hand, is a more traditional LRU-type cache – data that is frequently re-read is kept in the Flash cache, and data that is touched once (or infrequently) is aged out to make room for new data.
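For readers who have not run into LRU caching before, here is a generic Python sketch of the idea. It is a textbook least-recently-used cache, not VFCache's actual algorithm (which I have described only as "LRU-type"); the class and method names are hypothetical.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Generic least-recently-used cache keyed by block identifier."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # oldest entry first, newest last

    def get(self, block_id):
        if block_id not in self.blocks:
            return None                   # miss: caller fetches from the array
        self.blocks.move_to_end(block_id) # re-read data becomes "most recent"
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            # Age out whatever has gone longest without being touched.
            self.blocks.popitem(last=False)


cache = LRUBlockCache(capacity_blocks=2)
cache.put("A", b"...")
cache.put("B", b"...")
cache.get("A")           # A is re-read, so it stays fresh
cache.put("C", b"...")   # cache is full: B, touched only once, is aged out
print(list(cache.blocks))
```

Data that keeps getting re-read keeps moving to the "recent" end and survives; data touched once drifts to the other end and makes room for new arrivals.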
This difference in caching strategy is the first integration feature between FAST VP and VFCache. It may surprise people to learn that FAST VP will in fact keep promoting data to Flash even when VFCache is active on the host(s). Both the Symmetrix caching and FAST VP algorithms are adaptive to the workloads they experience, seeking to reduce “Miss” operations using a variety of strategies. So, while VFCache changes the workload that the array sees to be predominantly Read “Miss” and writes, FAST VP adjusts to best optimize this workload automatically.
making lightning even faster
Further integrations between VFCache and VMAX FAST VP are planned. Some will be focused on management and reporting – Symmetrix Management Console will recognize and highlight servers with VFCache this summer. There are also plans to collect and report on VFCache stats, like hit percentage and such.
But the real opportunity is for tighter integration between the two. As a host-resident driver, VFCache has the ability to inform the array about what's going on inside the server – what blocks are actually scoring read hits and at what rates, for example. VMAX caching algorithms and FAST VP can use this information to complement the metadata kept within the array about every block, track, extent or extent group. The cache algorithms might accelerate the fall-through rate for small block random read misses for data that VFCache is holding so that the array's global memory is quickly put to use servicing other applications, for example. FAST VP might choose to keep those heavily-reused blocks on a lower tier within the array, recognizing that any future "miss" request for those blocks from the server is very likely to be a one-off that will be cached by VFCache – the slightly longer "miss" time to read from SATA will be more than recovered by numerous subsequent VFCache "hits." And the integration can work in the other direction as well – the array might inform VFCache that the requested blocks of read data have historically not seen any significant VFCache "hits", and so the blocks should "fall through" on the server side as well…allowing VFCache to store only data that has the highest "hit" probability.
There are numerous additional possibilities to leverage the server-side knowledge to help the array optimize better for VFCache, and for the array side to help increase VFCache hit rates while minimizing "miss" response times. EMC has some of its brightest people working on optimizing the integration, across all of our storage platform, storage federation and solid-state development teams.
And we all share a common goal:
Ensure that VFCache works best with VMAX, and that VMAX works best with VFCache!
Just as when EMC first brought Enterprise Flash Drives to market, we continue to believe that solid state storage is going to dramatically change the way we store and utilize data. Thanks to innovations like EMC's Fully Automated Storage Tiering and now VFCache, virtually any existing application today can cost-effectively enjoy sub-millisecond I/O latencies. Coupled with VMAX, VFCache-equipped servers have practically unlimited capacity backed by the industry-leading data protection, business continuity and disaster recovery capabilities that make Symmetrix the most trusted storage platform in the world.