4.001: when you say tiering, do you mean degradation?
(Wow, has it really been 4 years since I started blogging?)
Yesterday Hu Yoshida posted a perspective on the evolving meaning of the word "Tiering," presumably as context for making a cost- and performance-benefit argument for Hitachi Dynamic Tiering (HDT), as implemented on the VSP.
After the usual Hitachi riff about external storage and thin provisioning pools, Hu turns to a discussion of "Page level Dynamic Tiering with HDT." Here he highlights that HDT moves data in 42MB pages, allowing for relocation at the sub-device level based on utilization of the page(s).
Hu then makes a not-so-subtle attempt at asserting superiority against competitive implementations (e.g. VMAX FAST VP, I suppose), with this claim:
The VSP was architected to address this additional load with a global pool of quad core Intel processors that is tightly coupled across an internal switch matrix to a global cache and front/back end processors. Storage systems that do not have this extra processing power will suffer some performance degradation when they do sub LUN level tiering. (emphasis mine)
Folks, permit me to inject a dose of reality…if anything suffers degradation when auto-tiering, it is the VSP…
dynamic or fast, but not both
As you might imagine, EMC performance engineering has been busy evaluating both FAST VP and competitive automated tiering solutions, in order to give our sales and implementation teams the insights necessary to properly position and configure our products. And as I have noted before, at this point in time there appears to be no competitive offering that even comes CLOSE to what VMAX FAST VP delivers.
To demonstrate the superiority of FAST VP, and to outright contradict Hu's assertion that VSP's architecture magically avoids the degradation related to the relocations effected by automated tiering, I submit the following comparison. Using identically configured arrays, with identically configured policies and tier capacities (challenging in and of itself, due to the lack of per-application policies with HDT), and identical starting data layouts, the chart maps response times of an OLTP simulated workload as both systems work to optimize the tiering:
They say a picture is worth a thousand words, and I don't think I have anything to add – you can see for yourself.
hdt ain't fast
Compared to VSP with HDT, VMAX with FAST VP excels on multiple dimensions:
- VMAX response times are consistently better than VSP under identical workloads and configurations – before, during and after automated tiering is performed
- FAST VP reacts as much as 3x faster than does HDT
- FAST VP relocates hot data as much as 2x faster than HDT
- FAST VP moves as little as 1/6th as much data as HDT for maximum benefit
- FAST VP has significantly less impact on running applications during relocation than HDT
- HDT response times more than double during page relocation
- FAST VP resulting response times are nearly 2x faster than HDT
Significantly, FAST VP uses less total flash capacity to deliver better performance than HDT, owing in large part to the efficiency of FAST VP's 7.5MB extents vs. VSP/HDT's 42MB pages. Moving smaller chunks lets a small amount of flash benefit a larger number of applications, or simply reduces the amount of expensive flash that customers must acquire to support automated tiering on VMAX. Either way, VMAX FAST VP saves customers real money vs. VSP HDT.
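To put rough numbers on the chunk-size argument, here is a back-of-the-envelope sketch. Only the 7.5MB and 42MB sizes come from the comparison above. The workload (1,000 scattered hot regions of 4MB each), the function name, and the assumption that every hot region sits inside its own chunk are hypothetical, chosen purely for illustration. This is simple arithmetic, not a measured result.

```python
import math

EXTENT_MB_FAST_VP = 7.5   # FAST VP extent size quoted in the post
PAGE_MB_HDT = 42.0        # HDT page size quoted in the post

def flash_needed_mb(hot_regions, hot_mb_each, chunk_mb):
    """Flash consumed when each hot region is promoted in whole chunks.

    Assumption (hypothetical): hot regions are scattered, so no two
    regions share a chunk and each region starts on a chunk boundary.
    """
    chunks_per_region = math.ceil(hot_mb_each / chunk_mb)
    return hot_regions * chunks_per_region * chunk_mb

# Hypothetical workload: 1,000 scattered hot regions of 4 MB each.
fast_vp_mb = flash_needed_mb(1000, 4.0, EXTENT_MB_FAST_VP)
hdt_mb = flash_needed_mb(1000, 4.0, PAGE_MB_HDT)

print(f"7.5MB extents: {fast_vp_mb:.0f} MB of flash")
print(f"42MB pages:    {hdt_mb:.0f} MB of flash")
print(f"ratio:         {hdt_mb / fast_vp_mb:.1f}x")
```

Under those assumptions the larger page drags along 5.6x as much data to promote the same hot regions. Real workloads will differ: densely packed hot data narrows the gap considerably.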
In addition, not only does FAST VP provide better results faster than does HDT, it also offers more control over the entire process. For those who don't really want to roll up their sleeves and dig into the operations of their storage, FAST VP works exceptionally well with no tuning.
And for those who really want to get involved, FAST VP includes numerous features that are not even offered by VSP or HDT:
- Independent policies per application, storage group or individual device
- Dynamically configurable monitoring and relocation time windows (HDT has 1, and to take advantage of it, you have to use manual mode and the CLI)
- Tunable relocation "aggressiveness" to minimize performance impact during relocations, or to accelerate changes when necessary to improve application performance
- Dynamic priority assignment to resolve resource conflicts between applications
One confounding oddity of HDT is its scheduling: no matter whether you use the default 24 hour cycle, or change it to 1, 2, 4 or 8 hours (your only other options), the cycles always start at midnight as per the VSP service processor clock. And within each cycle, relocation always begins with the lowest numbered device, progressing to the next device only after all the relocations have been completed on the prior. Thus, it is possible that the cycles will never optimize all the devices in the array – especially if workloads are very dynamic or if the relocations take longer than the cycle time.
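The starvation risk is easy to see with a toy model. The sketch below is not HDT's actual scheduler; it only encodes the behavior described above (devices handled strictly in ascending order, one at a time) plus one labeled assumption: whatever is unfinished when the cycle ends is abandoned, and the next cycle starts over at the lowest numbered device. The per-device relocation times are made up.

```python
def devices_completed(relocation_hours, cycle_hours):
    """Return the indices of devices fully relocated within one cycle.

    Devices are processed in ascending order, and a device starts only
    after the previous one has finished. Assumption (hypothetical): work
    still pending when the cycle ends is dropped, and the next cycle
    begins again at device 0.
    """
    done = []
    elapsed = 0.0
    for dev, hours in enumerate(relocation_hours):
        if elapsed + hours > cycle_hours:
            break  # this device cannot finish before the cycle ends
        elapsed += hours
        done.append(dev)
    return done

# Hypothetical: five devices needing 3, 2, 4, 1 and 2 hours of relocation.
needs = [3, 2, 4, 1, 2]
print(devices_completed(needs, cycle_hours=8))   # [0, 1]
print(devices_completed(needs, cycle_hours=24))  # [0, 1, 2, 3, 4]
```

If the workload shifts enough that each cycle brings a similar amount of new relocation work, the 8 hour setting in this model never gets past device 1, which is the scenario described above.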
FAST VP, on the other hand, is…well…fast. FAST VP doesn't even have the notion of "cycles" at all – it continually monitors the workloads, and relocates data whenever necessary to adapt to changing demand. With FAST VP, you can change the policy and see the benefits in minutes, while HDT will not respond until the next cycle begins. As shown above, this means that FAST VP will likely be finished optimizing long before HDT even starts.
Most importantly, FAST VP strives to minimize impact on applications as much as possible by default, and supports tuning parameters that can virtually eliminate any impact (at the expense of taking longer to fully complete optimizations).
easy, dynamic or fast?
In the world of enterprise storage, customers today have a choice of leveraging automated tiering using IBM's DS8700 with EasyTier (which I discussed here), Hitachi's VSP with Hitachi Dynamic Tiering, or EMC VMAX with Fully Automated Storage Tiering for Virtual Pools (FAST VP).
In a dynamic world, where application workloads change (or are added) frequently, which would you choose?
Barry,
HDT users can specify a monitoring window only when using their 24 hour cycle. For example, monitor from 9-5 and the movement begins at 5:00.
One of the reasons we have seen less than expected performance from the VSP with HDT is that we are forced to configure our SATA tier with write-verify mode, which causes every write to SATA to be followed by a read to confirm the correct data made it to the drive. I don't know why this is a requirement but it's not ideal for performance.
Posted by: Texan_in_MA | April 29, 2011 at 07:28 PM
Interesting.
When you say "identically configured arrays", would you spare us the details please? Disk types, amount, array cache etc? Is everything done correctly on host; lvm, load balancing...
I find it odd that VSP's response times are so much worse than V-Max here. It casts a small shade of doubt on the graph. You are claiming here that with identical config, VSP's response times are almost double than V-Max.
If this would be the case, I'm quite sure that you would have trumpeted this much earlier and much louder.
BTW, you note here that "HDT response times more than double during page relocation". Did you note that the same happened for FAST as well?
Posted by: soikki | May 02, 2011 at 03:42 PM
Soikki -
You'll have to take my word for it that the test configuration was the same for both platforms - our performance engineering team is stricter than most customers in performing apples-to-apples comparisons.
And this difference in response time is no surprise, nor is it even new - Symmetrix has long held a significant response time advantage in OLTP-type workloads. Our sales teams use this to their advantage every day.
And indeed, the FAST VP response times did double - but they also remained far below the impact seen with HDT, and the impact lasts for a far shorter time.
And if the impact is too much for a VMAX customer, it can be dialed back by reducing the aggressiveness of the relocations...while HDT offers no such tunability.
Posted by: the storage anarchist | May 02, 2011 at 04:03 PM
By the way, Soikki - it doesn't help VSP response times when writes to SATA drives must be verified by reading the data back into the array and comparing to the original data, as is the prerequisite to use SATA with HDT.
VMAX uses more efficient and effective error detection/prevention for all drive types (including SATA).
But even after auto-tiering optimization, where presumably most of the writes have been moved OFF of the SATA drives, the VSP exhibits the response time deficiency vs VMAX that Hitachi arrays have shown ever since the first Lightning.
Posted by: the storage anarchist | May 03, 2011 at 03:13 PM
Thanks for commenting the comments :)
However, I must disagree still with you not sharing any configuration details. If a vendor shows a graph comparing their competitor but refuses to share details of the comparison, it is meaningless.
Through the years I have seen similar FUD and graphs from EMC comparing other vendors, and constantly the thing missing is configuration info.
No detailed info = meaningless graphs.
I think that if your graphs here would really be the truth, you would run the SPEC tests and shout big time (as with the Celerra). Accept the challenge? Or share with us the information I requested earlier. We, customers, are very eager to get accurate information and performance measures.
Posted by: soikki | May 05, 2011 at 03:19 PM
Soikki -
Unfortunately, there are no standardized tests that are designed to demonstrate the operational impacts of automated tiering. Most standardized benchmarks run for a limited time to pre-warm the caches, then execute for a relatively short time, and then report elapsed time and IOPS/MBs averages across the execution period.
Due to the limitations of competitive auto tiering products (like Hitachi HDT and IBM EasyTier), a standardized test would have to be designed to run for days, and to have a dynamic working set that is representative of the real world and that is larger than available cache+flash, and the test has to be easily repeatable.
Neither SPECsfs nor SPC tests fit these requirements.
How we create such a test is proprietary information, but the effective workload looks like this:
20% 8K read hit
45% 8K read miss
15% 8K random write
10% 64K seq read
10% 64K seq write
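For anyone who wants to sanity-check what that mix implies, here is a small sketch that derives the read/write split and the average transfer size. The percentages and I/O sizes are exactly the five lines above; the variable names and the derived figures are my own arithmetic and say nothing about how the test itself is constructed.

```python
# (percent of I/Os, size in KB, is_read), copied from the mix above
MIX = {
    "8K read hit":     (20, 8, True),
    "8K read miss":    (45, 8, True),
    "8K random write": (15, 8, False),
    "64K seq read":    (10, 64, True),
    "64K seq write":   (10, 64, False),
}

total_pct = sum(pct for pct, _, _ in MIX.values())
read_pct = sum(pct for pct, _, is_read in MIX.values() if is_read)
write_pct = total_pct - read_pct

# Average transfer size, weighted by how often each I/O type occurs.
avg_kb = sum(pct * kb for pct, kb, _ in MIX.values()) / total_pct

# Share of bytes moved (not I/O count) that comes from the 64K streams.
seq_kb = sum(pct * kb for pct, kb, _ in MIX.values() if kb == 64)
seq_byte_share = seq_kb / (avg_kb * total_pct)

print(f"reads {read_pct}% / writes {write_pct}%")
print(f"average I/O size: {avg_kb:.1f} KB")
print(f"sequential share of bytes: {seq_byte_share:.0%}")
```

So the mix is 75% reads by I/O count, with an average transfer of 19.2KB, and the two sequential streams account for about two thirds of the bytes moved even though they are only 20% of the I/Os.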
I originally presented these results to challenge the baseless assertion made by Hu Yoshida that VSP could handle the added effort of auto-tiering without impacting performance, while other arrays cannot. Readers can accept my assertion that the tests were in fact fairly executed or not, but at least I have tried to back up my contrarian position with data - data that I feel is fair enough to stake my reputation upon.
As always, YMMV.
Posted by: the storage anarchist | May 05, 2011 at 05:24 PM
Hi Barry,
Good to see that the HDS v EMC big frame debate still hasn't gone out of style.
I think your data is great but I have to agree with Soikki here - customers just don't trust fun, colorful graphs. They are a lot more savvy and they want to know details because their reputations and jobs depend on real data.
Slides with hesitation to provide details can make for a relationship of distrust. Believe me, the baseless assertions that Hu Yoshida makes also sow the seeds of distrust (much like your configuration detail-less graph).
You are usually a bit more transparent in your posts... C'mon, share your data!, step up and don't hide behind a cute little "YMMV". Take a risk and trust that the storage world will understand and decide for themselves. You've always been a leader, don't become another Hu :(
Posted by: alvarezjedi | August 11, 2011 at 02:28 PM