
April 29, 2011

4.001: when you say tiering, do you mean degradation?

(Wow, has it really been 4 years since I started blogging?)

Hu Yoshida posted yesterday a perspective on the evolving meaning of the word "Tiering," presumptively as a context for making a cost- and performance-benefit argument for Hitachi Dynamic Tiering (HDT), as implemented on the VSP.

After the usual Hitachi riff about external storage and thin provisioning pools, Hu turns to a discussion of "Page level Dynamic Tiering with HDT." Here he highlights that HDT moves data in 42MB pages, allowing for relocation at the sub-device level based on utilization of the page(s).

Hu then makes a not-so-subtle attempt at asserting superiority against competitive implementations (e.g. VMAX FAST VP, I suppose), with this claim:

The VSP was architected to address this additional load with a global pool of quad core Intel processors that is tightly coupled across an internal switch matrix to a global cache and front/back end processors. Storage systems that do not have this extra processing power will suffer some performance degradation when they do sub LUN level tiering. (emphasis mine)

Folks, permit me to inject a dose of reality…if anything suffers degradation when auto-tiering, it is the VSP…

 

dynamic or fast, but not both

As you might imagine, EMC performance engineering has been busy evaluating both FAST VP and competitive automated tiering solutions, in order to give our sales and implementation teams the insights necessary to properly position and configure our products. And as I have noted before, at this point in time there appears to be no competitive offering that even comes CLOSE to what VMAX FAST VP delivers.

To demonstrate the superiority of FAST VP, and to outright contradict Hu's assertion that VSP's architecture magically avoids the degradation related to the relocations effected by automated tiering, I submit the following comparison. Using identically configured arrays, with identically configured policies and tier capacities (challenging in and of itself, due to the lack of per-application policies with HDT), and identical starting data layouts, the chart maps response times of a simulated OLTP workload as both systems work to optimize the tiering:

They say a picture is worth a thousand words, and I don't think I have anything to add – you can see for yourself.

hdt ain't fast

Compared to VSP with HDT, VMAX with FAST VP excels on multiple dimensions:

  1. VMAX response times are consistently better than VSP under identical workloads and configurations – before, during and after automated tiering is performed
  2. FAST VP reacts as much as 3x faster than does HDT
  3. FAST VP relocates hot data as much as 2x faster than HDT
  4. FAST VP moves as little as 1/6th as much data as HDT for maximum benefit
  5. FAST VP has significantly less impact on running applications during relocation than HDT
  6. HDT response times more than double during page relocation
  7. FAST VP resulting response times are nearly 2x faster than HDT

Significantly, FAST VP uses less total flash capacity to deliver better performance than HDT, owing in large part to the efficiency of FAST VP's 7.5MB extents vs. VSP/HDT's 42MB pages. Moving smaller chunks allows a small amount of flash to benefit a larger number of applications – or simply reduces the amount of expensive flash that customers must acquire to support automated tiering on VMAX. Either way, VMAX FAST VP saves customers real money vs. VSP HDT.
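To make the granularity argument concrete, here is a back-of-the-envelope sketch. It is not a model of either array: the workload (1,000 isolated hot regions of 1MB each) and the function name `flash_needed` are made up for illustration, and it assumes each hot region sits in its own relocation unit, which is the best case for the smaller extent.

```python
import math

MB = 1024 * 1024

def flash_needed(hot_region_sizes, unit_bytes):
    """Flash consumed when every hot region is promoted in whole units."""
    return sum(math.ceil(size / unit_bytes) * unit_bytes
               for size in hot_region_sizes)

# Hypothetical workload: 1,000 isolated hot regions of 1 MB each.
hot_regions = [1 * MB] * 1000

fast_vp = flash_needed(hot_regions, int(7.5 * MB))  # 7.5 MB extents
hdt = flash_needed(hot_regions, 42 * MB)            # 42 MB pages

print(f"FAST VP: {fast_vp / MB:,.0f} MB")
print(f"HDT:     {hdt / MB:,.0f} MB")
print(f"ratio:   {fast_vp / hdt:.2f}")
```

Under those assumptions the ratio is simply 7.5/42, or about 0.18, which is in the neighborhood of the "as little as 1/6th" figure above. With hot data that is more tightly clustered, the gap narrows.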

In addition, not only does FAST VP provide better results faster than does HDT, it also offers more control over the entire process. For those who don't really want to roll up their sleeves and dig into the operations of their storage, FAST VP works exceptionally well with no tuning.

And for those who really want to get involved, FAST VP includes numerous features that are not even offered by VSP or HDT:

  1. Independent policies per application, storage group or individual device
  2. Dynamically configurable monitoring and relocation time windows (HDT has one, and to take advantage of it, you have to use manual mode and the CLI)
  3. Tunable relocation "aggressiveness" to minimize performance impact during relocations, or to accelerate changes when necessary to improve application performance
  4. Dynamic priority assignment to resolve resource conflicts between applications

One confounding oddity of HDT is its scheduling: no matter whether you use the default 24 hour cycle, or change it to 1, 2, 4 or 8 hours (your only other options), the cycles always start at midnight as per the VSP service processor clock. And within each cycle, relocation always begins with the lowest numbered device, progressing to the next device only after all the relocations have been completed on the prior. Thus, it is possible that the cycles will never optimize all the devices in the array – especially if workloads are very dynamic or if the relocations take longer than the cycle time.
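The starvation effect described above can be sketched in a few lines. This is a toy model under stated assumptions, not Hitachi's actual scheduler: the per-device relocation times are invented, the function name `devices_completed` is hypothetical, and I assume that a device whose relocation does not fit in the time left simply waits for a later cycle.

```python
def devices_completed(work_minutes, cycle_minutes):
    """Return the device numbers fully relocated within one cycle.

    work_minutes maps device number -> minutes of relocation it needs.
    Devices are served strictly in ascending device-number order.
    """
    done = []
    remaining = cycle_minutes
    for device in sorted(work_minutes):   # lowest-numbered device first
        needed = work_minutes[device]
        if needed > remaining:
            break                         # cycle is over; later devices wait
        remaining -= needed
        done.append(device)
    return done

# Hypothetical relocation backlog per device, in minutes.
work = {0: 30, 1: 24, 2: 18, 3: 12}

for cycle_hours in (1, 2, 4, 8, 24):      # the cycle lengths mentioned above
    print(cycle_hours, devices_completed(work, cycle_hours * 60))
```

With a 1-hour cycle only devices 0 and 1 finish. If the workload is dynamic enough to regenerate a similar backlog every hour, devices 2 and 3 are never reached; a longer cycle reaches them, but then reacts more slowly to change.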

FAST VP, on the other hand, is…well…fast. FAST VP doesn't even have the notion of "cycles" at all – it continually monitors the workloads, and relocates data whenever necessary to adapt to changing demand. With FAST VP, you can change the policy and see the benefits in minutes, while HDT will not respond until the next cycle begins. As shown above, this means that FAST VP will likely be finished optimizing long before HDT even starts.

Most importantly, FAST VP strives to minimize impact on applications as much as possible by default, and supports tuning parameters that can virtually eliminate any impact (at the expense of taking longer to fully complete optimizations).

easy, dynamic or fast?

In the world of enterprise storage, customers today have a choice of leveraging automated tiering using IBM's DS8700 with EasyTier (which I discussed here), Hitachi's VSP with Hitachi Dynamic Tiering, or EMC VMAX with Fully Automated Storage Tiering for Virtual Pools (FAST VP).

In a dynamic world, where application workloads change (or are added) frequently, which would you choose?

 



Comments


Texan_in_MA

Barry,
HDT users can specify a monitoring window only when using their 24 hour cycle. For example, monitor from 9-5 and the movement begins at 5:00.

One of the reasons we have seen less than expected performance from the VSP with HDT is that we are forced to configure our SATA tier with write-verify mode, which causes every write to SATA to be followed by a read to confirm the correct data made it to the drive. I don't know why this is a requirement but it's not ideal for performance.

soikki

Interesting.

When you say "identically configured arrays", would you spare us the details please? Disk types, amount, array cache etc? Is everything done correctly on host; lvm, load balancing...

I find it odd that VSP's response times are so much worse than V-Max here. It casts a small shade of doubt on the graph. You are claiming here that with identical config, VSP's response times are almost double those of V-Max.

If this would be the case, I'm quite sure that you would have trumpeted this much earlier and much louder.

BTW, you note here that "6. HDT response times more than double during page relocation". Did you note that the same happened for FAST as well?

the storage anarchist

Soikki -

You'll have to take my word for it that the test configuration was the same for both platforms - our performance engineering team is stricter than most customers in performing apples-to-apples comparisons.

And this difference in response time is no surprise, nor is it even new - Symmetrix has long held a significant response time advantage in OLTP-type workloads. Our sales teams use this to their advantage every day.

And indeed, the FAST VP response times did double - but they also remained far below the impact seen with HDT, and the impact lasts for a far shorter time.

And if the impact is too much for a VMAX customer, it can be dialed back by reducing the aggressiveness of the relocations...while HDT offers no such tunability.

the storage anarchist

By the way, Soikki - it doesn't help VSP response times when writes to SATA drives must be verified by reading the data back into the array and comparing to the original data, as is the prerequisite to use SATA with HDT.

VMAX uses more efficient and effective error detection/prevention for all drive types (including SATA).

But even after auto-tiering optimization, where presumably most of the writes have been moved OFF of the SATA drives, the VSP exhibits the response time deficiency vs VMAX that Hitachi arrays have shown ever since the first Lightning.

soikki

Thanks for commenting on the comments :)

However, I must disagree still with you not sharing any configuration details. If a vendor shows a graph comparing their competitor but refuses to share details of the comparison, it is meaningless.

Through the years I have seen similar FUD and graphs from EMC comparing other vendors, and constantly the thing missing is configuration info.

No detailed info = meaningless graphs.

I think that if your graphs here were really the truth, you would run the SPEC tests and shout big time (as with the Celerra). Accept the challenge? Or share with us the information I requested earlier. We, customers, are very eager to get accurate information and performance measures.

the storage anarchist

Soikki -

Unfortunately, there are no standardized tests that are designed to demonstrate the operational impacts of automated tiering. Most standardized benchmarks run for a limited time to pre-warm the caches, then execute for a relatively short time, and then report elapsed time and IOPS and MB/s averages across the execution period.

Due to the limitations of competitive auto tiering products (like Hitachi HDT and IBM EasyTier), a standardized test would have to be designed to run for days, and to have a dynamic working set that is representative of the real world and that is larger than available cache+flash, and the test has to be easily repeatable.

Neither SPECsfs nor SPC tests fit these requirements.

How we create such a test is proprietary information, but the effective workload looks like this:

20% 8K read hit
45% 8K read miss
15% 8K random write
10% 64K seq read
10% 64K seq write
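As a rough illustration only (this is not the proprietary test harness, and the variable names are made up), the mix above can be written down and summarized like this:

```python
# (percent of I/Os, I/O size in KB, operation), taken from the list above
workload = [
    (20, 8, "read hit"),
    (45, 8, "read miss"),
    (15, 8, "random write"),
    (10, 64, "seq read"),
    (10, 64, "seq write"),
]

total_pct = sum(pct for pct, _, _ in workload)
read_pct = sum(pct for pct, _, op in workload if "read" in op)
write_pct = total_pct - read_pct
avg_io_kb = sum(pct * size_kb for pct, size_kb, _ in workload) / total_pct

print(f"reads {read_pct}%, writes {write_pct}%, average I/O {avg_io_kb} KB")
```

In other words, the mix is 75% reads and 25% writes by I/O count, with an average I/O size of 19.2KB.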

I originally presented these results to challenge the baseless assertion made by Hu Yoshida that VSP could handle the added effort of auto-tiering without impacting performance, while other arrays cannot. Readers can accept my assertion that the tests were in fact fairly executed or not, but at least I have tried to back up my contrarian position with data - data that I feel is fair enough to stake my reputation upon.

As always, YMMV.

alvarezjedi

Hi Barry,

Good to see that the HDS v EMC big frame debate still hasn't gone out of style.

I think your data is great but I have to agree with Soikki here - customers just don't trust fun, colorful graphs. They are a lot more savvy and they want to know details because their reputations and jobs depend on real data.

Slides with hesitation to provide details can make for a relationship of distrust. Believe me, the baseless assertions that Hu Yoshida makes also sow the seeds of distrust (much like your configuration detail-less graph).

You are usually a bit more transparent in your posts... C'mon, share your data!, step up and don't hide behind a cute little "YMMV". Take a risk and trust that the storage world will understand and decide for themselves. You've always been a leader, don't become another Hu :(


about
the storage anarchist


Barry Burke

disclaimer

I am unabashedly an employee of EMC, but the opinions expressed here are entirely my own. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.
