August 16, 2007

0.027: inside tiered storage - part 1 (definitions)

There has been a lot of talk of late related to tiered storage, mostly surrounding the applicability of using SATA devices in enterprise-class storage arrays. After posting a few comments and follow-ups on fellow bloggers' sites, I thought I might invest in taking a more in-depth look at the whole topic from an enterprise IT perspective.

The way I see it, this whole notion of tiered storage is pretty broad, so I've broken my approach to the topic into a series of related posts that I plan to deliver over the coming days/weeks. Roughly, I think I'll tackle the discussion like this:

part 1: definitions
In the first installment (this one), we'll explore the definition of "tiered storage" - I say "we" because I'd like to collect your feedback on the subject.
 
part 2: options
Next I'll explore the various approaches to implementing tiered storage, using different companies and their products as examples.
 
part 3: challenges
Then I'll discuss some of the challenges of implementing tiered storage, both related to each individual option and across the entire spectrum (I'll give a little preview: there isn't yet a good solution that solves everything for everyone).
 
part 4: predictions
Finally, I'll take a look at what I think will likely be coming down the pike to help improve the overall situation.

Now, I probably won't get through all this back-to-back, so expect me to intermix this series with other topics over the coming days or weeks.

One important caveat - this series is about Tiered Storage, and NOT Information Lifecycle Management (ILM). As SNIA has defined it, ILM is about the entire operational ecostructure (people, process, practice, tools & technology) employed to effectively align the business value of information with the IT infrastructure throughout its lifecycle. This series of posts will explore perhaps the most important of the infrastructure tools - tiered storage - that can be employed in support of ILM, and the various ways this tool can be deployed.

So let's get started...

defining tiered storage

[Aug 16, 2007: edit - DOH! The open systems storage guy has pointed out (in a comment below) that I missed the obvious, simpler approach of defining tiers by SLAs. For completeness, I've taken the liberty of adding that definition to this chapter. Shame on me for focusing on the complex definitions only.]

There are at least a couple of different ways that people tend to classify tiered storage.

The two I've seen most frequently are categorizing by Application Class (or Usage Profile) and categorizing by Media Characteristics; the ossg suggests a third, using SLAs. I'll describe each of these below.

Since everyone defines "tiers" differently, I'll establish the baseline for this discussion to be the following:

application class or usage profile

This is perhaps the most appropriate way to generally classify the storage requirements for each application class, but it's also a very non-specific approach - one person's Tier 1 is often the same as someone else's Tier 2, for example. In my version of this categorization, I've attempted to expand the definitions a little bit to create some delineation between the Tier 1 usage profile that typically justifies high-end/enterprise class storage and the next Tier, which historically has been the demand-driver for mid-tier storage.

But these arguably are not perfect, as you'll see.

Tier 1
"Enterprise class storage" (e.g., Symmetrix, USP/XP and IBM DS8000) - typically large-scale consolidation platforms, supporting one or many mission- or business-critical applications. Often deployed with comprehensive local and remote replication infrastructures for Business Continuity (BC) and/or Disaster Recovery (DR). Larger environments may require multiple arrays to meet capacity and service level requirements, frequently with the requirement for multi-host/multi-application/multi-array consistency management for local & distant replication.
 
Tier 2
"Mid-tier storage" (e.g., CLARiiON, AMS, EVA, DS4000, NetApp, etc.) - predominantly found in mid-size enterprises and departmental deployments, usually supporting a single or perhaps a few applications or user communities. Alternatively, used to store the less mission-critical application data, such as file shares, document libraries or software development environments. Frequently deployed without significant distance replication infrastructures, leveraging instead more traditional backup and point-in-time copies for DR. Larger deployments often utilize multiple arrays, but each is typically operated and managed as an independent entity.
 
Tier 3
"Active Archive" (i.e., Centera, as the very definition of the CAS category) - typically deployed to support the retention and compliance requirements of unstructured data, such as insurance claims, medical records, employee records and the like, where verifiable "write once" immutability is requisite along with on-line accessibility. Larger deployments will leverage multiple devices operating as a single coherent unit, and compliance regulations often require at least one remote replica of all stored objects. Deployments may be through integrated support with specific applications (e.g., email archiving, digital MRI, PACS, etc.), or through more traditional interfaces, such as NFS.
 
Tier 4
"Nearline Backup" (e.g., Backup to Disk, Disk Libraries, VTL, etc.) - the place to store point-in-time copies of important information that could require quick recovery due to data corruption, accidental deletion or other causes of data loss. Typically stored near the primary copy of the data (hence "nearline"), but increasingly also replicated off-site for added protection, either electronically or through the more traditional "continually rotating truckloads of tapes" approach. May also entail an interface appliance cluster to emulate tape to the host backup applications, or it may be direct-to-disk backups using either backup/archiving software or simply by making point-in-time copies of the target volumes. Usage is traditionally heavy on the "write" side, with frequent (but not necessarily continuous) recovery requirements - the advantage of disk as the storage media (as opposed to tape) is the speed of recovery.
 
Tier 5
"Deep Archive" (e.g., Tape, CD or DVD-ROM) - the resting place for information that must be retained for extended periods, often with no planned or expected use or access (save-it-and-forget-it storage). Retention period may actually exceed the expected lifetime of the storage media.

Now I know that this list alone will generate controversy. Like I said, one person's Tier 1 is another's Tier 2. And I know I didn't reflect every application, I'm sure I left somebody's product out, and I probably forgot a tier. But for the moment, I ask your indulgence...my intent is merely to provide a reference point for the notion of "application- or usage-based classification" of tiers. This is to be contrasted with the following approaches:

availability slas

As the ossg suggests, the SLA approach classifies tiers based on the relative availability importance of the data. Tier 1 is the stuff you can't live without for any significant amount of time, tier 2 is important but you probably won't get fired if you lose access to the data temporarily, and tier 3 is the backups and archives. This approach has a lot of variability in definition, since everything is measured from the perspective of the operating environment, if not the limitations of the storage budget. And as the ossg notes, "Tier 1" can even mean different things for different parts of the same IT domain. Taking a bit of editorial license, I'll capture this model as:

Tier 1
the most important data - the stuff that the business cannot run without, data that ideally is (or has to be) available 24x7x365, even in the event of an outage or disruption
 
Tier 2
data for less mission-critical applications (since everything can't be tier 1) where the availability SLAs are less strict - the stuff you'd worry about recovering AFTER you got tier 1 back on-line
 
Tier 3
backups and archives
 
I think I agree with the ossg - this is probably how most storage managers/planners/architects think about things.
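
To make the SLA model a bit more concrete, here is a minimal Python sketch of how it might be expressed as a rule. The `DataSet` fields and the one-hour cut-off are purely my own assumptions for illustration; the ossg's definition doesn't specify any numbers, and every shop would pick its own threshold.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    is_backup_or_archive: bool          # hypothetical flag, not from the post
    max_tolerable_outage_hours: float   # hypothetical SLA figure


# Assumed cut-off: data that cannot be unavailable for more than
# an hour is treated as Tier 1. Adjust to your own environment.
TIER1_MAX_OUTAGE_HOURS = 1.0


def sla_tier(ds: DataSet) -> int:
    """Return 1, 2 or 3 following the SLA-based model described above."""
    if ds.is_backup_or_archive:
        return 3  # backups and archives
    if ds.max_tolerable_outage_hours <= TIER1_MAX_OUTAGE_HOURS:
        return 1  # the business cannot run without it
    return 2      # recover it after Tier 1 is back on-line
```

For example, `sla_tier(DataSet("order-entry", False, 0.0))` lands in Tier 1, while a file share that can tolerate a day of downtime lands in Tier 2.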

media characteristics

An alternative means of classifying storage is to jump right into the "appropriate characteristics" of the storage itself to define the tiers. This approach unfortunately leaves the translation between "application" and "tier" pretty much undefined, and also doesn't include such qualifying characteristics as replication, immutability or MAID (power requirements). But this approach is inherently more consistent with regard to the specific storage required.

Tier 1
15K RPM, striped & mirrored drives (small & fast)
 
Tier 2
10K RPM, RAID 5 / RAID 6 drives (the workhorse for most external arrays)
 
Tier 3
7200 RPM LC-FC or SATA w/RAID 6 (fat, slow and cheap)
 
Tier 4
Power-reduced ("light") storage (e.g., MAID or De-Duped storage)
 
Tier 5
Off-line storage (tape, CDROM, etc.)
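
Because the media-based tiers are defined by concrete device attributes, they map naturally onto a lookup table. The sketch below is only an illustration: the `media_tier` function and its parameters are hypothetical, it treats the RPM figures as "at least" thresholds (my assumption), and it ignores the RAID layout that the list above also specifies.

```python
from typing import Optional

# The five media-based tiers exactly as listed above.
MEDIA_TIERS = {
    1: "15K RPM, striped & mirrored drives",
    2: "10K RPM, RAID 5 / RAID 6 drives",
    3: "7200 RPM LC-FC or SATA w/RAID 6",
    4: "Power-reduced storage (MAID or de-duped)",
    5: "Off-line storage (tape, CDROM, etc.)",
}


def media_tier(rpm: Optional[int] = None,
               power_reduced: bool = False,
               offline: bool = False) -> int:
    """Pick a media tier from a few device attributes (simplified)."""
    if offline:
        return 5
    if power_reduced:
        return 4
    if rpm is None:
        raise ValueError("rpm is required for on-line, full-power disks")
    if rpm >= 15000:
        return 1
    if rpm >= 10000:
        return 2
    return 3
```

A 7200 RPM SATA drive would come back as Tier 3, and `MEDIA_TIERS[3]` gives the matching description.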

Like I said - this approach is much easier to grok than the application-centric approach. And there are some approximate equivalences between the two approaches.

Lately, there have been discussions of at least two more media-based tiers being added to the discussion.

  • The first I'll call Tier 0: for Solid-state Storage Devices (SSD). These devices today are most often made up of fast DDR SDRAM (like your PC or server memory) closely coupled with hard-disk storage and backup power to handle destage in the event of a power loss. SSD technology is most typically deployed for low-latency, high-IOPS environments where the costs are justified by the response-time requirements of the applications. More recently, the notion of using so-called NAND Flash to make SSDs is coming forward, potentially offering reduced costs and lower power (by eliminating the cost overhead of the backup disk drive). These devices are already the storage foundation of the low-end iPods and iPhones, and there are even a few laptops built around Flash SSDs in the 2 1/2" form factor. So you might imagine that it's only a matter of time before someone figures out how to overcome the write retention/wear-out issues and creates devices that are ready for the rigors of storage for the data center. But they won't be the ones that the Pack Rat discovered - clearly we'll need to see a lot more IOPS out of these for them to classify as a "super Tier." Still, could happen...just don't know when.
     
  • At the other extreme is the class of storage widely known as "bulk" - a commercial version of the kind of stuff that Google, Yahoo and others home-brewed to deploy in support of their global storage requirements for "free" (advertiser-funded) email, blog sites, and accessible-anywhere storage. I've recently seen this class of storage called Tier 10, a moniker that perhaps depicts the intentional and distinct separation of this class of storage from the more traditional applications of data center storage. Where traditional enterprise storage is all about performance and availability, "bulk" checks in more about low cost and forget-it-when-it-fails (literally). Availability is accomplished by ensuring there is more than one copy of each file; cost is controlled by avoiding RAID altogether and using JBOD, and when something fails, the only thing you want to do is power it down to save the electrons - don't even bother to swap it or rebuild it, just plug more in somewhere else in the fabric. This one probably becomes a realistic commercial offering before Flash SSD, though, if I had to bet.
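
The "more than one copy of each file" idea behind bulk storage can be put into rough numbers. Assuming each copy lives on a device that is independently reachable with probability p (a simplifying assumption of mine; real failures are often correlated), the chance that at least one of n copies is reachable is 1 - (1 - p)^n. A small Python sketch, with a hypothetical function name:

```python
def availability_with_copies(p: float, n: int) -> float:
    """Probability that at least one of n independent copies is reachable.

    p is the availability of a single copy (0.0 to 1.0).
    Independence between copies is an assumption, not a given.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - (1.0 - p) ** n
```

Under that assumption, three copies on devices that are each only 99% available give roughly six nines of availability for the file, which is why this class of storage can afford to skip RAID and simply abandon failed drives.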

While these two new tiers haven't really begun to emerge across any significant subset of enterprise IT, it is probable that they will over the coming few years. (I'll try to make more interesting predictions in part 4 of this series.)

which do you prefer?

I've presented three different angles on the definition of tiered storage. None of them is perfect, and I doubt any of them aligns 100% with how you might define the tiers.

But I'm curious: if you had to choose, which of the three would be closer to your own definition? Or maybe there is a hybrid of the three that might be closer to your own definition. Or is there an entirely different way that you would define "tiers" for the environments you work with or think about?

I'd appreciate your participation in another little poll, and if you choose "None of the Above," I'd also appreciate it if you would leave a comment summarizing how you think tiering should be defined. I'll let this run for a while, and I'll probably move into the next chapter before everyone has had the chance to vote, but I think it could be interesting to see what we all think...

[edit - I have reset the poll to include the third approach suggested by the ossg - if you voted earlier, please vote again]



Thanks for playing the home game! [and thanks to the ossg for the input - I hope I've integrated it accurately!]

 


Comments

Your categorization is typical of someone who sees a lot of product offerings but doesn't have their job depending on a single company's data. Most IT people who need to tier their storage will adopt tier definitions specific to their situation. A tier system is supposed to help companies decide which of their storage resources should be dedicated to which of their storage needs, and absolutely every company has data that needs to be on "tier one" storage. Defining tier one as an enterprise consolidation array or fast and small drive arrays will not help companies who don't use those technologies.

You're pretty close with the application class tiers, but instead of suggesting hardware and applications, I would break the tiers down by SLA. Since every company you talk to is likely to define their own tiers and use a different type of media for each, it's probably better to just say "tier one contains data whose availability is most important to you". Some consider 10k drives in a shark fine for their tier one business critical mainframe workload, but require a large, higher-performance array of small 15k drives for their VMware environment. I know of an IT guy at a non-profit with a serious IT budget who refuses to take his most cherished tier of "lose this and you're fired with a bad reference" data off of direct attached storage to his Unix cluster. He has an expensive enterprise SAN that would do the job and run circles around the DASD, but the Unix system is a closed box with quadruple redundancy built in on every level that is never touched except for hardware replacements.

One common way of breaking down categories of storage is where the first tier is production data for critical apps, second is production data for low SLA apps as well as nearline backups, and third is permanent backups and archives.

For example, tier one could be a disk controller, tier two could be cheaper, larger drives either in the same system or under another controller, and tier three could be tape/optical. Some may differentiate between low SLA requirement apps and nearline backups (creating another tier for file and print servers), and still others may create a tier for their proprietary systems' storage (like mainframe or AS/400), but if you use SLAs to define the tiers, every company can categorize their storage, and every company has tier one data.

Hi Barry,
Here's another twist to the SLA way of looking at storage. Service Levels have to be associated with Service Areas.

As an example, Exchange may be pretty happy running as a messaging platform on Tier 2 storage from a performance and availability perspective, but its archival needs may be Tier 1, the highest level (think SOX), say on a Centera. The same Centera may be Tier 3 for a file server archive or marketing data.

So therein lies the realization that there need to be different tiers for different service areas - and I am sure you have seen this before.

So an application needs Primary Storage Tiers (characterized by performance, cost, availability) as you have listed. But it also has Archival Storage needs (characterized by retention period, retrieval speed, immutability, etc.), which could range from Centera to ATA drives to tape, which would be archival storage tiers.

And so on for Disaster Recovery storage (enterprise class storage with remote replication, BCVs or equivalents, etc.) to tape based recovery, and other service areas like operational recovery, security etc.

So really, Tiered storage should be a grid, with tiers for each service area (Primary, Archival, DR, ...) with the notion that just because something is Tier 1 for one of the areas does not mean it needs Tier 1 on the others. Mapping the SLAs for each service area will yield different Tiers for each.

Make sense?
Cheers, Kartik.
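
The grid described in this comment could be sketched as a simple nested mapping. This is only an illustration: the structure and the `tier_for` helper are hypothetical, and the only tier values filled in are the ones the comment itself gives (Exchange on Tier 2 primary and Tier 1 archival, a file server archive on Tier 3).

```python
from typing import Optional

# Service areas named in the comment above.
SERVICE_AREAS = ("primary", "archival", "dr")

# One row per application, one column per service area.
# Cells the comment does not specify are simply left out.
tier_grid = {
    "exchange": {"primary": 2, "archival": 1},
    "file_server": {"archival": 3},
}


def tier_for(app: str, area: str) -> Optional[int]:
    """Look up the tier for an application in one service area.

    Returns None when the grid has no entry for that cell.
    """
    if area not in SERVICE_AREAS:
        raise ValueError(f"unknown service area: {area}")
    return tier_grid.get(app, {}).get(area)
```

The point of the structure is that `tier_for("exchange", "primary")` and `tier_for("exchange", "archival")` can legitimately return different tiers for the same application.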

Great topic and approach. My experience with customers is that best practice is all about automation, so keep definitions simple. If your definitional framework can't be automated...throw it out! Because products, tools and processes are still lacking, the simpler the definition the better - and it's all relative. Define T1 based on SLAs (and communicate to the business so IT decides where stuff goes based on clear guidelines; not the business based on politics), default everything else to T2 and archive to T3 based on retention, records management and information liability policies. Can't wait to see "part deux."

T0 using NAND may not be as far off as you think. There is at least one company I know of (STEC) selling FC-attach SSDs that claim 50K read, 20K write ops and a 10 year guarantee. Prices are still VERY high, but I suspect in the next year or so, with Samsung catching up, the price will drop. I've seen some SSD papers that show by 2011 SSD and DDM markets will be almost equivalent - this did of course come from a flash vendor!

http://www.stec-inc.com/product/zeusiops.php

I tend to use a mixed approach even though it may be a bit confusing. It really starts from the environment point of view. If the discussion is about CAD/Engineering environment where High-Availability (RTO=0, RPO<=1H) is required then NetApp/FC is Tier-1 and NetApp/SATA is Tier-2, but moving to a Production environment where DR (RPO=0,RTO~0) is required, then Tier-1 is EMC/XP and NetApp/FC is downgraded to Tier-2 and so on.

Very much enjoy reading this blog.

I believe your initial premise is off base by the slightest mention of any specific “product” in the definition or discussion of tiered storage. Tiered storage has absolutely nothing to do with any specific product.

Tiered Storage Definition = Service Level Agreement SLA Vs Total Cost Of Ownership (TCO)

Where,
SLA = feature/functionality (availability, reliability, performance, DR RPO, DR RTO, data mobility, provisioning, reporting, monitoring, etc.)
TCO = acquisition + interoperability + facilities + operation + unplanned outages… etc.
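
As a small illustration of the TCO side of this definition, the cost components listed above can simply be summed. The class name, field names and the `other` catch-all (standing in for the "etc.") are hypothetical, and any figures you plug in are your own estimates, not data from this discussion.

```python
from dataclasses import dataclass, fields


@dataclass
class TcoInputs:
    # Components named in the comment above, all in the same currency unit.
    acquisition: float
    interoperability: float
    facilities: float
    operation: float
    unplanned_outages: float
    other: float = 0.0  # placeholder for the open-ended "etc."


def tco(costs: TcoInputs) -> float:
    """Total cost of ownership as the plain sum of all components."""
    return sum(getattr(costs, f.name) for f in fields(costs))
```

Comparing `tco(...)` for two candidate tiers makes the soft costs visible next to the acquisition price, which is the comparison this definition asks for.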

Some additional points as related to tiered storage:
• It’s human nature to get dominated by acquisition cost or “marketing perception” and think of a “product” as a “tier”.
• TCO analysis is difficult for most users because they must justify soft costs.
• There is diminishing return for too many tiers… number I don’t know. (Another soft cost that is hard to quantify)
• Heterogeneous storage is not tiered storage.
• Heterogeneous storage is very very bad. Every storage operating system added to the environment will add millions of lines of code and, along with that, a multiple of interoperability permutations. Therefore, the number of bugs (i.e. outages) introduced is exponential, not linear.
  o Some tiers may require a specialized storage OS, such as block or archive. Just do not introduce multiple storage OSes that do the same thing.
• The fewer lines of code in an environment, the fewer bugs; therefore, fewer issues/outages; therefore, less operational care and feeding; therefore, significantly lower TCO.
• One business outage can overwhelm acquisition savings many times over and no one documents this loss after the fact.

Hi, Barry

I tend to agree with the posters who are arguing for an abstraction level (SLA, cost based) above the technologies or even the descriptive names you've provided.

Lots of good reasons to consider this.

First, the capabilities of the underlying technologies change over time. But 99.9999% available doesn't, does it?

Second, the conversation with the business units (or application owners) needs to be around SLA/cost, versus RAID 6, striped, etc.

I think you know this, but EMC has done hundreds, if not thousands, of engagements with customers around service level catalogs.

It may not be a perfect practice, but it has definitely been valuable, IMHO.

Cheers!


anarchy cannot be moderated

by: barry a. burke


general housekeeping

  • disclaimer
    The opinions expressed here are my personal opinions. I am a blogger who works at EMC, not an EMC blogger. This is my blog, and not EMC's. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.