Using tiered storage architecture for health records
Date: Aug 09, 2011
In tiered storage architecture, data is assigned to different types of storage devices depending on its importance to an organization. Mission-critical information is stored on the fastest systems, and archival data that is rarely retrieved goes on the slowest, least expensive storage.
In this webcast, Marc Staimer, president of Dragon Slayer Consulting, describes the basics of tiered storage architecture and demonstrates how health care organizations can set policies to automate the process of moving data to different tiers.
Read the full transcript from this video below:
Using tiered storage architecture for health records
Brian Eastwood: Hello everyone. This is Brian Eastwood, the site editor for SearchHealthIT.com. This expert webcast is one installment of a SearchStorage.com Health IT interactive classroom that covers storage strategies for the health IT professional. Additional installments include a podcast on object-based storage and tips that cover storage virtualization and online storage services.
I am joined today by Marc Staimer, the president and CDS of Dragon Slayer Consulting. Marc will be providing an overview of the concept of tiered storage and will also discuss the pros and cons of automated tiering policies. Thank you for joining us, Marc, and please take it away.
Marc Staimer: Thank you, Brian. As Brian said, my name is Marc Staimer. I am the president and Chief Dragon Slayer of Dragon Slayer Consulting. I have over 13 years' experience as a consultant working with vendors and end users. Vendors are the ones who typically pay me; end users do not, unless they want me to review an RFP, write an RFP, or go on site. What I do with vendors is help them market more effectively. That means I help them understand the problems they should be solving or the problems they do solve. I also provide analysis at trade shows, I publish consistently at TechTarget and in other trade magazines, and I have over 31 years' experience in the industry. If you want to get hold of me, you can email me or phone me. I respond to every email and every phone call.
Let us talk about what we are going to talk about today, which is what is tiered storage? And more along the lines of what it does and how it works. Then we are going to talk about how non-automated tiered storage has a limited value, whereas automatic tiered storage works much better, and how that works. We will also talk about who does automated storage tiering and some best automated or dynamic storage tiering practices.
I like to start every webinar or presentation with a story; it helps keep things straight. One day a king was dying, and he wanted to leave his son in charge, or become the next king, but there was a problem. He called his son before him and said, ‘Son, you are my heir, and you know I am dying, but before people can accept you, you must pass the test of three-fold.’ His son says, ‘Alright, father. I am willing to do whatever I must to lead the kingdom to prosperity.’ ‘Good, because the test is not something that just anyone can just pass. First you must drink three gallons of the local wine.’ The son thought this could not be too hard because he has been going to college. And the father said to him, ‘Unfortunately our wine has been known to kill foreigners, and you have been gone a long time. Second, you must remove an abscess tooth from a Bengal tiger.’ That scared him a little bit. ‘Third, you must make love to the most beautiful, passionate woman in the entire kingdom.’ He thought, ‘That could not be too hard.’
He started the first test. They take him to a room, they give him three gallons of the local wine, and he gets so sick. We will not go into the details, let us just say he is comatose for a few days, and it takes him two weeks to recover. Finally, he says ‘Alright, father. I am ready for the second test.’ They take him to a room, a sealed room, with heavy cast iron doors with a big crossbar on it. They lift the cross bar, they open the door, they throw him in, and they say, ‘Good luck.’ They slam the door closed and put the cross bar down. They hear this growling and these loud thumps, and they hear this loud screaming and loud roaring. Finally, they hear a ‘thump, thump, thump,’ then silence. The king fears for the worst. 30 minutes go by, and they hear this pounding on the door: ‘pong, pong, pong.’ They open the door, and there stands the son, ear hanging by a thread, eye swollen shut, limping badly, and clothes in tatters, half-naked. He walks up to his father, and he says, ‘Alright, father. Tell me, where is the lady with the abscess tooth?’ The purpose of the story is you have to keep things straight, or things do not always work out the way you expect.
What is tiered storage? It is a storage system hierarchy that is typically based on service requirements. In other words, every piece of data does not have the same value; every application does not have the same value. Some applications are more mission-critical than others. For example, your email is probably mission-critical, your mp3 music, not so much. Your invoicing, client records and health records are mission-critical. Information about the weather or information about your company, your industry, not so much. Every piece of data has a different value, so putting all that data on the same piece of storage does not quite make sense, especially if that storage is expensive. The key philosophy behind tiered storage is to align the value of the data with the value of the storage, and the characteristics of the storage, which could include security, business continuity, performance, data protection, retention, compliance, and of course, cost.
This requires some mechanism to place the data. It could be static; in other words, you assign it up front: this application goes to this storage system, that application goes to that storage system, or at least to these lower-class drives or this lower-performing storage, whatever the case might be, but you are statically placing it. Another could be staged, where you are doing a batched movement: you move data manually between storage tiers, so at a certain time of the month, you take it from this storage system to that storage system. One other aspect about data: as it ages, it tends to lose value. Historically, we know that data is most commonly accessed in the first 48 to 72 hours after it is created. After that, access drops off a cliff; the data is rarely accessed. So as data ages, you may want to move it.
The third is automated or dynamic, which is active data movement, also known historically as hierarchical storage management, or HSM, or information lifecycle management, or ILM. Based on policies, circumstances, time, response time and time of month, data gets moved. With HSM, data is moved automatically based on one characteristic only, and that is age -- the time since it was last accessed. These are what we mean when we talk about tiered storage.
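To make the age-based rule concrete, here is a minimal sketch of an HSM-style placement decision. The tier names and idle-time thresholds are assumptions chosen for illustration (the 72-hour figure echoes the access window mentioned above); they are not taken from any particular product.

```python
from datetime import datetime, timedelta

# Hypothetical tiers, fastest first, each with the minimum idle time
# before data belongs there. All thresholds are illustrative assumptions.
TIER_THRESHOLDS = [
    ("ssd", timedelta(0)),
    ("sas", timedelta(hours=72)),
    ("sata", timedelta(days=30)),
]

def pick_tier(last_accessed, now):
    """Return the slowest tier whose idle-time threshold the data has passed."""
    idle = now - last_accessed
    chosen = TIER_THRESHOLDS[0][0]
    for name, min_idle in TIER_THRESHOLDS:
        if idle >= min_idle:
            chosen = name
    return chosen

now = datetime(2011, 8, 9, 12, 0)
print(pick_tier(now - timedelta(hours=5), now))   # ssd
print(pick_tier(now - timedelta(days=4), now))    # sas
print(pick_tier(now - timedelta(days=90), now))   # sata
```

Because age since last access is the only input, this is the single-characteristic behavior described for HSM; richer policies add more inputs to the same decision.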
A better way to explain tiered storage is that not all storage systems, or storage devices within the system, are the same. You have solid-state drives, which are typically flash but can be DRAM; these are the highest performance drives. You can have high performance disk drives, which can be SAS or Fibre Channel and are typically 10,000 or 15,000 RPM drives. You have SATA drives, which are lower cost and lower performing, typically 7,200 RPM, and now you have near-line SAS, which has the same performance as SATA but better data protection characteristics: dual porting and a variety of different features that make the data a little more resilient, at about the same price as SATA. The biggest variance among all these media within a storage system, beyond their performance characteristics, is cost. We were talking about cost earlier. Do you want to put all of your data on the highest cost media? Probably not. Do you want to put your mission-critical data on the lowest cost media, which may not be as resilient? Probably not. This is where the variance in the different media comes into play, on both characteristics and cost, and aligning the data based on performance, cost and other characteristics becomes important.
Tiering aims at balancing the data and the storage. Think of it in these terms: which data value matches which storage value? Part of that can mean the frequency of access, the age, and the mission-criticality of the application that we talked about, but it is all about alignment, and this can be within a storage system or between storage systems. Most vendors who sell automated tiering software do so within a storage system. You are buying this one storage system, you have different tiers of storage within that storage system, and the software is designed to move data within that storage system but not between systems. Times are going to change, but this is how it primarily works today.
Non-automated tiered storage really has very limited value. At the end of the day, it is manual and labor intensive. You have to actively do this; it requires human intervention. It consumes a lot of time, it is error-prone, and it can be extremely frustrating; ultimately, it is not effective. Because of all these things, people tend not to do it. If you tend not to do it, you tend not to use it. Therefore, all the advantages go away.
How about automated storage tiering, how does that work? As I said, it is usually within a storage system, and there are exceptions to that. I will talk about that in a moment. Typically, it is based on data age: the time since the data was last accessed, last read, or last modified. It can be any or all of those, or it can be just the age of the data itself. How long has it been since it was created? Sometimes it is based on the calendar: regardless of the age, at this point in the month, this data moves over here. I will give you an example of that. You close out your books every month, so once the books are closed, are you going to keep that data resident in high performance storage with very fast response times, or are you going to move it to lower cost, lower performing storage because you do not need that performance? It is probably going to be the latter. Occasionally, you need it to be dynamic. There are certain circumstances where you cannot predict the load on your storage system or on your applications. If you have a dynamic web interface, or if you have suddenly run a promotion and you are getting a lot of traffic, a lot of activity that affects the services you provide, that can have a huge impact on that application. So do you want that data going to slow storage or fast storage? Having the ability to dynamically read what is going on and move the data could be a good thing under those circumstances.
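The calendar and dynamic triggers described above can be sketched as a small policy function. Everything here is hypothetical: the `Volume` fields, the `HOT_IOPS` threshold and the `CLOSE_DAY` value are assumptions for illustration, and a real system would also need a rule for bringing monthly data back up when the next period opens.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Volume:
    name: str
    tier: str             # "fast" or "slow"
    closes_monthly: bool  # e.g. accounting data that is closed out each month
    iops: int             # recent I/O rate reported by a hypothetical monitor

HOT_IOPS = 5000  # assumed promotion threshold
CLOSE_DAY = 5    # assumed day of the month by which the books are closed

def plan_moves(volumes, today):
    """Return (volume name, target tier) pairs for volumes that should move."""
    moves = []
    for v in volumes:
        if v.iops >= HOT_IOPS:
            target = "fast"   # dynamic rule: unexpected load takes priority
        elif v.closes_monthly and today.day >= CLOSE_DAY:
            target = "slow"   # calendar rule: the books are closed
        else:
            target = v.tier   # no rule fired, so the volume stays put
        if target != v.tier:
            moves.append((v.name, target))
    return moves

volumes = [
    Volume("billing", "fast", True, 200),
    Volume("web-portal", "slow", False, 9000),
    Volume("ehr", "fast", False, 1200),
]
print(plan_moves(volumes, date(2011, 8, 9)))
# [('billing', 'slow'), ('web-portal', 'fast')]
```

Checking the dynamic rule first is a design choice in this sketch: a sudden burst of traffic should be able to override a calendar-based demotion.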
The movement of the data itself can be based on a file, or a LUN (a LUN is a volume), or a sub-LUN, which is a piece of a LUN made up of a certain number of blocks, or you can do it on the blocks themselves. There are a variety of different methods; this is the granularity of data movement. Each vendor has a different granularity: some can move data in 4K blocks, others in 8K blocks, others can move it only on a LUN basis, and others only on a file basis. Now you can even do it on an object basis, which is a little bit different from either of those. Objects store the metadata about the data together with the data itself.
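Why granularity matters can be shown with a short sketch that counts I/Os per extent and compares how much data would move at sub-LUN granularity versus whole-LUN granularity. The 8 KB extent size, the 1 MB LUN and the trace of byte offsets are all made-up values for illustration.

```python
from collections import Counter

EXTENT_SIZE = 8 * 1024   # assumed 8 KB extents; real products differ
LUN_SIZE = 1024 * 1024   # assumed 1 MB LUN, kept tiny for the example

def hot_extents(io_offsets, min_hits):
    """Count I/Os per extent and return the extent indexes with at least min_hits."""
    hits = Counter(offset // EXTENT_SIZE for offset in io_offsets)
    return sorted(extent for extent, n in hits.items() if n >= min_hits)

# Hypothetical trace of byte offsets touched on the LUN.
trace = [0, 100, 4096, 8192, 8200, 8300, 500000, 8192 * 3]
hot = hot_extents(trace, min_hits=3)

print(hot)                     # [0, 1]
print(len(hot) * EXTENT_SIZE)  # 16384 bytes moved with sub-LUN tiering
print(LUN_SIZE)                # 1048576 bytes moved if only whole LUNs can move
```

With finer granularity, only the busy extents take up space on the expensive tier; with LUN-level movement, the cold blocks ride along with the hot ones.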
What is the value proposition? It actually has tremendous proven value to IT, and that is according to Gartner, IDC, 451 Group, and ESG. All these analyst firms have gone out and validated the value proposition. Generally, it can save millions of dollars in CAPEX, that is the capital expenditure, and OPEX, the operating expenditure. Payback is typically less than a year, which makes this a pretty straightforward value proposition. It enables cost-effective use of very high performance flash, SSDs, or SSD appliances, which are also very high cost -- you do not necessarily want to put all of that money in there and be saving birthday cards for junior on them. It reduces or eliminates a lot of high performance hard disk drives, which are more expensive than the lower performing ones. You can have far fewer drives, and if you have fewer drives, you are spending less on power and cooling, and you need less rack space and less floor space. All that reduces cost, so it has a very good value proposition, but there is no free lunch; there are always trade-offs, and you need to know that. I am not a cheerleader; I look at things very pragmatically. Even though there is a very good side to automated tiering, there is a downside too. A good placement decision today will probably not be a good decision tomorrow because things change -- we live in a dynamic world, which means that you are going to have to revisit this periodically to decide whether the policies you set when you initially implemented are the policies that you need to have today. In fact, you need to set this up on a structured, predetermined basis to check whether the policies need to change.
Software automation, where you are doing automated tiering, tends to move data one way: down, from high performance to low performance. Rarely does it work the other way, from low performance to high performance. Some automation tends to increase disk fragmentation. There is one vendor that moves data down, basically they have implemented an HSM in their storage, and then moves it back up as soon as you access it. You are bringing it back into the higher-priced storage, and then it goes through the cycle again when the policy moves it back down based on age, which tends to cause disk fragmentation. Some automation will actually shorten your SSD life. Flash SSDs have a limited number of writes. This is one of the issues with flash. Granted, it has gotten seriously better over the last few years and is much more enterprise capable than ever before, but it does have a life, and the more you utilize it, the more you shorten that life. Finally, auto-tiering software tends to be a bit pricey, so you will save money, and you will get a payback, but you are going to spend money up front to get that payback.
Who provides automated storage tiering? As you can see, I have a significant list of different companies and products that do this, and it is growing all the time. Some are really interesting, in the sense that they are dynamic, like Xiotech. Others are multi-system, like Avere, which works with multiple file-based storage systems, not just its own; it works with multiple vendors. Others are quite useful within the system, like EMC, Dell, Hitachi, and IBM -- they have very good tiering. And NetApp now has very good tiering, so they are all pretty good. The only one in the object space that has tiering today is Scality, and that also is very good. They all have very valid capabilities that can be utilized in moving your data to where it is most cost effective and performance effective.
What are your best practices? Know your applications and the value of your data. Do not assume that one size fits all; it does not. It is a terrible waste of resources if you do that -- you are throwing money away. Set policies that match that value: age, frequency of access, time of month, or whatever is important to you. Set your policies based on that, and review them on a structured, periodic basis, whether once a month, once a quarter, once every six months, whatever works for you, but set up that review. The only constant you can count on is that change will be occurring; every single day, change occurs. You want to stay ahead of it.
Finally, consider using flash cache, instead of or in combination with automated tiering. What is flash cache? Flash caching is a really viable alternative or enhancement to automated tiering. Basically, you are using flash, which is much lower cost than DDR, as a cache in front of the disk drives. You are writing to the flash cache, and it uses a sophisticated algorithm to determine what data should go into cache and what data should not, and it will automagically move data to the flash cache when it should. It is looking at the data as I/O is occurring in real time, and saying, ‘Those writes, that data should reside here, not on the hard disk drive, initially.’ I/O performance will increase while I/O latency decreases as it starts moving those writes, those reads, and the hot data to the faster media, and it does this in real time. It is strictly observing which data is frequently accessed, and it moves it to that faster media without human intervention. Any time you can avoid human intervention with these types of things is a good thing, because you have less chance of errors. Multiple applications can benefit from that performance increase, and ultimately, the back end can be purely lower performing, lower cost disk drives. An interesting philosophy.
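The idea of observing which data is frequently accessed and copying it to faster media can be illustrated with a toy read cache. This is only a sketch under simple assumptions: a block is admitted after a fixed number of accesses (`admit_after`, a made-up parameter) and the least recently used block is evicted when the cache is full. Commercial flash caches use their own, more sophisticated algorithms.

```python
from collections import Counter, OrderedDict

class FlashCache:
    """Toy read cache: admit a block after `admit_after` accesses, evict LRU."""

    def __init__(self, capacity, admit_after=2):
        self.capacity = capacity
        self.admit_after = admit_after
        self.cache = OrderedDict()  # block id -> data, least recently used first
        self.hits = Counter()       # access count per block

    def read(self, block, backend):
        """Return (data, source), where source is 'flash' or 'disk'."""
        self.hits[block] += 1
        if block in self.cache:
            self.cache.move_to_end(block)       # mark as most recently used
            return self.cache[block], "flash"
        data = backend[block]                   # slow path: read from disk
        if self.hits[block] >= self.admit_after:
            self.cache[block] = data            # hot enough: keep a copy in flash
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used
        return data, "disk"

disk = {n: f"block-{n}" for n in range(10)}
cache = FlashCache(capacity=2)
sources = [cache.read(b, disk)[1] for b in [1, 1, 1, 2, 1]]
print(sources)  # ['disk', 'disk', 'flash', 'disk', 'flash']
```

Block 1 is served from disk until it proves itself hot, then from flash, with no administrator involved; block 2, touched once, never displaces it.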
The value prop here is, again, the same proven value to IT that automated tiering has. It will generally save millions in storage CAPEX and OPEX, and payback is once again less than a year. Again, it enables cost-effective use of flash, and it reduces or eliminates high performance hard disk drives, so you have fewer of them -- same value prop. What are the differences? There is a downside here, just like there is with tiering. Although this is dynamic and is going to provide higher performance based on your I/O patterns at any given point in time, some products only provide write caching, which means there is no impact on reads. Write caching is where you write to the flash and then it writes to the hard disk drive; because the flash tends to be faster, it speeds up your writes, then writes through to the hard disk drive and frees up the cache. Since more data is written to that flash, its life is reduced, as we discussed earlier, because it is always being utilized. Typically, what happens with flash is that the performance you get initially is going to be the best performance you get out of it. Over time, after it fills up, performance decreases as you write, because you have to erase, which is a destructive process in flash, before you can write, and that slows down your writes. Flash cache also tends to be more expensive per gigabyte than solid-state disk drives, and it limits you to no more than two tiers. What you are seeing is that many vendors who were automated tiering vendors are now offering flash caching, and flash caching vendors are now offering automated tiering, because each has its positive impact on performance, and using them together can be more effective than picking one or the other -- it is not either/or, it is both.
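The write-caching behavior described above (acknowledge the write from flash, destage to the hard disk drive later, then free the cache) can be sketched as follows. The class name, the capacity and the flush-when-full rule are assumptions for illustration; the `flash_writes` counter is there only to show why write caching consumes flash endurance.

```python
class WriteCache:
    """Toy write cache: accept writes in flash, destage them to disk later."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.dirty = {}        # block id -> data not yet written to disk
        self.flash_writes = 0  # every cached write consumes flash endurance

    def write(self, block, data, disk):
        if block not in self.dirty and len(self.dirty) >= self.capacity:
            self.flush(disk)   # cache is full: destage before accepting more
        self.dirty[block] = data
        self.flash_writes += 1

    def flush(self, disk):
        disk.update(self.dirty)  # write the cached blocks to the hard disk drive
        self.dirty.clear()       # free up the cache

disk = {}
wc = WriteCache(capacity=2)
for block, data in [(1, "a"), (2, "b"), (1, "c"), (3, "d")]:
    wc.write(block, data, disk)

print(disk)             # {1: 'c', 2: 'b'}  destaged so far
print(wc.dirty)         # {3: 'd'}          still waiting in flash
print(wc.flash_writes)  # 4                 flash absorbed every write
```

Note that the rewrite of block 1 reaches the disk only once, while the flash absorbs all four writes, which is the wear trade-off mentioned above.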
There are other alternatives, and as I said, I am not a cheerleader for any technology; I am a cheerleader for what works for you. One is single-tier storage systems. In fact, today there are solid-state, flash-based storage systems that have no hard disk drives whatsoever. Nimbus Data and SolidFire are two that come to mind, and they scale to very large numbers -- as high, today, as 250 terabytes, and not too far into the future, into the petabyte range. The key thing is that they are selling their solid-state disk drives at about the same cost as 10,000 RPM Fibre Channel or SAS drives. It is an interesting philosophy and something worth exploring. That is a case where you are putting everything on the same storage tier; you are not tiering.
If the cost is low enough, do you need to tier? It is an interesting question. Another alternative is hard disk drives only, where you use a lot of them and do what is known as short stroking. Short stroking can be pretty cost effective in providing high performance for those applications that need it, setting up a tier that way, but with no solid state whatsoever. PCIe flash cards are cards that run in the server, not in the storage system. The problem with PCIe flash cards is that they are not shareable by other servers. They are storage only for that server. They will speed up databases, especially relational databases; they are very effective at that. As I said, each has its pluses and minuses and can be a valuable component in the overall storage ecosystem.
In conclusion, storage tiering can seriously reduce your storage cost, and in an era where storage costs are going up faster than anything else in the data center, taking a bigger chunk of the data center budget, this is a very good thing. Flash cache can do the same as automated storage tiering, and each has strengths and weaknesses where combining them can make a huge amount of sense. Single tier high performance storage is also an option if the costs are low enough. As I said, ultimately, one or a mix of these solutions makes the most sense.
I like to finish off webinars with another story, just like I like to start them. One day, three friends decided to go fishing. It was beautiful day, the sun was shining, cool brisk spring morning. They are out in their boat on the lake. Two of them are Christians, one of them is an Atheist, one is a Catholic and one is a Protestant. It is the middle of the morning, beautiful day again. The Catholic stands up and says, ‘Guys, forgive me. I left my favorite fishing lure back in the car, I will be right back.’ He jumps out of the boat, runs across the water, goes to the car, gets his lure, runs back across the water, gets in the boat. The Atheist is looking at this thinking, ‘I must have had too much to drink last night. I must be daydreaming, it did not happen.’ About 30 minutes go by and the Protestant stands up and says, ‘Guys. I will be right back. I am out of bait; I will be back in a jiff.’ He jumps out of the boat, runs across the water, gets his bait, runs back across the water, and gets back in the boat. The Atheist is looking at this thinking, ‘Heck, if they can do it, I can do it. There must be something about the water that I am not aware of. Gentlemen, I am going to go get some coffee, do you want anything? OK. I will be right back.’ He jumps out of the boat, ‘splash,’ flailing around. His two friends are leaning over the boat trying to pull him back in the boat, and the Catholic turns to the Protestant and says, ‘Do you think we should have told him where the rocks were?’ Now you know where some of the rocks are.
Brian Eastwood: Thank you, Marc, for that informative presentation on using tiered storage in a healthcare setting. As a reminder, this webcast is one installment of the SearchStorage.com Health IT interactive classroom. Please check out the other resources in this classroom: a podcast on object-based storage and tips that cover storage virtualization and online storage services.