Seasoned network administrators know all too well that the volume of an organization's data grows exponentially over time and that measures must be taken to keep the data from outgrowing the organization's IT infrastructure. This already difficult challenge is further compounded in health care organizations by the requirement to retain electronic health records and other data for a prolonged period of time. As such, health care organizations are often forced to look for creative ways of dealing with long-term data retention.
It might at first seem that the obvious answer is to implement a data archiving solution. However, in health care organizations archiving is only appropriate for certain types of data. For example, a health care organization could safely archive old accounting data because that data will theoretically never change. The same cannot necessarily be said for other types of data. Even old patient data could need to be updated at any time, which means that an archiving product designed to preserve data in an unchanged state will probably not be adequate for long-term storage of electronic health records.
When designing a long-term data storage solution for patient information, two main factors must be taken into account: accessibility and cost.
Accessibility is perhaps the more difficult of the two challenges. Because patient health records (even old ones) could need to be accessed at any time, copying old records off to tape and storing the tapes in a vault is not an option. Patient health data needs to be retrievable within a reasonable amount of time.
Not only must patient health data be retrievable, but in many cases it must also be updatable. If a patient who has not been seen in quite some time suddenly comes into the facility requiring treatment, the facility must be able to retrieve the patient's health records in a timely manner and to update those records. While this concept may seem simple and obvious, it has deep implications when it comes to storage planning. The requirement for patient health records to be updatable rules out writing old patient data off to read-only media.
As with any other IT project, cost is also a major consideration. The easiest way to make sure that data remains easily accessible is to store all of the patient health data in the same place. The problem with this is that storage (especially SAN storage) tends to be expensive. Furthermore, disk-based storage has a finite capacity that might be inadequate for the long-term storage of patient health records.
For some organizations the use of cloud-based storage might prove to be a viable option. However, storing large amounts of data on a public cloud can sometimes be cost prohibitive. Cloud storage providers typically bill organizations on a monthly basis based on the number of gigabytes of storage that are being consumed. This monthly cost might be lower than the cost of storing and maintaining aging data on site, but over the long term the cost of using a cloud storage provider almost always exceeds the cost of storing the data in-house. Even so, the security and accessibility cloud storage providers offer may make using such a provider worth the cost.
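The cost trade-off described above can be sketched with simple arithmetic. The following is a minimal illustration, not a pricing model: the function names and every dollar figure are hypothetical, and it assumes a fixed data volume, a flat per-gigabyte monthly rate, and an on-site option consisting of one up-front purchase plus a constant monthly upkeep cost.

```python
def cumulative_cloud_cost(gigabytes, price_per_gb_month, months):
    """Total cloud spend for a fixed volume billed per gigabyte per month."""
    return gigabytes * price_per_gb_month * months


def cumulative_onsite_cost(upfront, monthly_upkeep, months):
    """Total on-site spend: one up-front purchase plus constant upkeep."""
    return upfront + monthly_upkeep * months


def break_even_month(gigabytes, price_per_gb_month, upfront, monthly_upkeep,
                     horizon_months=240):
    """Return the first month in which cumulative cloud cost exceeds
    cumulative on-site cost, or None if that never happens in the horizon."""
    for month in range(1, horizon_months + 1):
        cloud = cumulative_cloud_cost(gigabytes, price_per_gb_month, month)
        onsite = cumulative_onsite_cost(upfront, monthly_upkeep, month)
        if cloud > onsite:
            return month
    return None


# Hypothetical figures: 50 TB at $0.02 per GB per month in the cloud,
# versus $30,100 of hardware plus $400 per month of upkeep on site.
print(break_even_month(50_000, 0.02, 30_100, 400))
```

With these made-up figures the cloud option is cheaper for the first 50 months and becomes the more expensive choice in month 51. A real comparison would also need to account for data growth, retrieval fees and hardware refresh cycles.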
Strategies for long-term storage
Regardless of whether an organization decides to store data on premises or in the cloud, the organization must adopt a sound strategy for the long-term storage of aging data. The strategy must be cost-effective, but in the case of health care organizations it must also ensure that the data remains readily accessible. There are several different approaches that can be taken to address these issues, but the most appropriate strategy will depend on the organization's budget and the type of data that is being stored.
In any case, the health care organization would likely use a variation of common data lifecycle management techniques. The difference, however, is that whereas data lifecycle management focuses on eventually purging expired data, the techniques that a health care organization would likely use focus more heavily on moving aging or infrequently accessed data to less expensive storage.
Hierarchical storage management
One technique that can be used to migrate data to less expensive types of storage is known as Hierarchical Storage Management or HSM. HSM is only appropriate for use with file data. In an HSM architecture, storage is defined as a series of tiers. The highest level tier represents the organization's primary storage. This is typically the most expensive, highest performing type of storage that the organization uses. Policies are used to migrate files to less expensive storage media based on when files were last accessed. For example, a file might start out on a solid-state drive (SSD) and then eventually be moved to a traditional hard drive, and finally to an optical disk. If a file that has been moved to a lower tier storage level is accessed, then the file can be automatically moved back to a higher tier storage level so that access to the file can occur more quickly in the future.
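The policy-driven migration described above can be illustrated with a short sketch. This is not how any particular HSM product works. The tier names follow the SSD, hard drive and optical example in the text, while the 90-day and 365-day thresholds, the `FileEntry` class and the choice to recall an accessed file straight to the top tier are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Tiers ordered from most expensive and fastest to least expensive.
TIERS = ["ssd", "hdd", "optical"]

# Hypothetical policy thresholds; a real policy would be site-specific.
HDD_AFTER = timedelta(days=90)
OPTICAL_AFTER = timedelta(days=365)


@dataclass
class FileEntry:
    name: str
    tier: str
    last_accessed: datetime


def tier_for_idle_time(idle):
    """Map the time since last access to the tier the policy prefers."""
    if idle < HDD_AFTER:
        return "ssd"
    if idle < OPTICAL_AFTER:
        return "hdd"
    return "optical"


def run_migration_policy(files, now):
    """Demote files whose idle time calls for a cheaper tier.
    The policy run never promotes; promotion happens only on access."""
    for entry in files:
        target = tier_for_idle_time(now - entry.last_accessed)
        if TIERS.index(target) > TIERS.index(entry.tier):
            entry.tier = target


def access(entry, now):
    """Record an access and recall the file to the top tier (an assumption)."""
    entry.last_accessed = now
    entry.tier = TIERS[0]


now = datetime(2024, 1, 1)
old_scan = FileEntry("old_scan.dcm", "ssd", now - timedelta(days=400))
run_migration_policy([old_scan], now)
print(old_scan.tier)   # optical
access(old_scan, now)
print(old_scan.tier)   # ssd
```

In this sketch only the bookkeeping is modeled; an actual HSM system would also move the file's bytes between devices and usually leave a stub behind so that the file's path does not change.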
Active archiving works similarly to Hierarchical Storage Management, except that active archiving is designed for use with databases rather than with file data. As is the case with HSM, policies allow infrequently used data to be automatically migrated to less expensive storage media. The difference is that this migration occurs at the database record level rather than at the file level. The advantage of this approach is that because the technique is specifically designed for databases, the relationships between records are retained when the records are moved. Active archiving allows seldom used records to remain accessible and fully functional even though they might no longer physically reside on the same storage device as the most frequently used database records.
Regardless of whether an organization is using HSM or active archiving, the key to making the technology work is careful hardware planning. Either technology is capable of moving data among storage tiers, but it is ultimately up to the administrator to decide what hardware should be used within the various tiers.
Top tier storage is almost always made up of high-performance drives such as SSDs or 15,000 RPM SAS hard disks. But organizations are free to get a bit more creative with lower tier storage. SATA drives and optical drives are popular choices. However, some organizations have begun using cloud storage instead.
Another possible choice is to use a technology called MAID, or Massive Array of Idle Disks. The idea behind MAID is that the disks containing older, seldom accessed data remain idle most of the time, thereby saving power and wear and tear on the disks themselves. If a piece of data needs to be retrieved, the disks can be quickly spun up, the data is retrieved, and the disks go back to an idle state.
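The spin-up and spin-down behavior can be modeled with a toy simulation. This is only a conceptual sketch and not a description of any real MAID controller: the class names, the use of plain numbers as timestamps and the idle timeout are all assumptions.

```python
class MaidDisk:
    """A disk that holds records and is either spinning or idle."""

    def __init__(self, name, records):
        self.name = name
        self.records = dict(records)
        self.spinning = False
        self.spin_ups = 0
        self.last_used = None


class MaidArray:
    """Keeps disks idle until a read needs them, then spins them
    back down once they have gone unused for idle_timeout seconds."""

    def __init__(self, disks, idle_timeout):
        self.disks = list(disks)
        self.idle_timeout = idle_timeout

    def read(self, key, now):
        """Spin up only the disk that holds the key, then return the record."""
        for disk in self.disks:
            if key in disk.records:
                if not disk.spinning:
                    disk.spinning = True
                    disk.spin_ups += 1
                disk.last_used = now
                return disk.records[key]
        raise KeyError(key)

    def spin_down_idle(self, now):
        """Return to idle any spinning disk unused for the timeout period."""
        for disk in self.disks:
            if disk.spinning and now - disk.last_used >= self.idle_timeout:
                disk.spinning = False
```

In this model a read touches a single disk, so the rest of the array stays idle, which is where the power and wear savings described above would come from.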
As health care organizations accumulate more and more data they must often look for creative ways to store that data. Thankfully, there are a number of different technologies that lend themselves well to this task.
Brien M. Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Windows 2000 Server and IIS. He has served as CIO for a nationwide chain of hospitals and was once in charge of IT security for Fort Knox. Write to him at email@example.com or contact @SearchHealthIT on Twitter.