Effective data replication strategies for disaster recovery

For mission-critical applications, data replication is key to recovering data during a disaster. Each data replication strategy comes with its own benefits and drawbacks.


Health care organizations are generally required to ensure that their systems and the patient data that they contain remain highly available, and that measures are in place to prevent data loss. There are a number of different technologies that can be used, many of which revolve around redundancy and various forms of replication. This tip examines five common data replication strategies.

One way health care providers are ensuring that patient data remains accessible is through the use of server redundancy. If a server failure occurs, then another server is available to pick up where the failed server left off.

There are a number of ways to implement server redundancy. In virtual data centers, server redundancy is most often implemented through clustering. Although it is possible to directly cluster some virtual machines, clustering is most often performed at the hypervisor level. Doing so makes it possible to cluster virtual machines that would not otherwise support clustering.

Although clustering should be considered essential for any organization that is operating virtual servers, there is one major drawback that must be addressed. Most of the clustering solutions that exist for virtual data centers depend on the use of shared storage. This means that all the nodes within the cluster are connected to the same shared storage mechanism, usually a logical unit number (LUN) on a storage area network (SAN).

As a result, a shared storage pool that is not properly implemented can become a single point of failure. It therefore becomes critical to implement multiple physical connections between each virtualization host and the storage pool and to design the shared storage pool to be resilient against disk failures.

Among the common data replication strategies, disk-based backup is designed specifically to address the shortcomings of tape-based backup, which has been used for decades.

One big problem with tape-based backups is their potential for data loss. For example, imagine that an organization performs a backup every night at 11 p.m., with the backup complete at midnight. After that, there are 23 hours until the next backup is run. If a failure were to occur, any new data that accumulated within that 23-hour span could be lost, since it has not yet been backed up.
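
The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this tip, and, like the example above, it assumes that everything written before the backup finishes is protected.

```python
from datetime import datetime, timedelta


def unprotected_window(interval: timedelta, duration: timedelta) -> timedelta:
    """Gap between one backup finishing and the next one starting."""
    return interval - duration


def exposure_at(failure: datetime, last_backup_done: datetime) -> timedelta:
    """Amount of new data, measured in time, at risk if a failure hits now."""
    return max(failure - last_backup_done, timedelta(0))


# Nightly backup: starts at 11 p.m., completes at midnight.
nightly = unprotected_window(timedelta(hours=24), timedelta(hours=1))
print(nightly)  # 23:00:00

# A failure at 3 p.m. puts 15 hours of new data at risk.
done = datetime(2011, 9, 2, 0, 0)
print(exposure_at(datetime(2011, 9, 2, 15, 0), done))  # 15:00:00
```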

A disk-based backup, on the other hand, can run continuously if used in conjunction with continuous data protection software. In this case, data can be backed up almost immediately.

The biggest disadvantage to disk-based backups is that the data remains on site. If they are used as the sole backup mechanism, then there is no protection against disasters such as fires or hurricanes.

To get around the limitation of disk-based data replication strategies, health care organizations often combine them with tape backups by backing up the disk backups to tape. As an alternative, some organizations replicate their disk-based backups to either a remote site or a cloud service provider.

Another technique commonly used to safeguard data is replicating data to multiple storage systems. Such data replication strategies assure health care organizations that data will remain available in the event of a storage system failure.

Synchronous replication is used when data must be 100% consistent at all times, across all replicas. The basic idea behind synchronous replication is that write operations typically occur on a single server. However, write operations are not considered to be complete until the data has been written to the other replicas and confirmation of the write is sent back to the server containing the original copy of the data. Until this confirmation is received, the server storing the original copy of the data does not commit the write operation to disk.
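
To make the write path concrete, here is a minimal in-memory sketch of the idea. The Replica and SyncPrimary classes are hypothetical and exist only for this illustration; they do not represent any vendor's product, and a real implementation would also need to roll back replicas that accepted a write when another replica failed to confirm it.

```python
class Replica:
    """A stand-in for a remote copy of the data (hypothetical)."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.data = {}

    def apply(self, key, value) -> bool:
        """Store the write and return True as the confirmation."""
        if not self.healthy:
            return False  # no confirmation is sent back
        self.data[key] = value
        return True


class SyncPrimary:
    """Server holding the original copy of the data (hypothetical)."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.data = {}

    def write(self, key, value) -> bool:
        # Send the write to every replica and wait for each confirmation.
        confirmations = [replica.apply(key, value) for replica in self.replicas]
        if not all(confirmations):
            # Simplification: replicas that did confirm are not rolled back.
            return False  # the primary does not commit the write
        self.data[key] = value  # commit only after every replica confirmed
        return True


primary = SyncPrimary([Replica("B"), Replica("C")])
print(primary.write("patient-123", "record v1"))  # True
```

If one of the replicas is created with healthy=False, write() returns False and the primary's own copy is left unchanged, which is the behavior the paragraph above describes.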

This technique ensures that all replicas remain completely consistent. However, this consistency comes at a price -- it adversely affects performance due to the inherent latency involved in waiting for replicas to confirm write operations. This, in turn, means there are also limitations to the distance over which such replication can occur.
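
A rough calculation shows why distance matters. The sketch below assumes that light travels about 200 kilometers per millisecond in optical fiber, which is a common rule of thumb, and it ignores switching, protocol and disk overhead, so real-world numbers will be worse. The function names are made up for this illustration.

```python
# Assumption: roughly 200 km per millisecond in optical fiber (rule of thumb).
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on the time for a write to reach a replica and be confirmed."""
    return 2 * distance_km / FIBER_KM_PER_MS


def max_serialized_writes_per_sec(distance_km: float) -> float:
    """Upper bound on writes per second if each write waits for the previous one."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return 1000.0 / min_round_trip_ms(distance_km)


for km in (100, 1000):
    print(km, min_round_trip_ms(km), max_serialized_writes_per_sec(km))
# 100 1.0 1000.0
# 1000 10.0 100.0
```

Under these assumptions, a replica 1,000 kilometers away adds at least 10 milliseconds to every confirmed write before any other overhead is counted.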

Another disadvantage is that synchronous replication solutions can be tough to come by for file servers. Although synchronous file server replication solutions do exist, the technology is used almost exclusively for database servers.

Asynchronous replication is the opposite of synchronous replication. When the primary replica performs a write operation, that operation is committed to disk immediately. Only after the operation has been committed to disk is the data sent to other replicas.

Among the advantages of asynchronous replication is that it is faster than synchronous replication, since the primary replica does not have to wait for confirmation of receipt from the other replicas before it can complete write operations. Asynchronous replication can also occur over longer distances than synchronous replication, because latency between sites does not hold up each write.


The downside of asynchronous replication is that it carries a potential for data loss. If the server on which the data was originally written were to fail, then any write operations that have occurred on that server but that have not yet been committed to the other replicas are lost.
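
The following in-memory sketch illustrates both the speed advantage and the exposure. The AsyncPrimary class and its method names are hypothetical, and each replica is modeled as a plain dictionary to keep the example short.

```python
from collections import deque


class AsyncPrimary:
    """Primary that commits locally first and ships writes later (hypothetical)."""

    def __init__(self, replicas):
        self.replicas = replicas  # each replica is modeled as a dict
        self.data = {}
        self.pending = deque()  # writes committed locally but not yet replicated

    def write(self, key, value) -> bool:
        self.data[key] = value  # committed immediately, no waiting
        self.pending.append((key, value))  # queued for later replication
        return True

    def ship(self, max_items=None) -> int:
        """Send queued writes to all replicas; returns how many were shipped."""
        shipped = 0
        while self.pending and (max_items is None or shipped < max_items):
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value
            shipped += 1
        return shipped

    def unreplicated(self):
        """Writes that would be lost if the primary failed right now."""
        return list(self.pending)


primary = AsyncPrimary([{}, {}])
for key, value in [("a", 1), ("b", 2), ("c", 3)]:
    primary.write(key, value)
primary.ship(max_items=2)  # only two writes reach the replicas
print(primary.unreplicated())  # [('c', 3)]
```

If the primary failed at this point, the write for "c" would exist nowhere else.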

Both the synchronous and asynchronous data replication strategies assume that replication is being performed in a one-to-many architecture -- in other words, one primary server replicating all of its data to multiple replicas.

When all the replicas are located in proximity, it is common for the primary replica to send data to all other replicas simultaneously. In larger networks, however, multi-hop replication may be used instead.

To show you how multi-hop replication works, imagine that you had a replica set consisting of three servers. Server A is the primary server responsible for replicating data to Server B and Server C. In a multi-hop replication topology, Server A would replicate data to Server B, and Server B would replicate the data to Server C.
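
The three-server example can be sketched as a simple chain. The Server class here is hypothetical and keeps everything in memory; it is meant only to show the order in which the data travels.

```python
class Server:
    """One member of the replica set (hypothetical, in-memory)."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream  # next hop, or None at the end of the chain
        self.data = {}

    def replicate(self, key, value, path=None):
        """Store the data locally, then pass it to the next hop."""
        path = [] if path is None else path
        self.data[key] = value
        path.append(self.name)
        if self.downstream is not None:
            self.downstream.replicate(key, value, path)
        return path


server_c = Server("C")
server_b = Server("B", downstream=server_c)
server_a = Server("A", downstream=server_b)

print(server_a.replicate("patient-123", "record v1"))  # ['A', 'B', 'C']
```

Server A never contacts Server C directly. Server C only receives the data after Server B has it, which is where the extra latency in a multi-hop design comes from.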

Although multi-hop replication tends to work well in larger environments, there are a couple of important things to consider. First, synchronous replication tends not to work very well with multi-hop replication topologies because there is so much latency involved in the multi-hop process. Don't get me wrong. You can use synchronous replication in a multi-hop topology. It's just that the replication process tends to be slow and inefficient.

The other thing you need to remember is that multi-hop replication topologies are sometimes combined with standard one-to-many topologies, especially when it becomes necessary to replicate data across site boundaries.

In situations such as this, one server within each site would typically act as a bridgehead. The primary server in the first site might replicate the data to the bridgehead server in that site. The bridgehead server would then replicate the data to a remote bridgehead in the secondary site. That bridgehead server might then use a one-to-many architecture to replicate the data to several replicas within the secondary site.
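
As a rough illustration, the combined topology can be written out as a list of source and target pairs. The function names and server names below are invented for this sketch.

```python
def replication_plan(primary, local_bridgehead, remote_bridgehead, remote_replicas):
    """Return (source, target) pairs in the order the data would travel."""
    plan = [
        (primary, local_bridgehead),  # hop within the first site
        (local_bridgehead, remote_bridgehead),  # hop across the site boundary
    ]
    # One-to-many fan-out inside the secondary site.
    plan += [(remote_bridgehead, replica) for replica in remote_replicas]
    return plan


def cross_site_links(plan, site_of):
    """Links whose source and target sit in different sites."""
    return [(src, dst) for src, dst in plan if site_of[src] != site_of[dst]]


plan = replication_plan("primary", "bridgehead-1", "bridgehead-2", ["replica-1", "replica-2"])
site_of = {
    "primary": "site-1",
    "bridgehead-1": "site-1",
    "bridgehead-2": "site-2",
    "replica-1": "site-2",
    "replica-2": "site-2",
}
print(cross_site_links(plan, site_of))  # [('bridgehead-1', 'bridgehead-2')]
```

However many replicas exist in the secondary site, only one link in this plan crosses the site boundary, so the data travels over the inter-site connection once.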

Ultimately, each of the data replication strategies discussed here has its own advantages and disadvantages and, on its own, may not be enough to satisfy a health care organization. Effective data replication and disaster recovery strategies should make use of several solutions that complement each other's strengths and weaknesses.

Brien M. Posey, MCSE, is a Microsoft Most Valuable Professional for his work with Windows 2000 Server and IIS. He has served as CIO for a nationwide chain of hospitals and was once in charge of IT security for Fort Knox. Write to him at editor@searchhealthit.com.

This was first published in September 2011
