Posted by: Jenny Laurello
Clinical Pathology, Laboratory Information Systems, LIS, Pathology
Guest post by: Bob Killen, Systems Admin / Architect, University of Michigan Hospital
Pathology is a multifaceted beast covering a wide range of disciplines. These disciplines can be broken down into two general areas of interest: Clinical and Research. In part one I’ll focus on the clinical aspects of being a system administrator in Pathology Informatics. In part two, I’ll dive into the demanding side of research.
In support of Laboratory Information Systems (LIS)
In many ways, Clinical Pathology functions like a smaller hospital within a hospital. Not only does it have a patient-visible presence in the form of phlebotomists and nurses focusing on point of care, but it also has a very complex, layered back end. A suite of instruments for processing lab results, a collection of imaging systems and even its own version of an EMR in the form of a Laboratory Information System that handles the collection and dispersion of results may sound like a lot; however, that doesn’t even scratch the surface of all the areas that Pathology influences. Pathology is the central hub for diagnostic medicine as the primary processor of lab results. Couple that with the ever-increasing migration away from paper-based services toward computer-aided lab testing, and maintaining 100% up-time becomes essential to clinical patient care.
Unfortunately, this is often an afterthought in application and workflow development, which more often than not results in single points of failure being designed in. Now, before I anger too many people out there with that statement, it IS starting to change. More systems are being designed properly from the ground up with the ability to scale both in performance and redundancy. However, legacy systems and other applications that were created without proper oversight must still maintain around-the-clock operation. This is where we sysadmins come into play, building out a supporting infrastructure able to keep these less fortunate applications running.
My general rule for approaching these situations boils down to a small series of questions:
– Is there a single point of failure?
– If so, can it be eliminated? If not eliminated, then mitigated?
– How can the solution be simplified or improved?
I’ll run through an example that I know is lurking in many hospital datacenters out there: the legacy physical system, an older Windows system running SQL Server 2000 and acting as an instrument interface.
1) Is there a single point of failure? Yes, the database is locally installed and the vendor does not support migration to a dedicated database cluster or hot backup.
2) If so, can it be eliminated? If not eliminated, then mitigated? This is where things can get complicated. As every environment is different, I’m going to be rather sparse with the actual details and make a few assumptions. In any case, with this legacy system the point of failure cannot be completely eliminated: too many dependencies are local to the system, and the process roles cannot be broken apart into separate application and database pools. It is now time to move to mitigation. First and foremost, virtualize the system if possible and move it into a high-availability pool. If the application cannot be virtualized, there are other business continuity options available. However, the benefits of a virtualized infrastructure for these types of systems far outweigh the other possibilities out there. Most virtualization platforms can satisfy both basic availability and backup requirements.
Critical failure mitigation at the software level can vary depending upon your backup strategy and instrument vendor. Some instruments allow for a cache of records to be stored locally on the sender device and can be replayed in the event of a failure. In this situation, I would use a full VM backup scheduled within the window as allowed by the cache. If the instrument does not have such a feature, setting up a full VM backup in conjunction with transaction log shipping or some other database level backup would be the next best thing.
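As a rough illustration of "scheduled within the window as allowed by the cache", here is a minimal Python sketch. It assumes the instrument's cache can be expressed as a retention time; some instruments may instead cap the number of records, in which case the check would need to be based on result volume. The function name, parameters and safety margin are all hypothetical and not taken from any vendor's product.

```python
from datetime import timedelta


def backup_interval_is_safe(cache_window: timedelta,
                            backup_interval: timedelta,
                            backup_duration: timedelta,
                            safety_margin: timedelta = timedelta(hours=1)) -> bool:
    """Check whether the instrument cache can cover the gap between backups.

    Assumption: a restore rolls the VM back to the start of the last
    completed backup, so the worst case is a failure just before the next
    backup finishes. Everything sent in that gap must still be replayable
    from the instrument's local cache.
    """
    worst_case_gap = backup_interval + backup_duration
    return worst_case_gap + safety_margin <= cache_window


# Hypothetical numbers: a 48-hour cache with a nightly backup that takes 2 hours.
print(backup_interval_is_safe(timedelta(hours=48),
                              timedelta(hours=24),
                              timedelta(hours=2)))
```

If the check fails, either the backup has to run more often or a database-level backup such as transaction log shipping has to fill the gap.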
3) How can the solution be simplified or improved? At this point I’d say the solution is fairly simple. Redundancy and backup have been covered; however, there is still plenty of room for improvement. We’ve yet to deal with security and change control, both of which are quite critical to maintaining near 100% up time.
Continuing with this example, being a legacy instrument interface system, it more than likely would not meet the security standards of today. I’ll go with a worst-case scenario: the vendor has explicitly stated that it will not support the interface if a virus-scan or firewall product is installed.
It is absolutely imperative to minimize the threat window of these types of systems. If your internal network is compromised, a system such as this is a perfect target for exploitation. While the vendor may not support a firewall running on the system, we can still build one out in front of it. This is again where a virtualized environment can prove to be quite handy. Placing a firewall in front of a single virtual machine is generally trivial, whereas doing the same with physical networking can be complicated to deploy.
There are quite a few firewall products out there, either as paid offerings from a variety of vendors or as free open-source offerings that are relatively easy to deploy. As long as the firewall supports standard five-tuple filtering rules (protocol, source address, source port, destination address and destination port), it should suffice for our application. Once it is deployed and a proper rule set has been generated, the threat window of the system has been decreased dramatically. As an added bonus, many of the firewall appliances also give you VPN functionality, enabling secure remote connections to, say, an offsite lab or the vendor themselves for support.
Before I continue with change control, I want to note something if you haven’t caught it already. By putting a firewall in place, a new single point of failure has been introduced. However, most, if not all, of the decent firewall systems out there do support some form of redundancy, either in a load-balanced or hot-spare configuration. This all may seem overly complicated and daunting to set up, and I will say there is a high up-front investment in time. However, once you’ve got the basic setup complete, it becomes trivial to replicate across multiple systems.
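To make the five-tuple idea concrete, here is a minimal Python sketch of first-match filtering with a default deny. It is not any particular firewall's rule syntax. The addresses, the instrument port 5000 and the `Rule` and `decide` names are hypothetical; 1433 is simply SQL Server's default port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    """One five-tuple rule; a port of None means "any port"."""
    protocol: str
    src_net: str
    src_port: Optional[int]
    dst_net: str
    dst_port: Optional[int]
    action: str  # "allow" or "deny"


def rule_matches(rule: Rule, protocol: str, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int) -> bool:
    """Return True if the packet's five-tuple falls inside the rule."""
    return (
        rule.protocol == protocol
        and ip_address(src_ip) in ip_network(rule.src_net)
        and ip_address(dst_ip) in ip_network(rule.dst_net)
        and rule.src_port in (None, src_port)
        and rule.dst_port in (None, dst_port)
    )


def decide(rules: Sequence[Rule], protocol: str, src_ip: str, src_port: int,
           dst_ip: str, dst_port: int) -> str:
    """First matching rule wins; anything unmatched is denied."""
    for rule in rules:
        if rule_matches(rule, protocol, src_ip, src_port, dst_ip, dst_port):
            return rule.action
    return "deny"


# Hypothetical rule set: only the instrument and the LIS may reach the interface VM.
RULES = [
    Rule("tcp", "10.10.1.20/32", None, "10.10.2.5/32", 5000, "allow"),  # instrument feed
    Rule("tcp", "10.10.1.30/32", None, "10.10.2.5/32", 1433, "allow"),  # LIS to SQL Server
]

print(decide(RULES, "tcp", "10.10.1.20", 49152, "10.10.2.5", 5000))
print(decide(RULES, "tcp", "10.10.9.9", 49152, "10.10.2.5", 1433))
```

The default deny at the end is what shrinks the threat window: the legacy system is reachable only from the handful of hosts that actually need it.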
If this sort of topology doesn’t work for your deployment — or if you’d prefer to keep things simpler (and a short period of downtime is acceptable for the system) — another possibility would be to deploy a single firewall and just keep an image or back-up of the configuration available to make redeployment of the firewall quick and simple. If using a virtualized environment, this task can be scripted fairly effectively, and is, in essence, how VMware’s own vShield Edge firewall/router appliances work behind the scenes. For the sake of simplicity, this last option shall be what we move forward with for this example.
Now that the system is effectively locked down, it’s time to move on to the last hurdle of change control. This can become a nightmare depending on the level of recertification or validation that needs to go on when a system needs to be updated. The preferred method would be to use the standard 3-layer model of Development, Certification (or Test) and Production.
Development: This is the playground. Change happens often, the system can be broken, and it’s expected that the system will die and need to be restored back to a base image. Once development has solidified, the general best practice is to wipe the system and use a fresh build or clone of Certification/Production, using a system monitoring tool to watch what is affected when the gold application build or patch is applied. If all is functioning well, it can be bumped up the chain to Certification.
Certification: This is a clone of Production, and changes are applied here as if it were Production. This is generally where all the validation takes place. With an interface system, it can mean splitting the Production interface feed and monitoring a random selection of results to ensure they match what is in Production, or what will be expected to be in Production once the changes are applied. Depending on the interface or system, and how it touches patient information, documenting these results becomes crucial for management, not to mention maintaining accreditation from the College of American Pathologists.
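The spot-check described under Certification can be sketched in a few lines of Python. This assumes results from both feeds have already been parsed into dictionaries keyed by a result identifier; the function name, the identifiers and the values are hypothetical, and a real interface would need message parsing in front of this.

```python
import random
from typing import Dict, List, Optional, Tuple


def compare_sampled_results(production: Dict[str, str],
                            certification: Dict[str, str],
                            sample_size: int,
                            seed: Optional[int] = None
                            ) -> Tuple[List[str], List[Tuple[str, str, Optional[str]]]]:
    """Pick a random selection of Production results and compare them
    with what Certification produced for the same identifiers.

    Returns the sampled identifiers and a list of mismatches as
    (result_id, production_value, certification_value) tuples.
    """
    rng = random.Random(seed)  # a seed makes the sample reproducible for the audit trail
    ids = sorted(production)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    mismatches = []
    for result_id in sample:
        prod_value = production[result_id]
        cert_value = certification.get(result_id)  # None if the result never arrived
        if cert_value != prod_value:
            mismatches.append((result_id, prod_value, cert_value))
    return sample, mismatches


# Hypothetical parsed results from the split feed.
prod = {"R-1001": "5.4", "R-1002": "7.1", "R-1003": "NEG"}
cert = {"R-1001": "5.4", "R-1002": "7.2", "R-1003": "NEG"}
sample, mismatches = compare_sampled_results(prod, cert, sample_size=3, seed=1)
print(mismatches)
```

Recording the seed, the sampled identifiers and the mismatch list alongside the change request is one simple way to produce the documentation mentioned above.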
Production: This is the live system. No changes should directly be done to this system unless they’ve passed all the tests in Certification. Unexpected changes here on interface type systems may have drastic effects on downstream analysis that could directly impact patient care.
While this is the preferred method, many times it is not possible to adhere to. There are a variety of reasons, ranging from licensing complications to not having the resources available to create such an environment. It is here again where a virtualized environment makes the task easier to deal with. Even if you cannot support a full three-layer environment, or cannot keep both Certification and Production available at all times, you can work around this by cloning the full Production topology (firewall and all), bringing it up as a Certification system when the vendor provides an update or change, and deleting it once all testing is complete. This is contingent on licensing, of course.
For the legacy system, the clone option is likely the best route. The vendor will not likely be issuing any updates to the product, and the only real changes will be keeping the system up to date on Operating System level patches. With it behind a firewall and strictly allowed to talk to only a few systems, the window of exploitation is small enough that patches should be limited to critical items (such as MS08-067), or possibly rolled into a quarterly scheduled system update.
The legacy system, now virtualized and placed in an HA pool behind a firewall, with a change control strategy in place, meets the guidelines outlined above. Complete failure cannot be eliminated, but it has been mitigated to the best of our ability.
In years past, the only true critical system was the central LIS. Many of the lab results were either hand-entered into a terminal or transmitted from the instrument simply enough that the LIS could handle the processing itself. Those days are long gone. With so many ancillary systems now acting as the pre-processor of patient lab data before being stored in the LIS, clinical care cannot be maintained without keeping these systems functioning.
Much of the above has been an over-simplification of the actual process of failure mitigation, but it does give you insight into the general thoughts and methodologies that are used when we approach a system to improve availability.