Healthcare organizations are always working to improve the quality of their care and the efficiency of their business operations. Data analytics for these clinical quality improvement efforts requires access to a lot of data for determining baseline performance, detecting trends and patterns, and simulating potential outcomes of new processes and workflows.
As I was writing recent SearchHealthIT articles about HIPAA privacy and security, I started seriously thinking about my own practices, and the practices of my team, regarding the access to and use of protected health information (PHI).
As an analytics professional working within a large healthcare organization, I am required to access and use the PHI of many individuals. Most often, this occurs when I am extracting and analyzing data required for clinical quality and performance improvement efforts or when I'm assisting with critical incident reviews. Like most healthcare analytics professionals, I access most of this PHI through health IT systems such as electronic health records or via data warehouses.
The analytics tools that my group uses or builds typically generate anonymous aggregate data, such as visit counts, times analysis, and other data that has been summarized or crunched. For example, performance dashboards, quality reports (such as statistical process control charts), and even predictive and simulation models can be developed with data that is completely anonymous. When we're working with extracted data for other purposes, the data is not always completely anonymous. For example, personal information required for use in a clinical environment may include names, birthdays, or other identifying information -- especially in cases of critical incident reviews or infection control contact lists.
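To make the idea of anonymous aggregate data concrete, here is a minimal sketch of a visit count that never reads an identifying field. The record layout, the field names (`patient_name`, `unit`, `visit_date`) and the function `visit_counts_by_unit` are assumptions made up for illustration; they are not taken from any particular system.

```python
from collections import Counter

# Hypothetical extracted visit records; all field names are illustrative.
visits = [
    {"patient_name": "Patient A", "unit": "ED", "visit_date": "2014-01-03"},
    {"patient_name": "Patient B", "unit": "ED", "visit_date": "2014-01-03"},
    {"patient_name": "Patient C", "unit": "ICU", "visit_date": "2014-01-04"},
]

def visit_counts_by_unit(records):
    """Count visits per unit, reading only the non-identifying 'unit' field."""
    return dict(Counter(record["unit"] for record in records))

# The summary that feeds a dashboard carries counts only, no names.
summary = visit_counts_by_unit(visits)
```

The summary handed to a dashboard or report contains only unit names and counts, so nothing in it points back to an individual row in the extract.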
When specific patient data is useful
One of the questions that I have been asking myself, my team and other analytics experts who may work with data that contains PHI is this: Is there ever any need for me to access and use data that is not completely anonymized? The quick answer is yes; there will always be defined circumstances where identifiable health-related information will be needed to improve clinical quality.
The analytics portal in which my team does the majority of its work contains a few reports that are used for auditing and critical incident review purposes that must have the patient name and other information attached. After all, what good is an infectious disease contact list without any contact information? These reports are in a protected section of the portal, and the ability to view personal information (such as name, insurance number, etc.) is controlled at the system level and linked to login credentials. This way, we can control who has access to personal information, and can audit what they access and when.
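The combination of credential-based control and auditing described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the login names, the set `PHI_AUTHORIZED_USERS`, the field names and the in-memory `audit_log` are all hypothetical, and a real portal would enforce this at the system level rather than in application code like this.

```python
from datetime import datetime, timezone

# Hypothetical logins approved to see identifying fields.
PHI_AUTHORIZED_USERS = {"infection_control01"}
# Hypothetical fields treated as personal information.
IDENTIFYING_FIELDS = {"name", "insurance_number"}
# In-memory stand-in for an audit trail: who accessed what, and when.
audit_log = []

def view_record(user, record):
    """Return the record, hiding identifying fields from unapproved logins.

    Every access is logged, whether or not identifiers were shown.
    """
    authorized = user in PHI_AUTHORIZED_USERS
    audit_log.append({
        "user": user,
        "record_id": record["record_id"],
        "identifiers_shown": authorized,
        "accessed_at": datetime.now(timezone.utc).isoformat(),
    })
    if authorized:
        return dict(record)
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}

sample_record = {
    "record_id": 101,
    "name": "Patient A",
    "insurance_number": "000-000",
    "unit": "ED",
}
full_view = view_record("infection_control01", sample_record)
masked_view = view_record("analyst02", sample_record)
```

Here `full_view` keeps the name and insurance number, `masked_view` does not, and `audit_log` holds one entry for each of the two accesses.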
Beyond very specific and approved purposes, do I ever need to see information that identifies an individual patient? I think the answer is no.
When analysts are accessing and analyzing health information, I firmly believe that the default mode should be to work with totally anonymized data unless the task at hand clearly and legitimately requires identifiable information. This is the equivalent of a "need to know" stipulation: personally identifiable information should be excluded unless it is specifically required.
How to protect anonymity
As mentioned above, one solution for preventing accidental or unauthorized access to personal information is to partition all identifiable and sensitive information off in the database, granting access on a user-by-user basis. This is a common solution, but requires the careful setup of user access privileges by database administrators.
Another solution that has worked well in my group and others is the creation of an "analytics sandbox" that is a copy of the most commonly used data elements from our source systems but is completely anonymous and stripped of all identifiable information -- separate from the database that contains PHI. When we built the sandbox, we took care to ensure that we included only fields into which private information could not be entered; this meant we avoided most fields into which text could be freely entered. The analytics sandbox is the data set on which we can run complex statistical analyses, build and test predictive models, and test dashboards and other analytical tools without risk of breaching an individual's record.
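One way to picture how such a sandbox gets populated is an allow-list: only fields that have been reviewed and cannot hold private information are copied, and everything else, including free-text fields, is left behind by default. The sketch below assumes hypothetical field names and a hypothetical `to_sandbox_row` helper; it is not a description of any specific sandbox build.

```python
# Hypothetical allow-list of coded, non-identifying fields.
# Free-text fields are deliberately absent because anything can be typed
# into them, including names or other private details.
SANDBOX_FIELDS = ("unit", "triage_level", "arrival_hour", "length_of_stay_hours")

def to_sandbox_row(source_row):
    """Copy only allow-listed fields; anything not listed is dropped."""
    return {field: source_row.get(field) for field in SANDBOX_FIELDS}

# Hypothetical rows from a source system that contains PHI.
source_rows = [
    {
        "name": "Patient A",
        "birth_date": "1970-01-01",
        "unit": "ED",
        "triage_level": 3,
        "arrival_hour": 14,
        "length_of_stay_hours": 5.5,
        "triage_note": "free text entered by staff",
    },
]

sandbox_rows = [to_sandbox_row(row) for row in source_rows]
```

An allow-list is used here rather than a list of fields to remove, so a new field added to the source system stays out of the sandbox until someone explicitly reviews and approves it.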
One benefit of working exclusively with de-identified data is that it can reduce the risk of a breach of personal health information to almost zero, especially if the data is being transmitted between researchers or analysts.
Consider that one of the biggest risk factors for an information breach is accidental disclosure: sending identifiable health information to an unauthorized individual. This could include sending paper records to the wrong address or emailing information to an unintended recipient. If you've ever hit "reply all" instead of just "reply" in your email application, you'll know how easy it is to send sensitive information to the wrong people. In addition, if a laptop or memory device is lost or stolen, de-identified data cannot be used for nefarious purposes, even if other protective measures (such as disk drive and file encryption) are defeated.
The users of health information that I know, myself included, consider ourselves to be responsible and respectful stewards of the data we use. As such, we rarely consider ourselves at risk of an accidental or intentional breach. However, both accidental and criminal breaches of security and privacy do occur. As responsible professionals, it is incumbent upon us to regularly review our own security practices to ensure we are not unnecessarily accessing, using or viewing private health information, or unintentionally increasing the risk of a breach of PHI.
About the author:
Trevor Strome, M.S., PMP, leads the development of informatics and analytics tools that enable evidence-informed decision making by clinicians and healthcare leaders. His experience spans public, private and startup-phase organizations. A popular speaker, author and blogger, Strome is the founder of HealthcareAnalytics.info; his book, Healthcare Analytics for Quality and Performance Improvement, was recently published by John Wiley & Sons Inc.