In the age of big data and enterprise data warehouses, issues of data volume, system scalability and infrastructure management simply cannot be ignored. It is possible, though, that in our rush to incorporate ever larger data sets from new sources, we sometimes lose sight of the more mundane but critical need to ensure, through sound data governance programs and policies, that the data we have access to is of high quality.
One of the major barriers to the use and widespread adoption of analytics that I regularly encounter is poor data quality. My team makes an enormous effort to rectify this problem. Because data quality is so important to developing and using analytics for decision making, I dedicated an entire chapter of my book, Healthcare Analytics for Quality and Performance Improvement, to the topic. High-quality data is foundational to healthcare analytics, and the output of analytics (in the form of dashboards, reports, simulations and predictive models) is only as accurate as the data on which it is based.
Large data sets are beneficial for healthcare analytics, but quantity isn't the goal. High-quality data is an essential ingredient of the accurate, valid and trusted analytics that healthcare leaders use. Good data alone cannot ensure that the analytics a healthcare organization builds and uses will produce the desired transformations in quality, performance and patient safety. Bad data, however, will almost guarantee that efforts to use information will be scuttled by a lack of trust or belief in the analytics results.
What is good data?
"Good data" and "bad data" are terms that are often used but have no clear meaning. Quality consultant Joseph Juran put it well when he said, "Data are of high quality if they are fit for their intended uses in operations, decision making and planning."
Getting data to the point that it is fit for any use can be an uphill battle. Data quality considerations can range from checking the source systems (including the user-friendliness of the software and how data is, or isn't, validated upon entry) to ensuring that business rules and other filters applied during data transformation and loads are valid.
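To make the idea of validating data on entry or during loads more concrete, here is a minimal Python sketch of rule-based checks applied to records before they reach the warehouse. The `Encounter` fields and the three rules (missing patient ID, discharge before admission, implausible age) are hypothetical examples chosen for illustration, not rules drawn from any particular source system.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional, Tuple


@dataclass
class Encounter:
    # Hypothetical record layout used only for this illustration.
    patient_id: str
    admit_date: date
    discharge_date: Optional[date]
    age: int


# A rule returns an error message, or None if the record passes.
Rule = Callable[[Encounter], Optional[str]]


def has_patient_id(e: Encounter) -> Optional[str]:
    return "missing patient_id" if not e.patient_id.strip() else None


def discharge_not_before_admit(e: Encounter) -> Optional[str]:
    if e.discharge_date is not None and e.discharge_date < e.admit_date:
        return "discharge_date precedes admit_date"
    return None


def plausible_age(e: Encounter) -> Optional[str]:
    # Hypothetical plausibility range; a real rule would come from the business owner.
    return f"implausible age {e.age}" if not 0 <= e.age <= 120 else None


RULES: List[Rule] = [has_patient_id, discharge_not_before_admit, plausible_age]


def validate(records, rules=RULES) -> Tuple[list, list]:
    """Split records into those passing every rule and those failing at least one."""
    clean, rejected = [], []
    for rec in records:
        errors = [msg for rule in rules if (msg := rule(rec)) is not None]
        if errors:
            rejected.append((rec, errors))  # keep the reasons for follow-up
        else:
            clean.append(rec)
    return clean, rejected


records = [
    Encounter("A1", date(2024, 1, 5), date(2024, 1, 7), 54),
    Encounter("", date(2024, 2, 1), date(2024, 1, 30), 130),
]
clean, rejected = validate(records)
```

Keeping the rejected records together with their reasons, rather than silently dropping them, gives the analytics team something concrete to take back to the owners of the source system.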
Common root causes of poor data quality
As data quality becomes ever more integral to the use of information and the generation of analytics insight, many experts have studied the root causes of data errors (and other data quality issues) extensively and systematically.
In Journey to Data Quality, Yang Lee and her co-authors identify key studies and concepts on the topic of data quality. One chapter of the book enumerates 10 root causes of data quality problems, six of which are especially relevant to healthcare analytics:
- Data from multiple source systems. Multiple source systems are very common in healthcare, including the electronic patient record and systems for registration, labs and diagnostic imaging. Each of these systems stores a subset of every patient's data. Complicating the matter is that each source system may have its own data validation rules, formats and key identifiers.
- Subjective judgment in data production. Analytics requires understanding the business context of data. Without clear documentation and/or definitions of the data, personal interpretations of what the data means can affect what is recorded. For example, users of a source system may interpret differently how the system is meant to be used and how data should be entered into it. These variations in interpretation must be eliminated.
- Trade-off of security versus accessibility. Security and accessibility are invariably at odds. If security is too tight on databases that are required for analysis, developers may go elsewhere for data that is more accessible. As a result, important data that could help address a business or clinical problem may go unused.
- Multiple data coding schemes. Another issue common in healthcare involves different source systems that may encode data using different coding schemes (such as ICD-9 versus ICD-10), which may hinder the comparability and compatibility of information.
- Complex data representations. Healthcare consists of a series of very complex workflows. The data that represents these complex clinical or business processes (or even simple processes that are stored iteratively) may introduce errors during extraction and analysis if it does not accurately record and/or reflect those processes.
- Volume of data. Although most healthcare organizations have scalable infrastructures to support large volumes of data on production systems, the computing power allocated to analytics is often only a fraction of that. Large volumes of data can present a challenge for many analytics tools. For example, some desktop statistical packages can only analyze data that is held in memory (as opposed to on disk), which limits the size of the data sets that can be analyzed.
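The first and fourth root causes above (multiple source systems and multiple coding schemes) often show up together when records are combined. The following is a minimal Python sketch, under stated assumptions, of harmonizing diagnosis rows from two systems onto a common patient key and a single coding scheme. The identifier-cleaning convention in `normalize_patient_id` is hypothetical, and the one-entry crosswalk is illustrative only: a real mapping would be loaded from a maintained source such as the CMS General Equivalence Mappings, and many ICD-9 codes have no one-to-one ICD-10 equivalent.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Illustrative crosswalk entry only; not a substitute for a maintained mapping.
ICD9_TO_ICD10 = {"250.00": "E11.9"}


@dataclass(frozen=True)
class DiagnosisRow:
    patient_key: str           # harmonized patient identifier
    icd10_code: Optional[str]  # None when no mapping is available
    source: str                # which source system the row came from


def normalize_patient_id(raw: str) -> str:
    """Hypothetical convention: uppercase, drop spaces and dashes, strip leading zeros."""
    cleaned = raw.strip().upper().replace("-", "").replace(" ", "")
    return cleaned.lstrip("0") or "0"


def to_icd10(code: str, scheme: str) -> Optional[str]:
    code = code.strip().upper()
    if scheme == "ICD-10":
        return code
    if scheme == "ICD-9":
        return ICD9_TO_ICD10.get(code)  # unmapped codes become None, not a guess
    raise ValueError(f"unknown coding scheme: {scheme}")


def harmonize(rows: Iterable[Tuple[str, str]], source: str, scheme: str) -> List[DiagnosisRow]:
    """Map (patient_id, code) pairs from one source onto a common key and coding scheme."""
    return [
        DiagnosisRow(normalize_patient_id(pid), to_icd10(code, scheme), source)
        for pid, code in rows
    ]


legacy = harmonize([("000123", "250.00")], source="legacy_lab", scheme="ICD-9")
current = harmonize([("123", "e11.9")], source="emr", scheme="ICD-10")
```

Rows whose `icd10_code` comes back as `None` are worth counting and reporting as a data quality measure in their own right. Note also that a cleaning rule such as stripping leading zeros is only safe if the source systems genuinely treat those identifiers as equivalent; otherwise it would merge distinct patients.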
Analytics teams need to work together with data warehouse managers and front-line staff to ensure that all possible sources of poor data quality are reduced or eliminated. As clinical systems and the data warehouses on which information is stored become more complex, data quality must become a shared responsibility among all data owners within a healthcare organization. Healthcare organizations need to work more diligently than ever before to ensure the availability and trustworthiness of the data and information that decision makers require.
About the author:
Trevor Strome, M.S., PMP, leads the development of informatics and analytics tools that enable evidence-informed decision making by clinicians and healthcare leaders. His experience spans public, private and startup-phase organizations. A popular speaker, author and blogger, Strome is the founder of HealthcareAnalytics.info, and his book, Healthcare Analytics for Quality and Performance Improvement, was recently published by John Wiley & Sons Inc.