It can be argued that, as the number of data items available for analysis in modern clinical systems continues to grow, analysts understand the data less well than they did when databases were much smaller and built for specific purposes.
As health IT becomes more sophisticated, having more data is a good thing. But it's simply a fact of life that a healthcare data analyst can no longer know data systems as intimately as he or she once did. I often have to scratch my head and look up the exact definition of a data item if I haven't used it in a while, or if I need to explain to somebody exactly what the source data represents.
When new data becomes available in a healthcare information system (or any other type of system), or when embarking on a new analytics development effort, it is important to resist the urge to dive right in before obtaining a clear understanding of the data and how it relates to the business.
Below is a high-level summary of what is critical to know about data before exploring new data or developing analytical tools such as dashboards, reports, alert agents and other reporting tools. Future articles will cover these in more detail.
- What the data represents. Much healthcare data is generated on the front lines during the provision of care by clinicians and other staff. It is important for analysts to know the processes and workflows from which the data is taken, what the data is measuring and who is responsible for entering the data. If possible, outcome data should be attached to process data to help determine how efficient, effective and safe clinical workflows are.
- Where and how the data is stored. Before using data, the analyst must know where it is located. Is the data stored in an enterprise data warehouse, a data mart aligned with a clinical system or a standalone database? Along with knowing where the data is stored, understand the quality of the data. For example, are there missing values that might bias the analysis, or invalid entries that need to be cleaned or otherwise addressed?
- The data type. Most database management systems require data to be stored as certain types (such as integer, character and date/time). Regardless of how data might be physically stored in a database, what kind of data do the values represent in "real life"? Are there any data conversions that need to be done before the data becomes useful for the intended purpose? For example, numbers stored in character fields may need to be cleaned and cast to a numeric type such as float or integer before numeric operations can be applied to them.
- What logically can be done with the data. Given the type of data and how it is stored, what kinds of database and mathematical operations can be performed on the data in meaningful ways? While you can count values of any data type, even basic operations such as addition, and statistics such as the mean, are not valid for categorical and ordinal data, even if the values appear numeric.
- How to turn data into useful information. Raw data in and of itself is rarely useful. Even in this age of big data, an organization's executives, managers and other decision-makers make more effective decisions when the data is compiled, analyzed and used to generate insight into the organization's operations. The results, which can range from specific, well-defined performance indicators on a dashboard to simulation and predictive analytics, can also help highlight the best way forward.
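The data-quality, data-type and valid-operations checks described above can be sketched in a few lines of code. The following is a minimal illustration in Python using only the standard library, assuming a hypothetical extract in which every value arrives as a string; the field names `wait_minutes` and `triage_level` and the helper functions `cast_numeric` and `summarize_ordinal` are invented for illustration and do not come from any particular clinical system. It counts missing and invalid entries while casting a character field to a number, and it summarizes an ordinal code with counts and a median rather than a mean.

```python
from collections import Counter
from statistics import mean, median_low

# Hypothetical extract: every value arrives as a string, as it might
# from character-typed columns in a source system.
RAW_RECORDS = [
    {"visit_id": "A1", "wait_minutes": "42", "triage_level": "3"},
    {"visit_id": "A2", "wait_minutes": " 17 ", "triage_level": "2"},
    {"visit_id": "A3", "wait_minutes": "", "triage_level": "4"},
    {"visit_id": "A4", "wait_minutes": "N/A", "triage_level": "3"},
    {"visit_id": "A5", "wait_minutes": "65", "triage_level": "1"},
]


def cast_numeric(records, field):
    """Strip and cast a character field to float.

    Returns the valid numbers plus counts of missing (blank) and
    invalid (non-numeric) entries, so data quality can be reviewed
    before any analysis.
    """
    values, missing, invalid = [], 0, 0
    for record in records:
        text = record.get(field, "").strip()
        if not text:
            missing += 1
            continue
        try:
            values.append(float(text))
        except ValueError:
            invalid += 1
    return values, missing, invalid


def summarize_ordinal(records, field):
    """Summarize a (hypothetical) ordinal code with counts and a median.

    Arithmetic such as the mean is not meaningful for ordinal codes,
    even when they look numeric, so only count- and order-based
    summaries are returned. median_low returns an actual observed
    code rather than an interpolated value between two codes.
    """
    codes = [record.get(field, "").strip() for record in records]
    codes = [code for code in codes if code]
    counts = Counter(codes)
    ranked = sorted(int(code) for code in codes)
    return counts, median_low(ranked)


wait_values, wait_missing, wait_invalid = cast_numeric(RAW_RECORDS, "wait_minutes")
print("wait_minutes: valid =", len(wait_values),
      "missing =", wait_missing, "invalid =", wait_invalid)
# Mean over valid values only; excluded rows may bias this figure.
print("mean wait (valid only):", mean(wait_values))

triage_counts, triage_median = summarize_ordinal(RAW_RECORDS, "triage_level")
print("triage_level counts:", dict(triage_counts))
print("triage_level median:", triage_median)
```

Dropping the missing and invalid rows before taking the mean is itself a choice that may bias the result; in practice, the analyst would want to understand why those values are missing before deciding how to handle them.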
It can be argued that, while big data is garnering attention from an infrastructure and technology perspective, focus must always remain on how all that data relates to the business and what is relevant to decision-makers. In my experience, when a healthcare data analyst begins with a new data set, it's best to spend time on the floor (or in the office) where the work that generates the data occurs, and where the resulting analytic insight is used to guide decisions. This hands-on exposure helps relate the data to actual situations and conditions in ways that reviewing existing documentation and metadata alone never could.
About the author:
Trevor Strome, M.S., PMP, leads the development of informatics and analytics tools that enable evidence-informed decision making by clinicians and healthcare leaders. His experience spans public, private and startup-phase organizations. A popular speaker, author and blogger, Strome is the founder of HealthcareAnalytics.info, and his book, Healthcare Analytics for Quality and Performance Improvement, was recently published by John Wiley & Sons Inc.