Why making data “worthless” is useful: Best practices for ensuring the privacy and security of health care data
Posted by: Jenny Laurello
Guest post by Jan Rosenberg, Director, ILM, Informatica
It’s safe to say that this is a challenging time for health care IT professionals who lead data management and HIPAA compliance efforts at their organizations.
SQL injection attacks jumped 134% in 2008, increasing from an average of a few thousand a day to several hundred thousand per day, according to a recently published report. Forrester Research reports that 60% of enterprises lag in applying database security patches, while 74% of all Web application vulnerabilities (predominantly SQL injection) still had no available patch at the end of 2008. Historically, most security efforts have focused on network perimeters and client systems (firewalls, IDS/IPS, anti-virus, etc.) and on application access controls. However, breaches of corporate databases now account for the vast majority of privacy breaches. Not only are breaches on the rise, but the notification requirements and fines associated with the HITECH Act also make their consequences much deeper and more visible.
As health care organizations tighten security around their production application data, their test systems are still largely unprotected and full of exposed PHI and other sensitive data. Unfortunately, access to non-production systems is not as tightly controlled as access to production systems. This creates a conundrum for the security group and for the development and testing teams working under tight timelines: how can they provide a database that does not expose sensitive data yet still meets those teams’ requirements? For years, the common practice has been to make dev/test databases complete copies of production so that these groups have the data they need. With health care organizations now transforming their application systems, the demand for testing new and upgraded systems is exploding, and so is the risk of a privacy breach.
Testing is an essential part of a successful system rollout, and it typically stands between the project and final sign-off or “go-live”. As a result, test systems can become complex, because they must provide test beds that satisfy the requirements of every functional group and every potential production scenario. The dynamic nature of test systems and their stakeholders often creates security loopholes and leads teams to “spawn” copies of production data multiple times. For this reason, the data in test systems deserves specific scrutiny.
There are several technologies to consider when securing an application database. Encryption is typically used in production environments. Encrypted data can be decrypted with a key, and that key needs its own layer of protection. Moreover, encryption secures the data only while it is stored in the database: when a user views the data through the application, it is displayed in its original, decrypted, unsecured form. Encryption alone is therefore not sufficient to secure data in test/dev environments, where QA staff and developers will likely have full access to all of the data.
Another option is to alter the sensitive data with internally built scripts, but the overhead of such a manual process can be overwhelming. You first need to locate and identify all of the sensitive data so that none of it is missed, and this discovery process must be very accurate, which is difficult to achieve by hand. In addition, development and test teams usually require contextually accurate data. Merely replacing values does not meet this requirement; the transformation process needs some built-in intelligence, because testers may need valid credit card numbers or realistic names and addresses.
More recent technologies can help discover the sensitive data in application databases and transform it for test/dev instances.
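To make the discovery step more concrete, here is a minimal sketch of automated sensitive-data discovery. It assumes that sampled column values can be read as strings. The regular expressions, the 80% match threshold, and the `discover`/`classify_column` helpers are hypothetical illustrations, not the rules of any particular product. A Luhn checksum filters out long digit strings that merely look like card numbers.

```python
import re
from typing import Dict, List, Optional

# Hypothetical detection rules; real discovery tools also use column names,
# reference dictionaries and statistical profiling.
PATTERNS: Dict[str, re.Pattern] = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[A-Za-z]{2,}$"),
    "phone": re.compile(r"^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$"),
    "credit_card": re.compile(r"^\d{13,19}$"),
}


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # every second digit from the right is doubled
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_column(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the first sensitive type matched by >= threshold of non-empty values."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return None
    for label, pattern in PATTERNS.items():
        hits = [v for v in non_empty if pattern.match(v)]
        if label == "credit_card":
            hits = [v for v in hits if luhn_ok(v)]
        if len(hits) / len(non_empty) >= threshold:
            return label
    return None


def discover(tables: Dict[str, Dict[str, List[str]]]) -> Dict[tuple, str]:
    """Map (table, column) to a detected type for every column that looks sensitive."""
    findings = {}
    for table, columns in tables.items():
        for column, sample in columns.items():
            label = classify_column(sample)
            if label:
                findings[(table, column)] = label
    return findings


# Hypothetical sample drawn from two tables.
sample_db = {
    "patients": {
        "patient_ssn": ["123-45-6789", "987-65-4321"],
        "contact": ["a.smith@example.org", "j.doe@example.com"],
        "notes": ["follow-up in 2 weeks", "stable"],
    },
    "billing": {"card_no": ["4111111111111111", "5500000000000004"]},
}
print(discover(sample_db))
# → {('patients', 'patient_ssn'): 'ssn', ('patients', 'contact'): 'email', ('billing', 'card_no'): 'credit_card'}
```

Pattern matching alone produces false positives and misses free-text PHI (the `notes` column could still contain names), so a sketch like this only narrows the search. The findings still need human review before masking rules are attached.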
A few core techniques are essential when building data masking rules intended to make test data worthless to an attacker from the start:
- Ensure contextual accuracy for sensitive fields such as names and addresses, Social Security numbers/national IDs, credit card numbers, etc.
- Leverage built-in masking routines for common PHI (e.g., email addresses, SSNs, phone numbers, IP addresses, credit card numbers). This saves time and creates reusable policies.
- Create your own custom masking templates to meet specific requirements and align with internal privacy policies.
- Propagate the masking policy for a given sensitive field across all databases, so the same value is masked consistently everywhere.
- Ensure there is no traceability back to the original sensitive data.
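The techniques listed above can be sketched as a small policy registry. This is a minimal illustration under stated assumptions, not Informatica’s implementation. It uses keyed hashing (HMAC-SHA256 from Python’s standard library) as one common way to get deterministic masking: the same real value always yields the same masked value, which keeps joins intact across databases. Because the key stays out of the test environment, the masks cannot be recomputed and traced back. `MASKING_KEY`, the `mask_*` functions, the `MRN` template, and the column names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret: keep it in a vault outside the test environment (or
# destroy it after masking). Anyone holding it could recompute masks for
# guessed inputs such as SSNs and trace them back to real values.
MASKING_KEY = b"hypothetical-secret-from-a-vault"


def _digits(policy: str, value: str, n: int) -> str:
    """Derive n digits deterministically from (policy, value) with HMAC-SHA256."""
    mac = hmac.new(MASKING_KEY, f"{policy}|{value}".encode(), hashlib.sha256)
    return str(int.from_bytes(mac.digest(), "big") % 10**n).zfill(n)


def _only_digits(value: str) -> str:
    return "".join(c for c in value if c.isdigit())


def luhn_check_digit(partial: str) -> str:
    """Return the digit that makes partial + digit pass the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(partial)):
        d = int(ch)
        if i % 2 == 0:  # doubled once the check digit is appended on the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return str((10 - total % 10) % 10)


def mask_ssn(value: str) -> str:
    """Replace an SSN with a well-formed one (area 001-665, non-zero group/serial)."""
    d = _digits("ssn", _only_digits(value), 9)  # normalize so formats agree
    area = int(d[:3]) % 665 + 1
    group = int(d[3:5]) % 99 + 1
    serial = int(d[5:]) % 9999 + 1
    return f"{area:03d}-{group:02d}-{serial:04d}"


def mask_email(value: str) -> str:
    """Replace an address with one on the reserved example.com domain."""
    return f"user{_digits('email', value.strip().lower(), 8)}@example.com"


def mask_credit_card(value: str) -> str:
    """Keep length and leading digit (card network), replace the rest, fix Luhn."""
    digits = _only_digits(value)
    body = digits[0] + _digits("card", digits, len(digits) - 2)
    return body + luhn_check_digit(body)


def mask_mrn(value: str) -> str:
    """Hypothetical custom template for an internal medical record number."""
    return "MRN" + _digits("mrn", value.strip().upper(), 7)


# One shared policy per sensitive data type, applied in every database.
POLICIES = {
    "ssn": mask_ssn,
    "email": mask_email,
    "credit_card": mask_credit_card,
    "mrn": mask_mrn,
}


def mask_row(row: dict, column_types: dict) -> dict:
    """Return a copy of row with each classified column masked by its policy."""
    return {
        col: POLICIES[column_types[col]](val) if col in column_types else val
        for col, val in row.items()
    }


# Usage: the same SSN stored in two different "databases" gets the same mask.
patient = {"patient_id": 1, "ssn": "123-45-6789", "email": "A.Smith@example.org"}
claim = {"claim_id": 77, "member_ssn": "123456789", "card": "4111111111111111"}
masked_patient = mask_row(patient, {"ssn": "ssn", "email": "email"})
masked_claim = mask_row(claim, {"member_ssn": "ssn", "card": "credit_card"})
print(masked_patient)
print(masked_claim)
```

Purely random substitution would also break traceability, but it would give the same patient different SSNs in different systems and break test scenarios that join on those fields. Deterministic keyed masking is one way to satisfy both the propagation and the no-traceability requirements, provided the key is protected.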
Data masking is an extremely effective method of securing data from project inception to production “go-live” without compromising the quality or requirements of the development and testing process.
In summary, data profiling/masking provides a targeted, secure, comprehensive approach to automating the discovery and de-identification of sensitive data in non-production environments. Deployed across the enterprise, it can help you comply with privacy regulations and drastically reduce the risk of a HITECH Act privacy breach. Many health care organizations already use data masking for exactly this purpose.