Editor’s Note: Dr. Khaled El Emam is the CEO of Privacy Analytics, a world-renowned expert in statistical de-identification and re-identification risk. He is one of only a handful of known individual experts in North America qualified to certify the anonymization of Protected Health Information under the HIPAA privacy law.

Between October 2009 and August 2015, 1,286 HIPAA-covered entities and business associates reported data breaches involving more than 500 records, which together affected more than 153 million people. The estimated cost of all these breaches to the organizations involved is more than $31 billion, or $208 per person. The costs associated with a breach include legal and settlement fees, crisis and reputation management services, and other financial costs such as notifications and credit monitoring services. As digital health applications continue to produce an increasing volume of consumer and patient information, and as health data becomes more valuable in the underground market, there will be an increasing need to use proven best practices to protect this information, particularly as the demand for health data sharing continues to grow.
Standards for protecting sensitive information significantly reduce the risk of re-identifying individuals when the information is shared for secondary purposes. Some of the top reasons for sharing health data include academic research, improving health outcomes and patient safety, identifying cost savings and efficiencies, testing, and fraud detection.
The Health Data Exploration Project found that more than half (57 percent) of people who track their own personal health data stated that an “assurance of privacy” was needed before they would make health data available for research, and over 90 percent stated that keeping the data anonymous was important. As far back as 2009, PricewaterhouseCoopers found that 90 percent of pharmaceutical and life sciences companies, payers and providers agreed that the healthcare industry needs better guidelines for using and sharing secondary data. And here we are in 2015, still struggling to define and standardize these guidelines.
The good news is that concerns about privacy and risk can be addressed by using responsible methodologies for de-identification. De-identification is used to assess and manage the risks associated with using data for secondary purposes. Risk-based de-identification methods produce data with the greatest possible value for research and analytics while ensuring a very small risk of re-identification. The goal of these methods is to remove details from the data that could be used to identify an individual person while keeping the information that is most valuable for healthcare research and analysis. Once HIPAA-protected health information (PHI) has been de-identified, it is no longer considered PHI and can then be shared for important research, analysis and other secondary uses.
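To make the trade-off described above concrete, here is a minimal sketch in Python. It is not the HITRUST methodology or a HIPAA-compliant procedure. It assumes a simplified risk measure (one divided by the size of the smallest group of records that share the same indirectly identifying values), and the records, field names and generalization rules are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy records; field names and values are illustrative only.
RECORDS = [
    {"age": 34, "zip": "90210", "sex": "F", "diagnosis": "diabetes"},
    {"age": 36, "zip": "90211", "sex": "F", "diagnosis": "asthma"},
    {"age": 37, "zip": "90215", "sex": "F", "diagnosis": "diabetes"},
    {"age": 52, "zip": "10001", "sex": "M", "diagnosis": "hypertension"},
    {"age": 58, "zip": "10002", "sex": "M", "diagnosis": "diabetes"},
]

# Assumption: these fields could be combined to single out a person.
QUASI_IDENTIFIERS = ("age", "zip", "sex")


def generalize(record):
    """Coarsen identifying details: 10-year age bands and 3-digit ZIP prefixes.

    The diagnosis, which carries the research value, is left untouched.
    """
    low = (record["age"] // 10) * 10
    out = dict(record)
    out["age"] = f"{low}-{low + 9}"
    out["zip"] = record["zip"][:3] + "**"
    return out


def max_reidentification_risk(records, quasi_identifiers=QUASI_IDENTIFIERS):
    """Simplified risk: 1 / size of the smallest group of look-alike records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(groups.values())


risk_before = max_reidentification_risk(RECORDS)
generalized = [generalize(r) for r in RECORDS]
risk_after = max_reidentification_risk(generalized)

print(f"risk before generalization: {risk_before}")
print(f"risk after generalization:  {risk_after}")
```

In this toy data every original record is unique on age, ZIP code and sex, so the simplified risk is 1.0. After generalization the smallest group contains two records, so the risk falls to 0.5 while each diagnosis is preserved. A real assessment would use far larger data sets, a much lower risk threshold, and would also account for who receives the data and under what controls.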
While there is currently no single standard method for the de-identification of PHI, efforts to create a framework are underway. The Health Information Trust Alliance (HITRUST) recently released a de-identification framework, an information security framework that organizations can use when creating, accessing, storing or exchanging personal information. This framework has collated and refined current standards and regulations so that health organizations have access to essential information regarding information security. Data-sharing organizations that use a HIPAA-compliant, proven de-identification methodology recommended by IOM, PhUSE, HITRUST and Privacy by Design have had zero people re-identified in their data. This means that these organizations have not experienced any data breaches caused by improperly or insufficiently de-identified data.
Organizations using this methodology have achieved major advances in medical research for HIV, cancer, diabetes and many other diseases, improved clinical care, and accelerated the clinical trial process, all without a single data breach related to the re-identification of shared de-identified data.
This is where the HITRUST framework for de-identification methodology standards can play an important role. High-profile data breaches, concerns about risks to patient privacy, and a lack of clear standards have had a chilling effect on secondary uses of data and are impeding progress. Using this proven methodology, a growing number of organizations are demonstrating that it is possible to de-identify healthcare data for more and better research and analysis while protecting patient privacy in a way that minimizes the risk of a breach associated with the data set.
The HITRUST framework and the release of more standards in the de-identification space are important initiatives that will protect patient information and thereby help reduce the high cost of data breaches while simultaneously making more data available to solve some of healthcare’s most challenging problems.