
The healthcare industry has suffered from notorious data-quality issues for decades.
Poor, incomplete data can result in improper treatments and missed opportunities to improve patients’ health, while robust data governance practices build trust and empower the delivery of more precise, life-enhancing care.
In healthcare, there is significant room for improvement in data quality. For example, in an Experian survey, healthcare professionals, including chief data officers, clinical data analysts, and health IT managers, rated their confidence in data quality at an average of 7.08 out of 10.
The survey identified three primary concerns associated with the effects of poor healthcare data quality:
- Duplicated administrative work due to data inconsistencies, which places a burden on healthcare staff;
- Incorrect patient details, which complicate outreach efforts and impact billing, patient satisfaction, and resource allocation;
- Missed appointments, which disrupt the continuity of care and reduce operational efficiency.
As healthcare moves into the age of artificial intelligence (AI), underlying issues with healthcare data quality are an increasing concern because AI systems are only as good as the data they rely on.
Challenges in healthcare data quality
Healthcare has long struggled with poor data quality, and the reasons behind these issues are numerous. Technical challenges, such as misconfigured systems that fail to populate data fields correctly, often introduce errors at the point of collection.
Human error is another common factor. For example, clinicians working under time pressure may select an imprecise diagnosis from a poorly designed dropdown menu, resulting in inaccurate records. These problems are compounded when data systems are not aligned or integrated, creating inconsistencies that ripple across the care continuum.
Defining what constitutes “clean data” is itself a challenge. Several dimensions must be considered: accuracy (Is the information correct?), completeness (Is the record fully captured?), usability (Is it presented in a way that makes sense?), and validity (Does the coded information logically apply to the patient?).
Errors across these dimensions are not hypothetical; they are observable in real-world records, where men are sometimes documented as having hysterectomies or women recorded as having prostate exams. These coding errors are not just technical oversights. They undermine trust in the healthcare system and pose risks to patient safety.
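A validity check of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields and the `INCOMPATIBLE` rule table are hypothetical, and a real rule set would be drawn from clinical coding standards rather than hard-coded strings.

```python
from dataclasses import dataclass

# Hypothetical rule table: procedure -> recorded sex it conflicts with.
# Illustrative only; not taken from any coding standard.
INCOMPATIBLE = {
    "hysterectomy": "male",
    "prostate exam": "female",
}


@dataclass
class Record:
    patient_id: str
    sex: str
    procedure: str


def validity_issues(records):
    """Return (patient_id, procedure) pairs where the procedure conflicts with the recorded sex."""
    return [
        (r.patient_id, r.procedure)
        for r in records
        if INCOMPATIBLE.get(r.procedure.lower()) == r.sex.lower()
    ]


records = [
    Record("p1", "male", "hysterectomy"),
    Record("p2", "female", "hysterectomy"),
    Record("p3", "female", "prostate exam"),
]
flagged = validity_issues(records)
print(flagged)
```

A flag here is a prompt for human review rather than an automatic correction, since either the recorded sex or the procedure code could be the field that is wrong.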
Further complicating the issue, organizations often adopt different definitions and standards for data quality. One hospital’s “clean” dataset may not align with another’s. As data is exchanged between organizations, these discrepancies exacerbate quality problems, creating a fragmented ecosystem where expectations are mismatched and interoperability is limited.
Without shared standards and coordinated approaches, efforts to address data quality remain siloed and inconsistent.
The need for clean data in healthcare
In healthcare, the importance of clean, fit-for-use data cannot be overstated. Data that is incomplete, inconsistent, or presented in the wrong format can be unusable for advanced applications like AI.
When systems expect information in a certain format and receive something else, the outcome is flawed analysis. This is why data must be evaluated not only for accuracy, but also for its fitness for use.
The classic adage “garbage in, garbage out” applies directly here, but in healthcare the stakes are far higher. Clean data produces clearer signals for AI models, much like a crisp MRI image provides a radiologist with better visibility into a patient’s condition.
Trust in AI is still developing among healthcare professionals. If clinicians perceive that AI systems are operating on flawed data, their confidence in these tools will erode. Worse, as reliance on AI grows, there is a risk that clinicians will apply less scrutiny to machine-generated insights.
AI and data quality considerations
Healthcare is uniquely vulnerable to the effects of poor data quality because of the complexity of human health. The sheer number of medical specialties underscores the difficulty of capturing a complete and accurate health profile.
From cardiology to endocrinology to orthopedics, each specialty generates its own streams of data, all of which must be harmonized to provide a comprehensive view of the patient.
When AI models are trained on unclean data, they may create significant risks for patients. Erroneous or ambiguous data can push AI into states where it cannot reliably diagnose or recommend treatments.
Similarly, biased data, while distinct from poor data quality, can lead to incomplete or skewed inferences, often disadvantaging underrepresented populations. Since AI can only generate insights from the data it has been fed, any flaws in that data will cascade through its outputs.
The future of data quality in healthcare
Looking ahead, the future of data quality in healthcare will likely be shaped by the development of standardized measurement frameworks. Today, there is no universally accepted definition of what constitutes “clean data,” which makes progress fragmented and inconsistent. The first step toward improvement will be establishing agreed-upon measures.
It is reasonable to expect that regulation will play a role in advancing this agenda. Policymakers may mandate adherence to specific data standards or establish rubrics for scoring data quality. Once these standards are in place, organizations will be able to benchmark their data, compare results, and set targets for improvement.
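To make the idea of a scoring rubric concrete, here is a minimal sketch of one possible measure, a completeness score. It is not an existing or proposed standard: the `REQUIRED_FIELDS` list, the record layout, and the scoring rule are all assumptions chosen for illustration.

```python
# Hypothetical list of fields a rubric might require; not from any real standard.
REQUIRED_FIELDS = ("name", "date_of_birth", "address", "phone")


def completeness_score(records, required=REQUIRED_FIELDS):
    """Fraction of required fields that are populated across all records (0.0 to 1.0)."""
    total = len(records) * len(required)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for rec in records
        for field in required
        # Treat missing keys, None, and blank strings as unpopulated.
        if str(rec.get(field) or "").strip()
    )
    return filled / total


hospital_a = [
    {"name": "A. Patient", "date_of_birth": "1980-01-01",
     "address": "1 Main St", "phone": "555-0100"},
    {"name": "B. Patient", "date_of_birth": "",
     "address": None, "phone": "555-0101"},
]
score = completeness_score(hospital_a)
print(f"{score:.2f}")  # 6 of 8 required fields populated -> 0.75
```

If two organizations computed the same score against the same field list, their results would be directly comparable, which is the kind of benchmarking a shared standard would allow.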
While legislation alone will not solve all the challenges, it will provide the clarity and consistency the industry currently lacks. A shared framework for measuring data quality would not only improve data exchange between organizations but also help build trust in the insights derived from that data. Ultimately, this will pave the way for better patient outcomes and more responsible use of AI in healthcare.
About Derek Plansky
Derek Plansky is senior vice president of strategic governance at Health Gorilla, an interoperability company and QHIN.

