Clinical Natural Language Understanding (cNLU), the technology by which computers extract meaning from clinical text, is quickly becoming a common feature in the healthcare IT landscape. In 2021, 30% of surveyed healthcare organizations were either using or exploring the technology (Gradient Flow, 2021), and similar adoption is occurring in the UK (Wu et al., Nature, 2022). Today, cNLU is being applied to billing/coding, trial enrollment, registry creation, clinical decision support, prior authorization, fraud/abuse detection, and other labor-intensive workflows. This list will expand over time, as more clinical innovators become familiar with the technology and cNLU tools become better, cheaper, and more accessible. In this blog we focus on the cNLU task of extracting and coding concepts from clinical notes, thereby converting unstructured data – the state of the vast majority of clinical data in electronic medical records (EMRs) – into structured data. However, for many uses, one thing is holding the technology back: it generates too much information and, relatedly, too much noise.
First, we should celebrate that we are now at a point where advances in machine learning are enabling sufficiently performant models that understand the meaning of clinical text in a manner that approximates professional reviewers. The task of concept extraction and coding is typically evaluated by two metrics: recall and precision. Recall asks: for every instance of depression in a corpus of clinical documents, how many instances were correctly captured by the model? Precision asks: for every instance where the model captured a span of text and labeled it depression, how frequently were those actually cases of depression? But extracting even a seemingly simple concept like “depression” is challenging. A reference to depression might be “patient exhibited signs of significant depressive disorder.” It could also be “patient’s blood pressure was depressed.” A model that differentiates between these cases, and that determines only the former represents clinical depression, is understanding the nuance of language.
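To make those two questions concrete, here is a minimal sketch of how recall and precision are computed for concept extraction, assuming gold-standard annotations and model output are both represented as sets of (document, span, label) tuples. The data, spans, and representation are illustrative, not a description of any particular system’s internals.

```python
def precision_recall(gold: set, predicted: set) -> tuple[float, float]:
    """Score extracted concept mentions against a gold standard."""
    true_positives = len(gold & predicted)
    # Precision: of the spans the model labeled depression, how many were real?
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: of all true depression mentions, how many did the model find?
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Illustrative data: (document id, character span, label) tuples.
gold = {("doc1", (120, 151), "depression")}        # "significant depressive disorder"
predicted = {("doc1", (120, 151), "depression"),
             ("doc2", (40, 62), "depression")}     # false hit: "blood pressure was depressed"

print(precision_recall(gold, predicted))           # (0.5, 1.0)
```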
The problem is still more challenging insofar as references to the medical condition of depression can also occur, and in fact largely do occur, in the context of family histories, as a condition a patient does not, hypothetically could, or possibly does suffer from, or as part of a screening. Consider the examples “reports father suffered from significant depression,” “patient may suffer from depression,” and “chronic disruptions to sleep patterns may result in depression.” In none of these cases is depression conclusively present for the patient. As challenging as these cases are, and while not yet as reliable as clinicians, cNLU has already outperformed trained reviewers in extracting insights from charts, even in high-risk / high-acuity clinics (Suh et al., Anesthesia & Analgesia, 2022).
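To illustrate the kind of contextual reasoning involved, below is a deliberately simplified, rule-based sketch that classifies a mention of depression by its surrounding cues. The cue lists are hypothetical and far from complete, and production cNLU systems use trained models that handle this far more robustly; this is not a description of KAID Health’s approach.

```python
# Hypothetical cue lists; real systems learn these patterns rather than
# matching keywords.
FAMILY_CUES = ("father", "mother", "family history")
HYPOTHETICAL_CUES = ("may result in", "risk of", "screen for")
UNCERTAIN_CUES = ("may suffer", "possible", "rule out")

def classify_context(sentence: str) -> str:
    """Classify how a condition mention relates to the patient."""
    s = sentence.lower()
    if any(cue in s for cue in FAMILY_CUES):
        return "family_history"
    if any(cue in s for cue in HYPOTHETICAL_CUES):
        return "hypothetical"
    if any(cue in s for cue in UNCERTAIN_CUES):
        return "uncertain"
    return "asserted"

for note in ("reports father suffered from significant depression",
             "patient may suffer from depression",
             "chronic disruptions to sleep patterns may result in depression"):
    print(classify_context(note))
# -> family_history, uncertain, hypothetical
```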
A model that reliably extracts and contextualizes all the salient concepts in a corpus of clinical documents, and stops there, improves the status quo. However, it does not meet the needs of a user who is inevitably interested in finding very specific pieces of information about a patient or patient panel. For example, someone trying to find patients who likely have type 2 diabetes but do not have it on their problem lists will want to know which patients have abnormal fasting glucose or hemoglobin A1C results. But a patient’s chart will feature references to thousands of problems, tests, and treatments, and might feature a single reference to blood sugar tests, or else a range of results for these tests that generally wouldn’t indicate a need for review. A recent analysis we conducted found that on average a medical chart generates roughly 12,500 data elements. Finding an abnormal result for a specific test is a needle-in-a-haystack problem for someone manually reviewing a chart. And for the vast majority of reviewers, who don’t write code to work with data, it is the same problem when working with structured data from the chart.
For software to meet the needs of someone who reviews clinical charts, the data needs to be searchable. This is the case even if organizations run limited-purpose models that narrowly find references to a single topic such as diabetes, though we believe users will always have additional questions about their data. Before a user ever makes a search, all of the instances of hemoglobin A1C – whether written as a1c, HbA1c, hgba1c, or otherwise – need to be coded the same way, so that when the user searches for hemoglobin A1C, they get back every occurrence of that test. Lab values should also be extracted from the notes and associated with their tests. Users might want to see any instance of a hemoglobin A1C test for an individual patient or a cohort of patients. They might want to see any instance of the test where the value is >= 6.5%. They might add a further complication, looking for the same results as above but – assuming the findings from clinical notes have been aggregated with the patient’s already-structured data – filtering out the patients who already have type 2 diabetes on their problem lists. And they might add a date parameter, so as to return only results a reviewer would not already have seen.
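Here is a minimal sketch of the two steps this workflow implies: normalizing surface forms of the test to a single code, then searching the resulting structured records with value, problem-list, and date filters. The synonym list, record layout, and example data are assumptions for illustration; 4548-4 is the standard LOINC code for hemoglobin A1c.

```python
from datetime import date

# Step 1: map every surface form of the test to one canonical code.
A1C_SYNONYMS = {"a1c", "hba1c", "hgba1c", "hemoglobin a1c"}
LOINC_HBA1C = "4548-4"  # LOINC code for hemoglobin A1c

def normalize_test_name(raw: str) -> str | None:
    """Return the canonical code for a recognized test name, else None."""
    return LOINC_HBA1C if raw.lower().strip() in A1C_SYNONYMS else None

# Step 2: search the coded records rather than the raw text.
# Record layout and data are illustrative assumptions.
records = [
    {"patient": "p1", "code": "4548-4", "value": 7.2, "date": date(2023, 4, 2)},
    {"patient": "p2", "code": "4548-4", "value": 5.4, "date": date(2023, 5, 9)},
]
t2dm_on_problem_list = {"p3"}  # patients already carrying a T2DM diagnosis

hits = [r for r in records
        if r["code"] == LOINC_HBA1C
        and r["value"] >= 6.5                          # abnormal A1c threshold
        and r["patient"] not in t2dm_on_problem_list   # not already diagnosed
        and r["date"] >= date(2023, 1, 1)]             # only results since last review
print(hits)                                            # -> p1's 7.2% result
```

Because every variant spelling resolves to the same code before the search runs, the final query is a handful of filters rather than a free-text hunt through thousands of data elements.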
Whether for one condition or many, search is the tool that enables a user to find the information they care about. In our view, making data searchable, combined with clinician validation of the results, is how to maximize cNLU for chart review.
Leveraging cNLU to rapidly structure clinical notes and make the entire medical record searchable holds the potential to free up massive labor pools for more valued tasks. Users of cNLU for chart review already report significant reductions in the time spent reviewing charts for coding improvement, visit preparation, prior authorization review, and chart audit. Rather than relying exclusively on high-cost, hard-to-hire, potentially burnt-out clinical labor to read charts manually, cNLU, by making charts searchable, frees up more of the clinician’s time to validate results, attend to the most difficult cases, and become more informed about their patients.
About Kevin Agatstein
Kevin Agatstein is the founder and CEO of KAID Health, an AI-powered healthcare data analysis and provider engagement platform. Prior to KAID, Kevin founded Agate Consulting and held roles at McKinsey & Company and Arthur Andersen where he advised providers, payers, healthcare IT companies, life-sciences organizations, and healthcare venture-capital and private-equity firms. Kevin also led operations for CareKEY, Inc., from its early years through its acquisition by The TriZetto Group.
About Dimitri Linde
Dimitri Linde is a Clinical AI Specialist at KAID Health, focused on clinical natural language processing. He developed KAID Health’s pipeline to extract and encode information from clinical notes.