– Geisinger and IBM’s AI and Data Science team are working together to prevent sepsis, a life-threatening infection, through early detection and research.
– The Geisinger team used open-source tools in IBM Watson Studio to build a predictive model that ingests clinical data from thousands of de-identified sepsis patients spanning a decade.
– Geisinger uses IBM Watson Explorer to create a searchable index of thousands of medical publications, so clinicians and researchers can now mine the journal archive to uncover the most relevant content within a few clicks.
Geisinger and IBM Data Science have teamed up to use machine learning tools to identify characteristics associated with a higher risk of sepsis mortality. Led by Dr. Shravan Kethireddy, a six-person Geisinger team is developing a predictive model for sepsis mortality based on data from its Epic electronic health record (EHR) system, helping clinicians to develop more effective treatments.
The Bigger Picture
Sepsis kills nearly 270,000 patients in the U.S. every year. One sepsis patient dies every two minutes – and 80 percent of these deaths are preventable by early detection. Sepsis is also the most expensive condition to treat in the U.S., costing the nation more than $24 billion in hospital expenses annually.
“For clinicians, making a sepsis diagnosis can be very difficult, as the symptoms overlap with many other common illnesses. If we can identify patients more quickly and more accurately, we can administer the right treatments early and increase the chances of a positive outcome,” explains Dr. Donna Wolk, Division Director, Molecular and Microbial Diagnostics and Development at Geisinger.
Around two million new medical journal papers emerge every year, making it very difficult for busy staff to keep up with the literature. “Our teams need to be on top of all recent and archived studies, as each publication could reveal valuable findings that help us make the next breakthrough. Searching through back issues to locate specific journal articles can be a time-consuming process, so we also looked to find a way to keep researchers abreast of key findings,” Dr. Wolk adds.
Geisinger: Fighting Sepsis with Machine Learning
The team used IBM Watson Studio open-source tools to build a predictive model that would ingest clinical data from thousands of de-identified sepsis patients spanning a decade. Then they used all the data to build another model to predict patient mortality during the hospitalization period or during the 90 days following their hospital stay. The predictive model helped researchers identify clinical biomarkers associated with higher rates of mortality from sepsis by predicting death or survival of patients in the test data.
For the first use case, Geisinger provided de-identified files for 10,599 patients diagnosed with sepsis between 2006 and 2016, either before hospitalization or during their stay. The Geisinger and IBM teams broke the data into 199 separate features for each patient, covering details such as their age, infection type, surgery and treatments, medical history and lifestyle.
Next, the teams set themselves the goal of using the data to predict all-cause patient mortality during the hospitalization period or during the 90 days post-discharge. The data scientists used the open-source XGBoost library and the Python programming language in Watson Studio to develop a scalable machine learning algorithm, based on gradient-boosted decision trees, to analyze the data.
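To make the idea of gradient-boosted decision trees concrete, here is a minimal from-scratch sketch in plain Python. It is not XGBoost and not the team's model: it boosts one-split trees (stumps) on the log-loss gradient, without the second-order updates and regularization that XGBoost adds. The two features (age, lactate) and all values are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Return the one-split tree (feature, threshold, left value, right value)
    that fits the residuals with the lowest squared error, or None."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must put rows on both sides
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, lmean, rmean)
    return None if best is None else best[1:]

def raw_score(base, lr, stumps, row):
    """Log-odds score: base value plus the scaled output of every stump."""
    return base + sum(lr * (lv if row[j] <= t else rv) for j, t, lv, rv in stumps)

def fit_gbt(X, y, n_rounds=20, lr=0.3):
    """Each round fits a stump to the residuals y - p, which are the negative
    gradient of the log loss with respect to the raw score."""
    p = sum(y) / len(y)          # assumes both classes are present
    base = math.log(p / (1 - p))
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(raw_score(base, lr, stumps, row))
                     for row, yi in zip(X, y)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        stumps.append(stump)
    return base, lr, stumps

def predict_proba(model, row):
    base, lr, stumps = model
    return sigmoid(raw_score(base, lr, stumps, row))

# Hypothetical toy data: [age, lactate]; label 1 = died, 0 = survived.
X = [[45, 1.1], [82, 4.5], [67, 3.9], [30, 0.9],
     [75, 5.2], [52, 1.4], [88, 2.0], [39, 4.8]]
y = [0, 1, 1, 0, 1, 0, 0, 1]

model = fit_gbt(X, y)
print(predict_proba(model, [70, 4.0]))  # high-lactate patient
print(predict_proba(model, [40, 1.0]))  # low-lactate patient
```

In practice a library such as XGBoost handles deeper trees, missing values and regularization; the sketch only shows the core loop of fitting each new tree to the errors of the trees before it.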
After splitting the data 60/40 between training and testing, the team fine-tuned the predictive model, then used the final version to estimate precision, recall and the area under the receiver operating characteristic (ROC) curve (AUC) on the test data and to show which clinical features were most indicative of patient mortality.
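The 60/40 split and the AUC metric mentioned above can be sketched in a few lines of plain Python. The function names, the fixed seed and the rounding convention are assumptions made for this illustration, and the labels and scores are made up; the article does not describe how the team implemented either step.

```python
import random

def split_60_40(records, seed=0):
    """Shuffle a copy of the records and cut it 60/40 into train and test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 6 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case gets a
    higher score than a randomly chosen negative case (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

train, test = split_60_40(range(10))
print(len(train), len(test))  # 6 4

# Made-up labels (1 = died) and predicted mortality scores.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
print(round(roc_auc(labels, scores), 3))  # 0.833
```

With 10,599 records, this rounding convention would put 6,359 patients in the training set and 4,240 in the test set; the article does not state the exact sizes the team used.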
Outcomes/Results
Of the 10,599 patients in the sample, 25.2 percent died during hospitalization, and a further 13.1 percent in the 90-day post-discharge period. On the test data, the predictive model identified 1,190 True Positives and 2,087 True Negatives: that is, it correctly predicted death for patients who did die and survival for patients who were successfully treated.
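As a reminder of how true positives and true negatives relate to precision and recall, the sketch below counts the four confusion-matrix cells on made-up labels. The article reports only the True Positive and True Negative counts, so the model's actual precision and recall cannot be reproduced from it; the data here is purely illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = died, 0 = survived)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def precision_recall(tp, fp, fn):
    """Precision: share of predicted deaths that were real deaths.
    Recall: share of real deaths that the model caught."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Made-up outcomes and predictions for six patients.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # 2 1 2 1
print(precision_recall(tp, fp, fn))
```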
“Our experience using machine learning and data science has been very positive, and we see huge potential to continue its use in the medical field,” said Dr. Vida Abedi, Geisinger staff scientist, Department of Molecular and Functional Genomics. “We are well on our way to breaking new ground in clinical care for sepsis and achieving more positive outcomes for our patients.”
Armed with the new model, Geisinger can develop more personalized clinical care plans for at-risk sepsis patients. Geisinger hopes to increase patients’ chances of recovery by paying attention to the key factors linked to sepsis deaths.
Using Watson Explorer to Gather the Latest Sepsis Research at Geisinger
Around two million new medical journal papers emerge every year. Geisinger was looking for a way to help keep its researchers apprised of all recent and archived studies. For this part of the project, the IBM Data Science and AI Elite team used IBM Watson Explorer to create a searchable index of thousands of medical publications. Clinicians and researchers can now mine the journal archive to uncover the most relevant content within a few clicks.
“Predicting all-cause death in sepsis patients can guide health providers to actively monitor and take preventive actions to improve patients’ survival,” said Richard Balduino, solution architect with the IBM Data Science and AI Elite team. “Many of the features that were identified as important in our model are known to be associated with sepsis patients’ death. This provides reassurance that our machine learning models can help identify well-known associations with sepsis death even among the noise of many unrelated variables.”
The teams then added natural language taxonomies, including the 2017 Medical Subject Headings (MeSH) created by the US National Library of Medicine, and the DrugBank database. Users of the index can run queries based on the words listed in these hierarchies of key medical and pharmacological terms, uncover relations between concepts and specify a time period for journal publications in which the terms were used.
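The kind of query described above, a taxonomy term expanded to its narrower terms and restricted to a publication period, can be illustrated with a small plain-Python sketch. This is not Watson Explorer's API: the functions, the tiny taxonomy and the article records are all hypothetical and are not real MeSH or DrugBank entries.

```python
# Hypothetical mini-taxonomy: each term maps to its narrower terms.
TAXONOMY = {
    "Infection": ["Sepsis", "Bacteremia"],
    "Sepsis": ["Septic Shock"],
}

def expand(term, taxonomy):
    """Return the term together with all of its descendants in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for child in taxonomy.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

def search(articles, term, taxonomy, year_from, year_to):
    """Titles of articles tagged with the term (or a narrower one)
    and published within the given period, inclusive."""
    wanted = expand(term, taxonomy)
    return [a["title"] for a in articles
            if year_from <= a["year"] <= year_to and wanted & set(a["terms"])]

# Made-up index entries.
articles = [
    {"title": "Article A", "year": 2015, "terms": ["Septic Shock"]},
    {"title": "Article B", "year": 2009, "terms": ["Sepsis"]},
    {"title": "Article C", "year": 2016, "terms": ["Hypertension"]},
]

print(search(articles, "Sepsis", TAXONOMY, 2012, 2017))  # ['Article A']
```

Expanding the query term through the hierarchy is what lets a search for a broad heading also surface papers indexed only under a more specific one.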
Dr. Wolk adds: “IBM Watson Explorer has been really impressive so far. Not only could we use the solution to ingest a huge amount of data, but being able to fine-tune search terms and run queries through the intuitive interface will make things much easier for our researchers.”