Seattle, WA – Researchers at the Institute for Systems Biology (ISB) have published groundbreaking insights into how artificial intelligence (AI) can be used to identify social determinants of health, such as housing instability, from electronic health records (EHRs). The study, conducted in collaboration with Providence, appears in the Journal of Medical Internet Research and highlights both the promise and pitfalls of using AI in sensitive healthcare applications.
The research team tested the effectiveness of large language models (LLMs), including GPT-4 and GPT-3.5, alongside named entity recognition models, regular expressions, and human reviewers. Their analysis spanned over 25,000 clinical notes from 795 pregnant women, focusing on detecting housing instability and distinguishing current from past instances.
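To make the abstraction task concrete, the sketch below shows how a single note might be screened with a chat-based LLM. It is illustrative only, not the team's actual pipeline: it assumes the OpenAI Python SDK (openai>=1.0) with an API key in the environment, and the prompt wording and note text are hypothetical.

```python
# Illustrative sketch only -- not the study's actual pipeline.
# Assumes the OpenAI Python SDK (openai>=1.0) and an API key in the
# environment; the prompt and the note text below are hypothetical.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "You are reviewing a clinical note. Answer two questions:\n"
    "1. Does the note indicate housing instability? (yes/no)\n"
    "2. If yes, is it current or past? Quote the supporting text verbatim."
)

note = "Pt reports she has been staying with relatives since losing her apartment last month."

response = client.chat.completions.create(
    model="gpt-4",      # the study compared GPT-4 and GPT-3.5
    temperature=0,      # deterministic output for abstraction tasks
    messages=[
        {"role": "system", "content": PROMPT},
        {"role": "user", "content": note},
    ],
)

print(response.choices[0].message.content)
```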
Key Findings
Superior Recall but Limited Precision
GPT-4 emerged as the most effective tool for identifying cases of housing instability, surpassing human reviewers in recall (the ability to find all relevant cases). However, humans outperformed AI in precision (the proportion of flagged cases that were truly relevant), better distinguishing when patients were not experiencing housing instability and providing more accurate supporting evidence.
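The trade-off between the two metrics can be illustrated with a toy example; the labels below are made up and do not reflect the study's data.

```python
# Toy example of the two metrics discussed above, using made-up labels.
# 1 = the reviewer/model flags housing instability, 0 = no flag.
human = [1, 1, 0, 0, 1, 0, 1, 0]   # gold-standard human review
model = [1, 1, 1, 0, 1, 1, 1, 0]   # hypothetical LLM output

tp = sum(1 for h, m in zip(human, model) if h == 1 and m == 1)
fp = sum(1 for h, m in zip(human, model) if h == 0 and m == 1)
fn = sum(1 for h, m in zip(human, model) if h == 1 and m == 0)

recall = tp / (tp + fn)      # share of true cases the model found
precision = tp / (tp + fp)   # share of model flags that were correct

print(f"recall={recall:.2f}, precision={precision:.2f}")
# Here the model finds every true case (recall 1.00) but over-flags
# (precision 0.67) -- the same direction of trade-off the study reports.
```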
“These results show that LLMs present a scalable, cost-effective solution for an initial search for patients who may benefit from outreach,” said Dr. Jennifer Hadlock, ISB Associate Professor and corresponding author of the study.
Contextual Misinterpretations and Safety Concerns
While GPT-4 reliably cited text directly from clinical notes without fabricating evidence, some AI-generated interpretations were incorrect. These errors, if not reviewed by a human, could lead to misleading conclusions.
“When a healthcare professional decides whether and how to reach out to offer help, they take great care to consider patient safety. Our results illustrate that it would still be essential to have a human read the actual text in the chart, not just the LLM summary,” Hadlock emphasized.
Impact of De-Identification on Accuracy
The study also investigated how de-identification—replacing sensitive information in medical notes with fictitious alternatives—affected AI performance. Using an automated technique called “hide in plain sight,” researchers found that recall rates declined significantly in de-identified notes, likely due to critical context being altered.
“This highlights the need to refine de-identification methods to preserve privacy without losing important details about social determinants of health,” said lead author Dr. Alexandra Ralevski.
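For readers unfamiliar with surrogate-based de-identification, the following minimal sketch captures the spirit of the approach: detected identifiers are swapped for realistic fakes rather than removed. The patterns and surrogate values are hypothetical and far simpler than a production system such as the one used at Providence.

```python
# Minimal sketch of surrogate-based de-identification in the spirit of
# "hide in plain sight": detected identifiers are replaced with realistic
# fakes rather than redacted. Patterns and surrogates are hypothetical.
import re

SURROGATES = {
    r"\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]* \d{1,2}, \d{4}\b": "July 4, 2010",
    r"\b\d{3}-\d{3}-\d{4}\b": "555-555-0100",   # phone numbers
    r"\bMs\. [A-Z][a-z]+\b": "Ms. Doe",          # a crude name pattern
}

def deidentify(text: str) -> str:
    for pattern, fake in SURROGATES.items():
        text = re.sub(pattern, fake, text)
    return text

note = "Ms. Rivera was evicted on March 3, 2023 and can be reached at 206-555-1234."
print(deidentify(note))
# -> "Ms. Doe was evicted on July 4, 2010 and can be reached at 555-555-0100."
# The shifted date changes whether the eviction reads as recent -- the kind
# of altered context that can lower recall for housing instability.
```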
Implications for Healthcare
The findings underscore AI’s potential as a valuable tool for identifying social determinants of health, enabling more targeted and scalable interventions. However, the research also highlights the importance of human oversight to ensure safety and accuracy, particularly when dealing with sensitive issues like housing instability or intersecting risks such as domestic abuse.
The study also calls attention to the trade-offs between privacy and data accuracy, emphasizing the need for improved de-identification methods that maintain critical contextual information.
About the Study
The research was conducted within Providence’s secure internal environment, leveraging advanced LLMs based on generative pre-trained transformers (GPT). The study provides a robust framework for evaluating AI’s effectiveness in analyzing EHRs while addressing the ethical and practical challenges of real-world applications.
For more information, refer to the original study: Alexandra Ralevski et al., Using Large Language Models to Abstract Complex Social Determinants of Health From Original and Deidentified Medical Notes: Development and Validation Study, Journal of Medical Internet Research (2024). DOI: 10.2196/63445.