Question-Answering System Extracts Information on Injection Drug Use from Clinical Notes
- URL: http://arxiv.org/abs/2305.08777v2
- Date: Thu, 28 Dec 2023 16:24:30 GMT
- Title: Question-Answering System Extracts Information on Injection Drug Use from Clinical Notes
- Authors: Maria Mahbub, Ian Goethert, Ioana Danciu, Kathryn Knight, Sudarshan Srinivasan, Suzanne Tamang, Karine Rozenberg-Ben-Dror, Hugo Solares, Susana Martins, Jodie Trafton, Edmon Begoli, Gregory Peterson
- Abstract summary: Injection drug use (IDU) is a dangerous health behavior that increases mortality and morbidity.
The only place IDU information can be indicated is unstructured free-text clinical notes.
We design and demonstrate a question-answering (QA) framework to extract information on IDU from clinical notes.
- Score: 4.537953996010351
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Background: Injection drug use (IDU) is a dangerous health behavior that
increases mortality and morbidity. Identifying IDU early and initiating harm
reduction interventions can benefit individuals at risk. However, extracting
IDU behaviors from patients' electronic health records (EHR) is difficult
because there is no International Classification of Disease (ICD) code and the
only place IDU information can be indicated is unstructured free-text clinical
notes. Although natural language processing can efficiently extract this
information from unstructured data, there are no validated tools. Methods: To
address this gap in clinical information, we design and demonstrate a
question-answering (QA) framework to extract information on IDU from clinical
notes. Our framework involves two main steps: (1) generating a gold-standard QA
dataset and (2) developing and testing the QA model. We utilize 2323 clinical
notes of 1145 patients sourced from the VA Corporate Data Warehouse to
construct the gold-standard dataset for developing and evaluating the QA model.
We also demonstrate the QA model's ability to extract IDU-related information
on temporally out-of-distribution data. Results: Here we show that for a strict
match between gold-standard and predicted answers, the QA model achieves 51.65%
F1 score. For a relaxed match between the gold-standard and predicted answers,
the QA model obtains 78.03% F1 score, along with 85.38% Precision and 79.02%
Recall scores. Moreover, the QA model demonstrates consistent performance when
subjected to temporally out-of-distribution data. Conclusions: Our study
introduces a QA framework designed to extract IDU information from clinical
notes, aiming to enhance the accurate and efficient detection of people who
inject drugs, extract relevant information, and ultimately facilitate informed
patient care.
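The abstract contrasts a strict and a relaxed match between gold-standard and predicted answers but does not define either. The sketch below assumes, hypothetically, that answers are character-offset spans within a note, that a strict match requires identical boundaries, and that a relaxed match requires at least one overlapping character; `Span`, `strict_match`, `relaxed_match`, `f1`, and `span_prf` are illustrative names, not the authors' code. Precision is matched predictions over all predictions, recall is matched gold answers over all gold answers, and F1 is their harmonic mean.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Span:
    """Hypothetical answer span: half-open character offsets [start, end) in one note."""
    note_id: str
    start: int
    end: int


def strict_match(pred: Span, gold: Span) -> bool:
    # Strict (assumed): same note and identical boundaries.
    return pred.note_id == gold.note_id and (pred.start, pred.end) == (gold.start, gold.end)


def relaxed_match(pred: Span, gold: Span) -> bool:
    # Relaxed (assumed): same note and at least one overlapping character.
    return pred.note_id == gold.note_id and pred.start < gold.end and gold.start < pred.end


def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)


def span_prf(
    preds: List[Span], golds: List[Span], match: Callable[[Span, Span], bool]
) -> Tuple[float, float, float]:
    """Pooled precision, recall and F1 using greedy one-to-one matching."""
    unmatched = list(golds)
    tp = 0
    for p in preds:
        for i, g in enumerate(unmatched):
            if match(p, g):
                tp += 1
                del unmatched[i]  # each gold answer can be matched at most once
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(golds) if golds else 0.0
    return precision, recall, f1(precision, recall)


gold = [Span("n1", 10, 25), Span("n2", 0, 12)]
preds = [Span("n1", 10, 25), Span("n2", 3, 12), Span("n3", 5, 9)]
print([round(x, 3) for x in span_prf(preds, gold, strict_match)])   # [0.333, 0.5, 0.4]
print([round(x, 3) for x in span_prf(preds, gold, relaxed_match)])  # [0.667, 1.0, 0.8]
```

Greedy matching is a simplification; an optimal bipartite assignment would remove any dependence on the order of predictions. As a side calculation, the harmonic mean of the reported relaxed precision and recall (85.38% and 79.02%) is about 82.08%, not the reported 78.03%; one possible explanation is that the paper averages F1 per question instead of computing it from pooled counts, but the abstract does not say.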
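The abstract reports testing on temporally out-of-distribution data without describing how that set was drawn. A minimal sketch, assuming every note carries a date and that "temporally out-of-distribution" means notes written on or after a chosen cutoff; the `Note` record, `temporal_split`, and its `drop_seen_patients` option are hypothetical. Dropping later notes from patients already seen before the cutoff is one optional way to keep patient overlap from masking the effect of time.

```python
from datetime import date
from typing import List, NamedTuple, Tuple


class Note(NamedTuple):
    """Hypothetical note record with only the fields needed for splitting."""
    note_id: str
    patient_id: str
    note_date: date


def temporal_split(
    notes: List[Note], cutoff: date, drop_seen_patients: bool = False
) -> Tuple[List[Note], List[Note]]:
    """Split notes into an in-distribution pool (before `cutoff`)
    and a temporally out-of-distribution set (on or after `cutoff`)."""
    in_dist = [n for n in notes if n.note_date < cutoff]
    ood = [n for n in notes if n.note_date >= cutoff]
    if drop_seen_patients:
        # Optional (assumed) filter: keep only OOD notes from patients
        # who have no notes before the cutoff.
        seen = {n.patient_id for n in in_dist}
        ood = [n for n in ood if n.patient_id not in seen]
    return in_dist, ood


notes = [
    Note("a", "p1", date(2019, 3, 1)),
    Note("b", "p2", date(2020, 6, 1)),
    Note("c", "p1", date(2022, 1, 5)),
    Note("d", "p3", date(2022, 2, 9)),
]
in_dist, ood = temporal_split(notes, cutoff=date(2021, 1, 1))
print([n.note_id for n in in_dist], [n.note_id for n in ood])  # ['a', 'b'] ['c', 'd']
_, ood_new = temporal_split(notes, date(2021, 1, 1), drop_seen_patients=True)
print([n.note_id for n in ood_new])  # ['d']
```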
Related papers
- AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow [33.8495939261319]
We develop an advanced simulated patient system with AIPatient Knowledge Graph (AIPatient KG) as the input and Reasoning Retrieval-Augmented Generation (Reasoning RAG) as the generation backbone.
Reasoning RAG leverages six LLM powered agents spanning tasks including retrieval, KG query generation, abstraction, checker, rewrite, and summarization.
Our system also presents high readability (median Flesch Reading Ease 77.23; median Flesch Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p>0.1), and stability (ANOVA F-value 0.782, p>0.1).
(2024-09-27T17:17:15Z)
- K-QA: A Real-World Medical Q&A Benchmark [12.636564634626422]
We construct K-QA, a dataset containing 1,212 patient questions originating from real-world conversations held on K Health.
We employ a panel of in-house physicians to answer and manually decompose a subset of K-QA into self-contained statements.
We evaluate several state-of-the-art models, as well as the effect of in-context learning and medically-oriented augmented retrieval schemes.
(2024-01-25T20:11:04Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
(2023-10-04T01:36:30Z)
- Prediction of drug effectiveness in rheumatoid arthritis patients based on machine learning algorithms [2.5759046095742453]
Rheumatoid arthritis (RA) is an autoimmune condition caused when a patient's immune system mistakenly targets their own tissue.
Machine learning (ML) has the potential to identify patterns in patient electronic health records to forecast the best clinical treatment to improve patient outcomes.
This study introduced a Drug Response Prediction (TNF) framework with two main goals: 1) design a data processing pipeline to extract information from clinical data and then preprocess it for functional use, and 2) predict RA patients' responses to drugs and evaluate classification models' performance.
(2022-10-14T15:15:37Z)
- Literature-Augmented Clinical Outcome Prediction [10.46990394710927]
We introduce techniques to help bridge the gap between evidence-based medicine (EBM) and AI-based clinical models.
We propose a novel system that automatically retrieves patient-specific literature based on intensive care unit (ICU) patient information.
Our model is able to substantially boost predictive accuracy on three challenging tasks in comparison to strong recent baselines.
(2021-11-16T11:19:02Z)
- A causal learning framework for the analysis and interpretation of COVID-19 clinical data [7.256237785391623]
The workflow consists of a multi-step approach that begins by identifying the main causes of a patient's outcome through BSL.
We evaluate our approach on a feature-rich COVID-19 dataset, showing that the proposed framework provides a schematic overview of the multi-factorial processes that jointly contribute to the outcome.
Our approach yields a highly interpretable tool that correctly predicts the outcome of 85% of subjects based exclusively on 3 features.
(2021-05-14T15:58:18Z)
- Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose variational knowledge distillation (VKD), a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
(2021-03-19T14:13:56Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
(2021-02-08T10:26:44Z)
- Classification supporting COVID-19 diagnostics based on patient survey data [82.41449972618423]
Logistic regression and XGBoost classifiers that allow for effective screening of patients for COVID-19 were generated.
The obtained classification models provided the basis for the DECODE service (decode.polsl.pl), which can serve as support in screening patients with COVID-19 disease.
The data set, consisting of more than 3,000 examples, is based on questionnaires collected at a hospital in Poland.
(2020-11-24T17:44:01Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines, exceeding the best baseline by up to 19%.
(2020-10-22T02:28:11Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exams performed on symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
(2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.