Large Language Models for Patient Comments Multi-Label Classification
        - URL: http://arxiv.org/abs/2410.23528v3
 - Date: Thu, 20 Feb 2025 01:42:01 GMT
 - Title: Large Language Models for Patient Comments Multi-Label Classification
 - Authors: Hajar Sakai, Sarah S. Lam, Mohammadsadegh Mikaeili, Joshua Bosire, Franziska Jovin
 - Abstract summary: This research explores leveraging Large Language Models (LLMs) in conducting Multi-label Text Classification (MLTC) of inpatient comments. GPT-4 Turbo was leveraged to conduct the classification. Using the prompt engineering framework, zero-shot learning, in-context learning, and chain-of-thought prompting were experimented with.
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract: Patient experience and care quality are crucial for a hospital's sustainability and reputation. The analysis of patient feedback offers valuable insight into patient satisfaction and outcomes. However, the unstructured nature of these comments poses challenges for traditional machine learning methods that follow a supervised learning paradigm, due to the unavailability of labeled data and the nuances these texts encompass. This research explores leveraging Large Language Models (LLMs) to conduct Multi-label Text Classification (MLTC) of inpatient comments shared after a hospital stay. GPT-4 Turbo was used to conduct the classification. Given the sensitive nature of patients' comments, a security layer is introduced before feeding the data to the LLM: a Protected Health Information (PHI) detection framework that ensures patients' de-identification. Additionally, within the prompt engineering framework, zero-shot learning, in-context learning, and chain-of-thought prompting were experimented with. Results demonstrate that GPT-4 Turbo, whether in a zero-shot or few-shot setting, outperforms traditional methods and Pre-trained Language Models (PLMs); zero-shot learning achieves the highest overall performance, with an F1-score of 76.12% and a weighted F1-score of 73.61%, followed closely by the few-shot learning results. Subsequently, an analysis of the association between the classification results and other structured patient experience variables (e.g., rating) was conducted. The study enhances MLTC through the application of LLMs, offering healthcare practitioners an efficient method to gain deeper insights into patient feedback and deliver prompt, appropriate responses.
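The pipeline described above (mask PHI first, then classify with a prompted LLM) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the label set, the regex-based mask_phi stand-in for the paper's PHI detection framework, and the prompt wording are all assumptions.

```python
import re
from openai import OpenAI  # assumes the openai SDK is installed and OPENAI_API_KEY is set

# Hypothetical label set: the abstract does not enumerate the paper's categories.
LABELS = ["Communication", "Care Quality", "Environment", "Discharge Process", "Staff Attitude"]

def mask_phi(comment: str) -> str:
    """Crude regex masking, standing in for the paper's PHI detection framework."""
    comment = re.sub(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b", "[PHONE]", comment)          # phone numbers
    comment = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "[DATE]", comment)           # dates
    comment = re.sub(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+\b", "[NAME]", comment)  # titled names
    return comment

def classify(comment: str) -> str:
    """Zero-shot multi-label classification of one de-identified comment."""
    prompt = (
        "Assign every applicable label from this list to the patient comment below.\n"
        f"Labels: {', '.join(LABELS)}\n"
        "Answer with a comma-separated list of labels only.\n\n"
        f"Comment: {mask_phi(comment)}"
    )
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

print(classify("Dr. Smith never returned my call on 3/14/2024; the room was spotless though."))
```

Few-shot (in-context) and chain-of-thought variants would extend the same prompt with labeled example comments or with an instruction to reason before answering.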
 
       
      
        Related papers
        - ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts
We introduce ICU-TSB (Temporal Stratification Benchmark), the first benchmark for evaluating patient stratification based on temporal patient representation learning. A key contribution of our benchmark is a novel hierarchical evaluation framework that utilizes disease taxonomies to measure the alignment of discovered clusters with clinically validated disease groupings. Our results demonstrate that temporal representation learning can rediscover clinically meaningful patient cohorts.
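One way to read the hierarchical evaluation idea is to score, at each level of a disease taxonomy, the agreement between discovered clusters and clinically validated groupings. The sketch below assumes that reading; the toy labels, the two levels, and the choice of adjusted Rand index are illustrative, not the benchmark's published protocol.

```python
from sklearn.metrics import adjusted_rand_score

# Toy stand-ins: cluster assignments from a temporal representation model, and
# disease labels for the same patients at two taxonomy levels (coarse and fine).
clusters       = [0, 0, 1, 1, 2, 2, 2, 1]
chapter_labels = ["circ", "circ", "resp", "resp", "circ", "circ", "circ", "resp"]
group_labels   = ["hf", "hf", "copd", "asthma", "mi", "mi", "hf", "copd"]

# Agreement between discovered clusters and validated groupings, per level.
for level, labels in [("chapter", chapter_labels), ("group", group_labels)]:
    print(level, round(adjusted_rand_score(labels, clusters), 3))
```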
arXiv  Detail & Related papers  (2025-06-06T15:52:50Z) - Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages
Patients with diabetes are at increased risk of comorbid depression or anxiety, complicating their management.
This study evaluated the performance of large language models (LLMs) in detecting these symptoms from secure patient messages.
arXiv  Detail & Related papers  (2025-03-14T13:27:35Z) - Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references.
We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey.
Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv  Detail & Related papers  (2025-03-06T18:35:39Z) - A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients
This research aims to explore how large language models (LLMs) can alleviate the burden of manual summarization.
This study evaluates the performance of multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and Llama 3 8B, in generating discharge summaries.
arXiv  Detail & Related papers  (2024-06-23T00:11:07Z) - Evaluating the Effectiveness of the Foundational Models for Q&A Classification in Mental Health care
Pre-trained Language Models (PLMs) have the potential to transform mental health support.
This study evaluates the effectiveness of PLMs for classification of Questions and Answers in the domain of mental health care.
arXiv  Detail & Related papers  (2024-01-24T16:52:37Z) - Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes
The purpose of this study was to evaluate the performance of Large Language Models (LLMs) in understanding and processing real-world clinical notes.
The GPT family models have demonstrated considerable efficiency, evidenced by their cost-effectiveness and time-saving capabilities.
arXiv  Detail & Related papers  (2023-12-11T21:53:40Z) - Multimodal Pretraining of Medical Time Series and Notes
Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data.
We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes.
In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
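A common way to implement such measurement-note alignment is a symmetric contrastive (InfoNCE) objective over paired embeddings from the same stay; the abstract does not name the loss, so the sketch below is an assumption, with random tensors in place of real encoder outputs.

```python
import torch
import torch.nn.functional as F

def alignment_loss(ts_emb: torch.Tensor, note_emb: torch.Tensor, temp: float = 0.07):
    """Symmetric InfoNCE: matching (time-series, note) pairs sit on the diagonal."""
    ts = F.normalize(ts_emb, dim=-1)
    nt = F.normalize(note_emb, dim=-1)
    logits = ts @ nt.t() / temp                 # (B, B) cosine-similarity matrix
    targets = torch.arange(ts.size(0))          # pair i aligns with note i
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 8 stays, 128-dim embeddings from the two (here, fake) encoders.
print(alignment_loss(torch.randn(8, 128), torch.randn(8, 128)).item())
```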
arXiv  Detail & Related papers  (2023-12-11T21:53:40Z) - From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models
We take a novel approach that leverages large language models to synthesize clinically useful insights from multi-sensor data.
We develop chain of thought prompting methods that use LLMs to generate reasoning about how trends in data relate to conditions like depression and anxiety.
We find models like GPT-4 correctly reference numerical data 75% of the time, and clinician participants express strong interest in using this approach to interpret self-tracking data.
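A hypothetical chain-of-thought prompt in this spirit; the template, signals, and numbers are invented for illustration and are not the paper's templates.

```python
# Skeleton of a CoT prompt over self-tracked signals (all values fabricated).
COT_PROMPT = """You are reviewing two weeks of self-tracked data.
Daily step counts: {steps}
Average nightly sleep (hours): {sleep}

Reason step by step:
1. Describe the trend in each signal, citing the numbers.
2. Note whether the trends co-occur.
3. Relate the pattern to symptoms associated with depression or anxiety.
Then state one clinically useful insight."""

print(COT_PROMPT.format(steps=[5200, 4900, 3100, 2800, 2400],
                        sleep=[7.1, 6.8, 5.9, 5.4, 5.0]))
```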
arXiv  Detail & Related papers  (2023-11-21T23:53:27Z) - Self-Verification Improves Few-Shot Clinical Information Extraction
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
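A minimal sketch of such an extract-then-verify loop, assuming an OpenAI-style chat API; the model name, prompts, and medication-extraction task are placeholders rather than the paper's setup.

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # any capable chat model works for this sketch
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

def extract_with_verification(note: str) -> str:
    # Step 1: initial extraction (few-shot examples omitted for brevity).
    extraction = ask(f"List every medication mentioned in this clinical note:\n{note}")
    # Step 2: the model verifies its own output, quoting the supporting sentence
    # (provenance) and dropping any item it cannot ground in the note.
    return ask(
        "For each medication below, quote the sentence of the note that mentions it. "
        "Remove any item you cannot support with a quote.\n\n"
        f"Note:\n{note}\n\nMedications:\n{extraction}"
    )
```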
arXiv  Detail & Related papers  (2023-05-30T22:05:11Z) - Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv  Detail & Related papers  (2023-03-24T03:14:00Z) - Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
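A toy version of such a rule/model merge: high-precision patterns plus a (stubbed) learned tagger, with the union of spans replaced right to left so earlier offsets stay valid. The patterns, labels, and empty model stub are illustrative, not the system's actual components.

```python
import re

def rule_spans(text: str):
    """Hand-written rules: high-precision patterns, e.g. dates and phone numbers."""
    for pattern, label in [(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "DATE"),
                           (r"\b\d{2}( \d{2}){4}\b", "PHONE")]:
        for m in re.finditer(pattern, text):
            yield (m.start(), m.end(), label)

def model_spans(text: str):
    """Stub for a trained NER model predicting names, addresses, IDs, etc."""
    return []

def pseudonymize(text: str) -> str:
    spans = sorted(set(rule_spans(text)) | set(model_spans(text)), reverse=True)
    for start, end, label in spans:   # replace right to left to keep offsets valid
        text = text[:start] + f"<{label}>" + text[end:]
    return text

print(pseudonymize("Seen on 12/03/2021, callback 06 12 34 56 78."))
```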
arXiv  Detail & Related papers  (2023-03-23T17:17:46Z) - SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
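The generic soft-prompt mechanism behind such a pipeline is a small set of learnable vectors prepended to a frozen model's input embeddings; the sketch below shows that mechanism only, not the paper's calibration procedure.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt vectors prepended to a frozen LM's input embeddings."""
    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Toy usage: 20 soft tokens, 768-dim embeddings, batch of 2, 50 input tokens.
out = SoftPrompt(n_tokens=20, d_model=768)(torch.randn(2, 50, 768))
print(out.shape)  # torch.Size([2, 70, 768])
```

Only these prompt parameters are trained; the backbone stays fixed, which is what makes the approach model-agnostic.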
arXiv  Detail & Related papers  (2023-03-23T04:47:46Z) - Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine
We propose the Re3Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv  Detail & Related papers  (2022-10-23T16:34:39Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv  Detail & Related papers  (2021-02-08T10:26:44Z) 
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     