Large Language Models for Patient Comments Multi-Label Classification
        - URL: http://arxiv.org/abs/2410.23528v3
 - Date: Thu, 20 Feb 2025 01:42:01 GMT
 - Title: Large Language Models for Patient Comments Multi-Label Classification
 - Authors: Hajar Sakai, Sarah S. Lam, Mohammadsadegh Mikaeili, Joshua Bosire, Franziska Jovin
 - Abstract summary: This research explores leveraging Large Language Models (LLMs) in conducting Multi-label Text Classification (MLTC) of inpatient comments. GPT-4 Turbo was leveraged to conduct the classification. Using the prompt engineering framework, zero-shot learning, in-context learning, and chain-of-thought prompting were experimented with.
 - License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
 - Abstract: Patient experience and care quality are crucial for a hospital's sustainability and reputation. The analysis of patient feedback offers valuable insight into patient satisfaction and outcomes. However, the unstructured nature of these comments poses challenges for traditional machine learning methods that follow a supervised learning paradigm, due to the unavailability of labeled data and the nuances these texts encompass. This research explores leveraging Large Language Models (LLMs) to conduct Multi-label Text Classification (MLTC) of inpatient comments shared after a hospital stay. GPT-4 Turbo was used to conduct the classification. Given the sensitive nature of patients' comments, a security layer is introduced before feeding the data to the LLM: a Protected Health Information (PHI) detection framework that ensures patients' de-identification. Additionally, within the prompt engineering framework, zero-shot learning, in-context learning, and chain-of-thought prompting were experimented with. Results demonstrate that GPT-4 Turbo, whether in a zero-shot or few-shot setting, outperforms traditional methods and Pre-trained Language Models (PLMs); zero-shot learning achieves the highest overall performance, with an F1-score of 76.12% and a weighted F1-score of 73.61%, followed closely by the few-shot learning results. Subsequently, an analysis of the association between the classification results and other structured patient experience variables (e.g., rating) was conducted. The study enhances MLTC through the application of LLMs, offering healthcare practitioners an efficient method to gain deeper insights into patient feedback and deliver prompt, appropriate responses.
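The pipeline described above (mask PHI first, then classify with a prompted LLM) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the label set, the regex-based mask_phi stand-in for the paper's PHI detection framework, and the prompt wording are all assumptions.

```python
import re
from openai import OpenAI  # assumes the openai SDK is installed and OPENAI_API_KEY is set

# Hypothetical label set: the abstract does not enumerate the paper's categories.
LABELS = ["Communication", "Care Quality", "Environment", "Discharge Process", "Staff Attitude"]

def mask_phi(comment: str) -> str:
    """Crude regex masking, standing in for the paper's PHI detection framework."""
    comment = re.sub(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b", "[PHONE]", comment)          # phone numbers
    comment = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "[DATE]", comment)           # dates
    comment = re.sub(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+\b", "[NAME]", comment)  # titled names
    return comment

def classify(comment: str) -> str:
    """Zero-shot multi-label classification of one de-identified comment."""
    prompt = (
        "Assign every applicable label from this list to the patient comment below.\n"
        f"Labels: {', '.join(LABELS)}\n"
        "Answer with a comma-separated list of labels only.\n\n"
        f"Comment: {mask_phi(comment)}"
    )
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

print(classify("Dr. Smith never returned my call on 3/14/2024; the room was spotless though."))
```

Few-shot (in-context) and chain-of-thought variants would extend the same prompt with labeled example comments or with an instruction to reason before answering.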
 
       
      
        Related papers
        - ICU-TSB: A Benchmark for Temporal Patient Representation Learning for Unsupervised Stratification into Patient Cohorts
We introduce ICU-TSB (Temporal Stratification Benchmark), the first benchmark for evaluating patient stratification based on temporal patient representation learning. A key contribution of our benchmark is a novel hierarchical evaluation framework that utilizes disease taxonomies to measure the alignment of discovered clusters with clinically validated disease groupings. Our results demonstrate that temporal representation learning can rediscover clinically meaningful patient cohorts.
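One way to read the hierarchical evaluation idea is to score, at each level of a disease taxonomy, the agreement between discovered clusters and clinically validated groupings. The sketch below assumes that reading; the toy labels, the two levels, and the choice of adjusted Rand index are illustrative, not the benchmark's published protocol.

```python
from sklearn.metrics import adjusted_rand_score

# Toy stand-ins: cluster assignments from a temporal representation model, and
# disease labels for the same patients at two taxonomy levels (coarse and fine).
clusters       = [0, 0, 1, 1, 2, 2, 2, 1]
chapter_labels = ["circ", "circ", "resp", "resp", "circ", "circ", "circ", "resp"]
group_labels   = ["hf", "hf", "copd", "asthma", "mi", "mi", "hf", "copd"]

# Agreement between discovered clusters and validated groupings, per level.
for level, labels in [("chapter", chapter_labels), ("group", group_labels)]:
    print(level, round(adjusted_rand_score(labels, clusters), 3))
```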
arXiv  Detail & Related papers  (2025-06-06T15:52:50Z) - Optimizing Large Language Models for Detecting Symptoms of Comorbid Depression or Anxiety in Chronic Diseases: Insights from Patient Messages
Patients with diabetes are at increased risk of comorbid depression or anxiety, complicating their management.
This study evaluated the performance of large language models (LLMs) in detecting these symptoms from secure patient messages.
arXiv  Detail & Related papers  (2025-03-14T13:27:35Z) - Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references.
We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey.
Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv  Detail & Related papers  (2025-03-06T18:35:39Z) - A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients
This research aims to explore how large language models (LLMs) can alleviate the burden of manual summarization.
This study evaluates the performance of multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and Llama 3 8B, in generating discharge summaries.
arXiv  Detail & Related papers  (2024-06-23T00:11:07Z) - Evaluating the Effectiveness of the Foundational Models for Q&A Classification in Mental Health care
Pre-trained Language Models (PLMs) have the potential to transform mental health support.
This study evaluates the effectiveness of PLMs for classification of Questions and Answers in the domain of mental health care.
arXiv  Detail & Related papers  (2024-01-24T16:52:37Z) - Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes
The purpose of this study was to evaluate the performance of Large Language Models (LLMs) in understanding and processing real-world clinical notes.
The GPT family models have demonstrated considerable efficiency, evidenced by their cost-effectiveness and time-saving capabilities.
arXiv  Detail & Related papers  (2023-12-11T21:53:40Z) - Multimodal Pretraining of Medical Time Series and Notes
Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data.
We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes.
In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
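A common way to implement such measurement-note alignment is a symmetric contrastive (InfoNCE) objective over paired embeddings from the same stay; the abstract does not name the loss, so the sketch below is an assumption, with random tensors in place of real encoder outputs.

```python
import torch
import torch.nn.functional as F

def alignment_loss(ts_emb: torch.Tensor, note_emb: torch.Tensor, temp: float = 0.07):
    """Symmetric InfoNCE: matching (time-series, note) pairs sit on the diagonal."""
    ts = F.normalize(ts_emb, dim=-1)
    nt = F.normalize(note_emb, dim=-1)
    logits = ts @ nt.t() / temp                 # (B, B) cosine-similarity matrix
    targets = torch.arange(ts.size(0))          # pair i aligns with note i
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 8 stays, 128-dim embeddings from the two (here, fake) encoders.
print(alignment_loss(torch.randn(8, 128), torch.randn(8, 128)).item())
```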
arXiv  Detail & Related papers  (2023-12-11T21:53:40Z) - From Classification to Clinical Insights: Towards Analyzing and Reasoning About Mobile and Behavioral Health Data With Large Language Models
We take a novel approach that leverages large language models to synthesize clinically useful insights from multi-sensor data.
We develop chain of thought prompting methods that use LLMs to generate reasoning about how trends in data relate to conditions like depression and anxiety.
We find models like GPT-4 correctly reference numerical data 75% of the time, and clinician participants express strong interest in using this approach to interpret self-tracking data.
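A hypothetical chain-of-thought prompt in this spirit; the template, signals, and numbers are invented for illustration and are not the paper's templates.

```python
# Skeleton of a CoT prompt over self-tracked signals (all values fabricated).
COT_PROMPT = """You are reviewing two weeks of self-tracked data.
Daily step counts: {steps}
Average nightly sleep (hours): {sleep}

Reason step by step:
1. Describe the trend in each signal, citing the numbers.
2. Note whether the trends co-occur.
3. Relate the pattern to symptoms associated with depression or anxiety.
Then state one clinically useful insight."""

print(COT_PROMPT.format(steps=[5200, 4900, 3100, 2800, 2400],
                        sleep=[7.1, 6.8, 5.9, 5.4, 5.0]))
```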
arXiv  Detail & Related papers  (2023-11-21T23:53:27Z) - Self-Verification Improves Few-Shot Clinical Information Extraction
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
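A minimal sketch of such an extract-then-verify loop, assuming an OpenAI-style chat API; the model name, prompts, and medication-extraction task are placeholders rather than the paper's setup.

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set

client = OpenAI()

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # any capable chat model works for this sketch
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content

def extract_with_verification(note: str) -> str:
    # Step 1: initial extraction (few-shot examples omitted for brevity).
    extraction = ask(f"List every medication mentioned in this clinical note:\n{note}")
    # Step 2: the model verifies its own output, quoting the supporting sentence
    # (provenance) and dropping any item it cannot ground in the note.
    return ask(
        "For each medication below, quote the sentence of the note that mentions it. "
        "Remove any item you cannot support with a quote.\n\n"
        f"Note:\n{note}\n\nMedications:\n{extraction}"
    )
```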
arXiv  Detail & Related papers  (2023-05-30T22:05:11Z) - Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv  Detail & Related papers  (2023-03-24T03:14:00Z) - Development and validation of a natural language processing algorithm to pseudonymize documents in the context of a clinical data warehouse
The study highlights the difficulties faced in sharing tools and resources in this domain.
We annotated a corpus of clinical documents according to 12 types of identifying entities.
We build a hybrid system that merges the results of a deep learning model with manual rules.
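A toy version of such a rule/model merge: high-precision patterns plus a (stubbed) learned tagger, with the union of spans replaced right to left so earlier offsets stay valid. The patterns, labels, and empty model stub are illustrative, not the system's actual components.

```python
import re

def rule_spans(text: str):
    """Hand-written rules: high-precision patterns, e.g. dates and phone numbers."""
    for pattern, label in [(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "DATE"),
                           (r"\b\d{2}( \d{2}){4}\b", "PHONE")]:
        for m in re.finditer(pattern, text):
            yield (m.start(), m.end(), label)

def model_spans(text: str):
    """Stub for a trained NER model predicting names, addresses, IDs, etc."""
    return []

def pseudonymize(text: str) -> str:
    spans = sorted(set(rule_spans(text)) | set(model_spans(text)), reverse=True)
    for start, end, label in spans:   # replace right to left to keep offsets valid
        text = text[:start] + f"<{label}>" + text[end:]
    return text

print(pseudonymize("Seen on 12/03/2021, callback 06 12 34 56 78."))
```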
arXiv  Detail & Related papers  (2023-03-23T17:17:46Z) - SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
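The generic soft-prompt mechanism behind such a pipeline is a small set of learnable vectors prepended to a frozen model's input embeddings; the sketch below shows that mechanism only, not the paper's calibration procedure.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learnable prompt vectors prepended to a frozen LM's input embeddings."""
    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Toy usage: 20 soft tokens, 768-dim embeddings, batch of 2, 50 input tokens.
out = SoftPrompt(n_tokens=20, d_model=768)(torch.randn(2, 50, 768))
print(out.shape)  # torch.Size([2, 70, 768])
```

Only these prompt parameters are trained; the backbone stays fixed, which is what makes the approach model-agnostic.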
arXiv  Detail & Related papers  (2023-03-23T04:47:46Z) - Retrieval-Augmented and Knowledge-Grounded Language Models for Faithful Clinical Medicine
We propose the Re3Writer method with retrieval-augmented generation and knowledge-grounded reasoning.
We demonstrate the effectiveness of our method in generating patient discharge instructions.
arXiv  Detail & Related papers  (2022-10-23T16:34:39Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv  Detail & Related papers  (2021-02-08T10:26:44Z) 
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     