ALPHA: AnomaLous Physiological Health Assessment Using Large Language
Models
- URL: http://arxiv.org/abs/2311.12524v1
- Date: Tue, 21 Nov 2023 11:09:57 GMT
- Title: ALPHA: AnomaLous Physiological Health Assessment Using Large Language
Models
- Authors: Jiankai Tang, Kegang Wang, Hongming Hu, Xiyuxing Zhang, Peiyu Wang,
Xin Liu, Yuntao Wang
- Abstract summary: Large Language Models (LLMs) exhibit exceptional performance in determining medical indicators.
Our specially adapted GPT models demonstrated remarkable proficiency, achieving less than 1 bpm error in cycle count.
This study highlights LLMs' dual role as health data analysis tools and pivotal elements in advanced AI health assistants.
- Score: 4.247764575421617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study concentrates on evaluating the efficacy of Large Language Models
(LLMs) in healthcare, with a specific focus on their application in personal
anomalous health monitoring. Our research primarily investigates the
capabilities of LLMs in interpreting and analyzing physiological data obtained
from FDA-approved devices. We conducted an extensive analysis using anomalous
physiological data gathered in a simulated low-air-pressure plateau
environment. This allowed us to assess the precision and reliability of LLMs in
understanding and evaluating users' health status with notable specificity. Our
findings reveal that LLMs exhibit exceptional performance in determining
medical indicators, including a Mean Absolute Error (MAE) of less than 1 beat
per minute for heart rate and less than 1% for oxygen saturation (SpO2).
Furthermore, the Mean Absolute Percentage Error (MAPE) for these evaluations
remained below 1%, with the overall accuracy of health assessments surpassing
85%. In image analysis tasks, such as interpreting photoplethysmography (PPG)
data, our specially adapted GPT models demonstrated remarkable proficiency,
achieving less than 1 bpm error in cycle count and 7.28 MAE for heart rate
estimation. This study highlights LLMs' dual role as health data analysis tools
and pivotal elements in advanced AI health assistants, offering personalized
health insights and recommendations within the future health assistant
framework.
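The abstract quotes MAE and MAPE figures for heart rate and SpO2. As a minimal sketch of how those error metrics are computed, assuming hypothetical LLM-derived estimates and device reference values (not the paper's actual data):

```python
# Minimal sketch of the reported error metrics (MAE, MAPE), applied to
# hypothetical LLM-estimated vs. device-reported vital signs.
import numpy as np

def mae(pred, ref):
    """Mean Absolute Error."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return np.mean(np.abs(pred - ref))

def mape(pred, ref):
    """Mean Absolute Percentage Error, in percent."""
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return 100.0 * np.mean(np.abs(pred - ref) / np.abs(ref))

# Hypothetical values: heart rate (bpm) and SpO2 (%).
hr_llm, hr_ref = [72, 85, 91], [71, 86, 91]
spo2_llm, spo2_ref = [97, 93, 88], [97, 94, 88]

print(f"HR   MAE={mae(hr_llm, hr_ref):.2f} bpm, MAPE={mape(hr_llm, hr_ref):.2f}%")
print(f"SpO2 MAE={mae(spo2_llm, spo2_ref):.2f} %,   MAPE={mape(spo2_llm, spo2_ref):.2f}%")
```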
Related papers
- Med-CoDE: Medical Critique based Disagreement Evaluation Framework [72.42301910238861]
The reliability and accuracy of large language models (LLMs) in medical contexts remain critical concerns.
Current evaluation methods often lack robustness and fail to provide a comprehensive assessment of LLM performance.
We propose Med-CoDE, a specifically designed evaluation framework for medical LLMs to address these challenges.
arXiv Detail & Related papers (2025-04-21T16:51:11Z) - Quantifying the Reasoning Abilities of LLMs on Real-world Clinical Cases [48.87360916431396]
We introduce MedR-Bench, a benchmarking dataset of 1,453 structured patient cases, annotated with reasoning references.
We propose a framework encompassing three critical stages: examination recommendation, diagnostic decision-making, and treatment planning, simulating the entire patient care journey.
Using this benchmark, we evaluate five state-of-the-art reasoning LLMs, including DeepSeek-R1, OpenAI-o3-mini, and Gemini-2.0-Flash Thinking.
arXiv Detail & Related papers (2025-03-06T18:35:39Z) - EchoQA: A Large Collection of Instruction Tuning Data for Echocardiogram Reports [0.0]
We introduce a novel question-answering (QA) dataset using echocardiogram reports sourced from the Medical Information Mart for Intensive Care database.
This dataset is specifically designed to enhance QA systems in cardiology, consisting of 771,244 QA pairs addressing a wide array of cardiac abnormalities and their severity.
We compare large language models (LLMs), including open-source and biomedical-specific models for zero-shot evaluation, and closed-source models for zero-shot and three-shot evaluation.
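A minimal sketch of the zero-shot vs. few-shot (three-shot) setup described above, assuming a placeholder `ask_llm` client and hypothetical demonstration pairs (not the EchoQA authors' exact prompt design):

```python
# Sketch of zero-shot vs. few-shot (3-shot) prompting for report QA.
def build_prompt(report, question, examples=()):
    """Assemble a QA prompt; `examples` is an iterable of (report, question, answer)."""
    parts = ["Answer the cardiology question using only the echocardiogram report."]
    for ex_report, ex_q, ex_a in examples:          # empty tuple -> zero-shot
        parts.append(f"Report: {ex_report}\nQ: {ex_q}\nA: {ex_a}")
    parts.append(f"Report: {report}\nQ: {question}\nA:")
    return "\n\n".join(parts)

def ask_llm(prompt):
    """Placeholder: plug in whichever chat-completion client is used."""
    raise NotImplementedError

# zero_shot_answer  = ask_llm(build_prompt(report, question))
# three_shot_answer = ask_llm(build_prompt(report, question, examples=demos[:3]))
```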
arXiv Detail & Related papers (2025-03-04T07:45:45Z) - An LLM-Powered Agent for Physiological Data Analysis: A Case Study on PPG-based Heart Rate Estimation [2.0195680688695594]
Large language models (LLMs) are revolutionizing healthcare by improving diagnosis, patient care, and decision support through interactive communication.
We develop an LLM-powered agent for physiological time-series analysis, aiming to bridge the gap in integrating LLMs with well-established analytical tools.
Built on OpenCHA, our agent, powered by OpenAI's GPT-3.5-turbo model, features an orchestrator that integrates user interaction, data sources, and analytical tools to generate accurate health insights.
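As a minimal sketch of one analytical tool such an orchestrator might call, here is a simple peak-detection heart-rate estimator for a PPG segment; it assumes a reasonably clean signal and is not the paper's actual pipeline:

```python
# Heart-rate estimation from a PPG segment via peak detection (illustrative tool).
import numpy as np
from scipy.signal import find_peaks

def ppg_heart_rate(ppg, fs):
    """Estimate heart rate (bpm) from a 1-D PPG segment sampled at `fs` Hz."""
    ppg = np.asarray(ppg, dtype=float)
    ppg = (ppg - ppg.mean()) / (ppg.std() + 1e-8)            # normalize
    # Require ~0.4 s between peaks (caps detection at ~150 bpm).
    peaks, _ = find_peaks(ppg, distance=int(0.4 * fs), prominence=0.5)
    if len(peaks) < 2:
        return None
    ibi = np.diff(peaks) / fs                                 # inter-beat intervals (s)
    return 60.0 / np.median(ibi)

# Example: a synthetic 1 Hz (60 bpm) pulse-like signal sampled at 100 Hz.
fs = 100
t = np.arange(0, 30, 1 / fs)
print(f"Estimated HR: {ppg_heart_rate(np.sin(2 * np.pi * 1.0 * t), fs):.1f} bpm")
```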
arXiv Detail & Related papers (2025-02-18T13:09:59Z) - Finetuning and Quantization of EEG-Based Foundational BioSignal Models on ECG and PPG Data for Blood Pressure Estimation [53.2981100111204]
Photoplethysmography and electrocardiography can potentially enable continuous blood pressure (BP) monitoring.
Yet developing accurate and robust machine learning (ML) models remains challenging due to variability in data quality and patient-specific factors.
In this work, we investigate whether a model pre-trained on one modality can effectively be exploited to improve the accuracy of a different signal type.
Our approach achieves near state-of-the-art accuracy for diastolic BP and surpasses by 1.5x the accuracy of prior works for systolic BP.
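A minimal sketch of the cross-modality transfer idea, assuming a placeholder pre-trained signal encoder and ECG/PPG training batches (not the paper's actual architecture or training recipe):

```python
# Reuse a pre-trained signal encoder and fine-tune a small regression head
# that predicts (systolic, diastolic) blood pressure.
import torch
import torch.nn as nn

class BPRegressor(nn.Module):
    def __init__(self, encoder, feat_dim):
        super().__init__()
        self.encoder = encoder                   # pre-trained backbone (placeholder)
        self.head = nn.Linear(feat_dim, 2)       # outputs (SBP, DBP)

    def forward(self, x):                        # x: (batch, channels, samples)
        return self.head(self.encoder(x))

# model = BPRegressor(pretrained_encoder, feat_dim=256)
# optimizer = torch.optim.AdamW(model.head.parameters(), lr=1e-3)  # head-only fine-tune
# loss = nn.L1Loss()(model(batch_x), batch_bp)                     # MAE-style objective
```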
arXiv Detail & Related papers (2025-02-10T13:33:12Z) - Leveraging Large Language Models to Enhance Machine Learning Interpretability and Predictive Performance: A Case Study on Emergency Department Returns for Mental Health Patients [2.3769374446083735]
Emergency department (ED) returns for mental health conditions pose a major healthcare burden, with 24-27% of patients returning within 30 days.
We assess whether integrating large language models (LLMs) with machine learning improves the predictive accuracy and clinical interpretability of ED mental health return risk models.
arXiv Detail & Related papers (2025-01-21T15:41:20Z) - Reasoning-Enhanced Healthcare Predictions with Knowledge Graph Community Retrieval [61.70489848327436]
KARE is a novel framework that integrates knowledge graph (KG) community-level retrieval with large language model (LLM) reasoning.
Extensive experiments demonstrate that KARE outperforms leading models by up to 10.8-15.0% on MIMIC-III and 12.6-12.7% on MIMIC-IV for mortality and readmission predictions.
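A minimal sketch of the general pattern (retrieve relevant KG community summaries, then prompt an LLM for the prediction); `embed` and `llm` are placeholder callables, and this is not KARE's actual retrieval or prompt design:

```python
# Community retrieval + LLM reasoning, in simplified form.
import numpy as np

def retrieve_communities(patient_text, community_summaries, embed, k=3):
    """Return the k community summaries most similar to the patient context."""
    q = embed(patient_text)
    scores = np.array([float(np.dot(q, embed(s))) for s in community_summaries])
    return [community_summaries[i] for i in scores.argsort()[::-1][:k]]

def predict_readmission(patient_text, community_summaries, embed, llm):
    context = "\n".join(retrieve_communities(patient_text, community_summaries, embed))
    prompt = (f"Relevant medical knowledge:\n{context}\n\n"
              f"Patient record:\n{patient_text}\n\n"
              "Will this patient be readmitted within 30 days? Answer yes or no, with reasoning.")
    return llm(prompt)
```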
arXiv Detail & Related papers (2024-10-06T18:46:28Z) - SemioLLM: Assessing Large Language Models for Semiological Analysis in Epilepsy Research [45.2233252981348]
Large Language Models have shown promising results in their ability to encode general medical knowledge.
We test the ability of state-of-the-art LLMs to leverage their internal knowledge and reasoning for epilepsy diagnosis.
arXiv Detail & Related papers (2024-07-03T11:02:12Z) - Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals [14.216163316714285]
There is a notable gap in the utilization of large language models (LLMs) for analyzing wearable biosignals to achieve cuffless blood pressure (BP) measurement.
This paper presents the first work to explore the capacity of LLMs to perform cuffless BP estimation based on wearable biosignals.
To evaluate the proposed approach, we conducted assessments of ten advanced LLMs using a comprehensive public dataset of wearable biosignals from 1,272 participants.
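As a rough sketch of prompting an LLM with wearable-signal features for cuffless BP estimation; the feature names, instruction text, and `llm` call are illustrative placeholders, not the paper's prompt design:

```python
# Format numeric biosignal features into a BP-estimation prompt.
def bp_prompt(features):
    lines = [f"- {name}: {value:.2f}" for name, value in features.items()]
    return ("Given these physiological features extracted from wearable PPG/ECG signals, "
            "estimate systolic and diastolic blood pressure in mmHg.\n"
            + "\n".join(lines)
            + "\nAnswer as: SBP=<value>, DBP=<value>")

example = {"heart_rate_bpm": 74.0, "pulse_arrival_time_ms": 212.0, "ppg_amplitude": 0.83}
print(bp_prompt(example))
# response = llm(bp_prompt(example))   # placeholder LLM call
```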
arXiv Detail & Related papers (2024-06-26T04:54:45Z) - Evaluating the Fairness of the MIMIC-IV Dataset and a Baseline
Algorithm: Application to the ICU Length of Stay Prediction [65.268245109828]
This paper uses the MIMIC-IV dataset to examine the fairness and bias in an XGBoost binary classification model predicting the ICU length of stay.
The research reveals class imbalances in the dataset across demographic attributes and employs data preprocessing and feature extraction.
The paper concludes with recommendations for fairness-aware machine learning techniques for mitigating biases and the need for collaborative efforts among healthcare professionals and data scientists.
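A minimal sketch of the setup described above (an XGBoost binary classifier for prolonged ICU stay with a crude per-group AUROC audit); the feature, label, and demographic arrays are placeholders:

```python
# XGBoost LOS classifier plus a simple per-group fairness check.
import numpy as np
import xgboost as xgb
from sklearn.metrics import roc_auc_score

def fit_and_audit(X_train, y_train, X_test, y_test, group_test):
    model = xgb.XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]
    print("overall AUROC:", roc_auc_score(y_test, prob))
    # Compare AUROC across demographic groups as a rough disparity signal.
    for g in np.unique(group_test):
        m = group_test == g
        if len(np.unique(y_test[m])) == 2:        # AUROC needs both classes present
            print(f"group {g}: AUROC {roc_auc_score(y_test[m], prob[m]):.3f}")
    return model
```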
arXiv Detail & Related papers (2023-12-31T16:01:48Z) - Mixed-Integer Projections for Automated Data Correction of EMRs Improve
Predictions of Sepsis among Hospitalized Patients [7.639610349097473]
We introduce an innovative projections-based method that seamlessly integrates clinical expertise as domain constraints.
We measure the distance of corrected data from the constraints defining a healthy range of patient data, resulting in a unique predictive metric we term "trust-scores".
We show an AUROC of 0.865 and a precision of 0.922, surpassing conventional ML models without such projections.
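A much-simplified sketch of the idea: project a record onto clinically plausible ranges and score it by its distance from a healthy range. The box constraints below are illustrative, not the paper's mixed-integer formulation:

```python
# Constraint projection plus a distance-based "trust" style score.
import numpy as np

PLAUSIBLE = {"heart_rate": (20, 250), "spo2": (50, 100)}   # correction bounds
HEALTHY   = {"heart_rate": (60, 100), "spo2": (95, 100)}   # reference ranges

def correct_and_score(record):
    corrected, distance = {}, 0.0
    for key, value in record.items():
        lo, hi = PLAUSIBLE[key]
        v = float(np.clip(value, lo, hi))            # projection onto the plausible box
        corrected[key] = v
        h_lo, h_hi = HEALTHY[key]
        distance += max(h_lo - v, 0.0, v - h_hi)     # distance outside the healthy range
    return corrected, distance                       # larger distance -> lower "trust"

print(correct_and_score({"heart_rate": 135, "spo2": 103}))
```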
arXiv Detail & Related papers (2023-08-21T15:14:49Z) - Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
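A minimal sketch of the extract-then-verify pattern: one LLM call extracts fields with quoted evidence, a second checks each field against the source note. `llm` is a placeholder client, and the prompts are illustrative, not the paper's:

```python
# Self-verification loop for clinical information extraction.
def extract_with_verification(note, fields, llm):
    extraction = llm(
        "Extract the following fields from the clinical note as JSON, and for each "
        "field include an 'evidence' quote copied verbatim from the note.\n"
        f"Fields: {fields}\nNote:\n{note}"
    )
    verification = llm(
        "For each field in the extraction below, answer 'supported' or 'unsupported', "
        "based only on whether its evidence appears in the note and justifies the value.\n"
        f"Note:\n{note}\nExtraction:\n{extraction}"
    )
    return extraction, verification   # downstream code keeps only supported fields
```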
arXiv Detail & Related papers (2023-05-30T22:05:11Z) - Evaluating the Performance of Large Language Models on GAOKAO Benchmark [53.663757126289795]
This paper introduces GAOKAO-Bench, an intuitive benchmark that employs questions from the Chinese GAOKAO examination as test samples.
With human evaluation, we obtain the converted total score of LLMs, including GPT-4, ChatGPT and ERNIE-Bot.
We also use LLMs to grade the subjective questions, and find that model scores achieve a moderate level of consistency with human scores.
arXiv Detail & Related papers (2023-05-21T14:39:28Z) - FineEHR: Refine Clinical Note Representations to Improve Mortality
Prediction [3.9026461169566673]
Large-scale electronic health records provide machine learning models with an abundance of clinical text and vital sign data.
Despite the emergence of advanced Natural Language Processing (NLP) algorithms for clinical note analysis, the complex textual structure and noise present in raw clinical data have posed significant challenges.
We propose FINEEHR, a system that utilizes two representation learning techniques, namely metric learning and fine-tuning, to refine clinical note embeddings.
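A minimal sketch of the metric-learning half of such refinement: pull embeddings of notes with the same outcome label together and push others apart using a triplet margin loss. The `encoder` is a placeholder text encoder, and this is not FINEEHR's exact architecture:

```python
# Triplet-loss refinement of clinical-note embeddings.
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)

def refine_step(encoder, optimizer, anchor, positive, negative):
    """One update on an (anchor, positive, negative) batch of note tensors."""
    optimizer.zero_grad()
    loss = triplet_loss(encoder(anchor), encoder(positive), encoder(negative))
    loss.backward()
    optimizer.step()
    return loss.item()
```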
arXiv Detail & Related papers (2023-04-24T02:42:52Z) - Large Language Models for Healthcare Data Augmentation: An Example on
Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM).
Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z) - Auditing Algorithmic Fairness in Machine Learning for Health with
Severity-Based LOGAN [70.76142503046782]
We propose supplementing machine learning (ML) healthcare tools with SLOGAN, an automatic tool for capturing local biases in a clinical prediction task.
SLOGAN adapts an existing tool, LOcal Group biAs detectioN (LOGAN), by contextualizing group bias detection in patient illness severity and past medical history.
On average, SLOGAN identifies larger fairness disparities in over 75% of patient groups than LOGAN while maintaining clustering quality.
arXiv Detail & Related papers (2022-11-16T08:04:12Z) - Machine Learning and Glioblastoma: Treatment Response Monitoring
Biomarkers in 2021 [0.3266995794795542]
The aim of the systematic review was to assess recently published studies on diagnostic test accuracy of glioblastoma treatment response monitoring biomarkers in adults.
Machine learning models that use MRI features to distinguish between progression and mimics likely show good diagnostic performance.
The diagnostic performance of ML using implicit features did not appear to be superior to ML using explicit features.
arXiv Detail & Related papers (2021-04-15T10:49:34Z)