CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
- URL: http://arxiv.org/abs/2505.01199v2
- Date: Sun, 01 Jun 2025 09:47:18 GMT
- Title: CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning
- Authors: Tsai-Ning Wang, Lin-Lin Chen, Neil Zeghidour, Aaqib Saeed,
- Abstract summary: CaReAQA is an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models.<n>We introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata.<n> evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks.
- Score: 17.462121203082006
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical audio signals, such as heart and lung sounds, play a crucial role in clinical diagnosis. However, analyzing these signals remains challenging: traditional methods rely on handcrafted features or supervised deep learning models that demand extensive labeled datasets, limiting their scalability and applicability. To address these issues, we propose CaReAQA, an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models, enabling clinically relevant, open-ended diagnostic responses. Alongside CaReAQA, we introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata and paired question-answer examples, intended to drive progress in diagnostic reasoning research. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks, outperforming baseline models. It also generalizes well to closed-ended classification tasks, achieving an average accuracy of 56.9% on unseen datasets. Our findings show how audio-language integration and reasoning advances medical diagnostics, enabling efficient AI systems for clinical decision support.
Related papers
- DiagnosisArena: Benchmarking Diagnostic Reasoning for Large Language Models [25.13622249539088]
DiagnosisArena is a benchmark designed to rigorously assess professional-level diagnostic competence.<n> DiagnosisArena consists of 1,113 pairs of segmented patient cases and corresponding diagnoses, spanning 28 medical specialties.<n>Our study reveals that even the most advanced reasoning models, o3, o1, and DeepSeek-R1, achieve only 51.12%, 31.09%, and 17.79% accuracy, respectively.
arXiv Detail & Related papers (2025-05-20T09:14:53Z) - MedCaseReasoning: Evaluating and learning diagnostic reasoning from clinical case reports [49.00805568780791]
We introduce MedCaseReasoning, the first open-access dataset for evaluating Large Language Models (LLMs) on their ability to align with clinician-authored diagnostic reasoning.<n>The dataset includes 14,489 diagnostic question-and-answer cases, each paired with detailed reasoning statements.<n>We evaluate state-of-the-art reasoning LLMs on MedCaseReasoning and find significant shortcomings in their diagnoses and reasoning.
arXiv Detail & Related papers (2025-05-16T22:34:36Z) - IP-CRR: Information Pursuit for Interpretable Classification of Chest Radiology Reports [31.359504909372884]
We propose an interpretable-by-design framework for classifying radiology reports.<n>The key idea is to extract a set of most informative queries from a large set of reports and use these queries and their corresponding answers to predict a diagnosis.<n>Experiments on the MIMIC-CXR dataset demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2025-04-30T21:20:05Z) - Structured Outputs Enable General-Purpose LLMs to be Medical Experts [50.02627258858336]
Large language models (LLMs) often struggle with open-ended medical questions.<n>We propose a novel approach utilizing structured medical reasoning.<n>Our approach achieves the highest Factuality Score of 85.8, surpassing fine-tuned models.
arXiv Detail & Related papers (2025-03-05T05:24:55Z) - ICA-RAG: Information Completeness Guided Adaptive Retrieval-Augmented Generation for Disease Diagnosis [16.186500907377965]
ICA-RAG is a novel framework for enhancing RAG reliability in disease diagnosis.<n>It uses an adaptive control module to assess the necessity of retrieval based on the input's information completeness.<n> Experiments on three Chinese electronic medical record datasets demonstrate that ICA-RAG significantly outperforms baseline methods.
arXiv Detail & Related papers (2025-02-20T14:52:36Z) - NeuroXVocal: Detection and Explanation of Alzheimer's Disease through Non-invasive Analysis of Picture-prompted Speech [4.815952991777717]
NeuroXVocal is a novel dual-component system that classifies and explains potential Alzheimer's Disease (AD) cases through speech analysis.<n>The classification component (Neuro) processes three distinct data streams: acoustic features capturing speech patterns and voice characteristics, textual features extracted from speech transcriptions, and precomputed embeddings representing linguistic patterns.<n>The explainability component (XVocal) implements a Retrieval-Augmented Generation (RAG) approach, leveraging Large Language Models combined with a domain-specific knowledge base of AD research literature.
arXiv Detail & Related papers (2025-02-14T12:09:49Z) - Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG)<n>MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner.<n>We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - Voice EHR: Introducing Multimodal Audio Data for Health [3.8090294667599927]
Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries.
This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application.
arXiv Detail & Related papers (2024-04-02T04:07:22Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - Towards the Identifiability and Explainability for Personalized Learner
Modeling: An Inductive Paradigm [36.60917255464867]
We propose an identifiable cognitive diagnosis framework (ID-CDF) based on a novel response-proficiency-response paradigm inspired by encoder-decoder models.
We show that ID-CDF can effectively address the problems without loss of diagnosis preciseness.
arXiv Detail & Related papers (2023-09-01T07:18:02Z) - Exploring linguistic feature and model combination for speech
recognition based automatic AD detection [61.91708957996086]
Speech based automatic AD screening systems provide a non-intrusive and more scalable alternative to other clinical screening techniques.
Scarcity of specialist data leads to uncertainty in both model selection and feature learning when developing such systems.
This paper investigates the use of feature and model combination approaches to improve the robustness of domain fine-tuning of BERT and Roberta pre-trained text encoders.
arXiv Detail & Related papers (2022-06-28T05:09:01Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z) - Query-Focused EHR Summarization to Aid Imaging Diagnosis [22.21438906817433]
We propose and evaluate models that extract relevant text snippets from patient records to provide a rough case summary.
We use groups of International Classification of Diseases (ICD) codes observed in 'future' records as noisy proxies for 'downstream' diagnoses.
We train (via distant supervision) and evaluate variants of this model on EHR data from Brigham and Women's Hospital in Boston and MIMIC-III.
arXiv Detail & Related papers (2020-04-09T16:32:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.