RA-QA: Towards Respiratory Audio-based Health Question Answering
- URL: http://arxiv.org/abs/2602.18452v1
- Date: Wed, 04 Feb 2026 13:25:47 GMT
- Title: RA-QA: Towards Respiratory Audio-based Health Question Answering
- Authors: Gaia A. Bertolino, Yuwei Zhang, Tong Xia, Domenico Talia, Cecilia Mascolo
- Abstract summary: Respiratory diseases are a leading cause of death globally, highlighting the urgent need for early and accessible screening methods. There remains a critical gap: the lack of intelligent systems that can interact in real-time consultations using natural language. We curated and harmonized data from 11 diverse respiratory audio datasets to construct the first Respiratory Audio Question Answering dataset.
- Score: 17.905364553833724
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Respiratory diseases are a leading cause of death globally, highlighting the urgent need for early and accessible screening methods. While some lung auscultation analysis has been automated and machine-learning audio-based models can predict respiratory pathologies, there remains a critical gap: the lack of intelligent systems that can interact in real-time consultations using natural language. Unlike other clinical domains, such as electronic health records, radiological images, and biosignals, where numerous question-answering (QA) datasets and models have been established, audio-based modalities remain notably underdeveloped. We curated and harmonized data from 11 diverse respiratory audio datasets to construct the first Respiratory Audio Question Answering (RA-QA) dataset. As the first multimodal QA resource of its kind focused specifically on respiratory health, RA-QA bridges clinical audio and natural language in a structured, scalable format. This new data resource contains about 7.5 million QA pairs spanning more than 60 attributes and three question types: single verification, multiple choice, and open-ended questions. Building upon this dataset, we introduce a novel benchmark that compares audio-text generation models with traditional audio classifiers to evaluate their respective performance. Our experiments reveal interesting performance variations across different attributes and question types, establishing a baseline and paving the way for more advanced architectures that could further improve performance. By bridging machine learning with real-world clinical dialogue, our work opens the door to the development of more interactive, intelligent, and accessible diagnostic tools in respiratory healthcare.
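The abstract describes RA-QA records as audio recordings paired with one of three question types (single verification, multiple choice, open-ended). As a minimal sketch of how such a record could be modeled, here is a hypothetical schema in Python; the field names and the template-based generation function are illustrative assumptions, not the paper's actual schema.

```python
# Hypothetical sketch of an RA-QA-style record. The paper reports ~7.5M QA
# pairs over 60+ attributes and three question types; the field names below
# are assumptions for illustration only.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RAQAPair:
    audio_id: str          # reference to a respiratory audio recording
    question_type: str     # "verification" | "multiple_choice" | "open_ended"
    question: str
    answer: str
    choices: Optional[List[str]] = None  # populated only for multiple choice


def make_verification_pair(audio_id: str, attribute: str, value: str) -> RAQAPair:
    """Template-style generation of a single-verification QA pair
    from an audio attribute annotation (hypothetical helper)."""
    return RAQAPair(
        audio_id=audio_id,
        question_type="verification",
        question=f"Does this recording indicate that {attribute} is {value}?",
        answer="yes",
    )


pair = make_verification_pair("rec_0001", "wheezing", "present")
print(pair.question_type)  # verification
```

Template-based generation like this is one plausible way a corpus of millions of QA pairs could be derived from a bounded set of clinical attributes, though the paper's actual construction procedure is not detailed here.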
Related papers
- A Fully Open and Generalizable Foundation Model for Ultrasound Clinical Applications [77.3888788549565]
We present EchoCare, a novel ultrasound foundation model for generalist clinical use. We developed EchoCare via self-supervised learning on our curated, publicly available, large-scale dataset EchoCareData. With minimal training, EchoCare outperforms state-of-the-art comparison models across 10 representative ultrasound benchmarks.
arXiv Detail & Related papers (2025-09-15T10:05:31Z) - MedQARo: A Large-Scale Benchmark for Medical Question Answering in Romanian [50.767415194856135]
We introduce MedQARo, the first large-scale medical QA benchmark in Romanian. We construct a high-quality and large-scale dataset comprising 102,646 QA pairs related to cancer patients.
arXiv Detail & Related papers (2025-08-22T13:48:37Z) - Multi-Domain Audio Question Answering Toward Acoustic Content Reasoning in The DCASE 2025 Challenge [102.84031769492708]
This task defines three QA subsets to test audio-language models on interactive question-answering over diverse acoustic scenes. Preliminary results on the development set are compared, showing strong variation across models and subsets. This challenge aims to advance the audio understanding and reasoning capabilities of audio-language models toward human-level acuity.
arXiv Detail & Related papers (2025-05-12T09:04:16Z) - CaReAQA: A Cardiac and Respiratory Audio Question Answering Model for Open-Ended Diagnostic Reasoning [17.462121203082006]
CaReAQA is an audio-language model that integrates a foundation audio model with the reasoning capabilities of large language models. We introduce CaReSound, a benchmark dataset of annotated medical audio recordings enriched with metadata. Evaluation results show that CaReAQA achieves 86.2% accuracy on open-ended diagnostic reasoning tasks.
arXiv Detail & Related papers (2025-05-02T11:42:46Z) - RespLLM: Unifying Audio and Text with Multimodal LLMs for Generalized Respiratory Health Prediction [20.974460332254544]
RespLLM is a novel framework that unifies text and audio representations for respiratory health prediction.
Our work lays the foundation for multimodal models that can perceive, listen, and understand heterogeneous data.
arXiv Detail & Related papers (2024-10-07T17:06:11Z) - Where are we in audio deepfake detection? A systematic analysis over generative and detection models [59.09338266364506]
SONAR is a synthetic AI-Audio Detection Framework and Benchmark. It provides a comprehensive evaluation for distinguishing cutting-edge AI-synthesized auditory content. It is the first framework to uniformly benchmark AI-audio detection across both traditional and foundation model-based detection systems.
arXiv Detail & Related papers (2024-10-06T01:03:42Z) - Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking [27.708473070563013]
Respiratory audio has predictive power for a wide range of healthcare applications, yet is currently under-explored.
We introduce OPERA, an OPEn Respiratory Acoustic foundation model pretraining and benchmarking system.
arXiv Detail & Related papers (2024-06-23T16:04:26Z) - Rene: A Pre-trained Multi-modal Architecture for Auscultation of Respiratory Diseases [5.810320353233697]
We introduce Rene, a pioneering large-scale model tailored for respiratory sound recognition.
Our innovative approach applies a pre-trained speech recognition model to process respiratory sounds.
We have developed a real-time respiratory sound discrimination system utilizing the Rene architecture.
arXiv Detail & Related papers (2024-05-13T03:00:28Z) - Voice EHR: Introducing Multimodal Audio Data for Health [3.8090294667599927]
Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries.
This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application.
arXiv Detail & Related papers (2024-04-02T04:07:22Z) - AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension [95.8442896569132]
We introduce AIR-Bench, the first benchmark to evaluate the ability of Large Audio-Language Models (LALMs) to understand various types of audio signals and interact with humans in the textual format.
Results demonstrate a high level of consistency between GPT-4-based evaluation and human evaluation.
arXiv Detail & Related papers (2024-02-12T15:41:22Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.