Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
- URL: http://arxiv.org/abs/2602.15909v2
- Date: Thu, 19 Feb 2026 13:22:10 GMT
- Title: Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
- Authors: Pengfei Zhang, Tianxin Xie, Minghao Yang, Li Liu
- Abstract summary: Resp-Agent is an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A$^2$CA). To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention. To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection.
- Score: 14.922065513695294
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning-based respiratory auscultation is currently hindered by two fundamental challenges: (i) inherent information loss, as converting signals into spectrograms discards transient acoustic events and clinical context; (ii) limited data availability, exacerbated by severe class imbalance. To bridge these gaps, we present Resp-Agent, an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A$^2$CA). Unlike static pipelines, Thinker-A$^2$CA serves as a central controller that actively identifies diagnostic weaknesses and schedules targeted synthesis in a closed loop. To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention and sparse audio anchors, capturing both long-range clinical context and millisecond-level transients. To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection, decoupling pathological content from acoustic style to synthesize hard-to-diagnose samples. As a foundation for these efforts, we introduce Resp-229k, a benchmark corpus of 229k recordings paired with LLM-distilled clinical narratives. Extensive experiments demonstrate that Resp-Agent consistently outperforms prior approaches across diverse evaluation settings, improving diagnostic robustness under data scarcity and long-tailed class imbalance. Our code and data are available at https://github.com/zpforlove/Resp-Agent.
Related papers
- A Diffusion-Driven Fine-Grained Nodule Synthesis Framework for Enhanced Lung Nodule Detection from Chest Radiographs [2.45811518457038]
Early detection of lung cancer in chest radiographs (CXRs) is crucial for improving patient outcomes. Nodule detection remains challenging due to the subtle appearance of nodules and the variability in their radiological characteristics. This paper proposes a novel diffusion-based framework with low-rank adaptation (LoRA) adapters for characteristic-controlled nodule synthesis on CXRs.
arXiv Detail & Related papers (2026-03-02T09:43:58Z)
- Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening [0.7136933021609076]
This study presents a fast, non-invasive multimodal deep learning framework for automatic binary stroke screening based on data collected during the F.A.S.T. assessment. The proposed approach integrates complementary information from facial expressions, speech signals, and upper-body movements to enhance diagnostic robustness.
arXiv Detail & Related papers (2026-01-17T03:35:39Z)
- Semi-Supervised Diseased Detection from Speech Dialogues with Multi-Level Data Modeling [27.224093715611534]
We propose a novel framework for learning to detect medical conditions from speech acoustics. Our end-to-end approach dynamically aggregates multi-granularity features and generates high-quality pseudo-labels. This work provides a principled approach to learning from weak, far-end supervision in medical speech analysis.
arXiv Detail & Related papers (2026-01-08T09:10:16Z)
- Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals [0.49581497240446293]
This study presents an explainable multimodal deep learning framework for automatic lung-disease detection using respiratory audio signals. The framework incorporates Grad-CAM, Integrated Gradients, and SHAP, generating interpretable spectral, temporal, and feature-level explanations. The findings demonstrate the framework's potential for telemedicine, point-of-care diagnostics, and real-world respiratory screening.
arXiv Detail & Related papers (2025-11-29T17:15:58Z)
- Zero-Training Task-Specific Model Synthesis for Few-Shot Medical Image Classification [5.59515535487396]
Deep learning models have achieved remarkable success in medical image analysis but are constrained by the requirement for large-scale, meticulously annotated datasets. We propose a novel paradigm: Zero-Training Task-Specific Model Synthesis (ZS-TMS). Instead of adapting a pre-existing model or training a new one, our approach leverages a large-scale, pre-trained generative engine to directly synthesize the entire set of parameters for a task-specific classifier.
arXiv Detail & Related papers (2025-11-18T03:12:01Z)
- Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech [51.14752758616364]
Speech-based depression detection (SDD) is a promising, non-invasive alternative to traditional clinical assessments. We propose HAREN-CTC, a novel architecture that integrates multi-layer SSL features using cross-attention within a multitask learning framework. The model achieves state-of-the-art macro F1-scores of 0.81 on DAIC-WOZ and 0.82 on MODMA, outperforming prior methods across both evaluation scenarios.
arXiv Detail & Related papers (2025-10-05T09:32:12Z)
- Informed Deep Abstaining Classifier: Investigating noise-robust training for diagnostic decision support systems [0.7497462432118391]
Deep learning can be used to optimize image-based diagnostic decision support systems.
The Informed Deep Abstaining Classifier (IDAC) enhances the noise-robust Deep Abstaining Classifier (DAC) loss by incorporating noise level estimations during training.
These findings are reproduced on an in-house noisy data set, where labels were extracted from the clinical systems of the University Hospital Bonn by a text-based transformer.
arXiv Detail & Related papers (2024-10-28T13:36:57Z)
- Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations.
In the WASPSYN challenge at ISBI 2023, our method ranked first.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
- Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
- Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults.
Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations.
This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
- Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV).
We find that the difference between the ASV scores for the original and re-synthesized audio is a good indicator for discriminating between genuine and adversarial samples.
Our code will be made open source so that future work can compare against it.
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
- SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples.
We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.