Related papers: Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis

Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis

URL: http://arxiv.org/abs/2602.15909v2
Date: Thu, 19 Feb 2026 13:22:10 GMT
Title: Resp-Agent: An Agent-Based System for Multimodal Respiratory Sound Generation and Disease Diagnosis
Authors: Pengfei Zhang, Tianxin Xie, Minghao Yang, Li Liu,
Abstract summary: Resp-Agent is an autonomous multimodal system orchestrated by a novel Active Adrial Curriculum Agent (Thinker-A$2$CA)<n>To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention.<n>To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection.
Score: 14.922065513695294
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep learning-based respiratory auscultation is currently hindered by two fundamental challenges: (i) inherent information loss, as converting signals into spectrograms discards transient acoustic events and clinical context; (ii) limited data availability, exacerbated by severe class imbalance. To bridge these gaps, we present Resp-Agent, an autonomous multimodal system orchestrated by a novel Active Adversarial Curriculum Agent (Thinker-A$^2$CA). Unlike static pipelines, Thinker-A$^2$CA serves as a central controller that actively identifies diagnostic weaknesses and schedules targeted synthesis in a closed loop. To address the representation gap, we introduce a Modality-Weaving Diagnoser that weaves EHR data with audio tokens via Strategic Global Attention and sparse audio anchors, capturing both long-range clinical context and millisecond-level transients. To address the data gap, we design a Flow Matching Generator that adapts a text-only Large Language Model (LLM) via modality injection, decoupling pathological content from acoustic style to synthesize hard-to-diagnose samples. As a foundation for these efforts, we introduce Resp-229k, a benchmark corpus of 229k recordings paired with LLM-distilled clinical narratives. Extensive experiments demonstrate that Resp-Agent consistently outperforms prior approaches across diverse evaluation settings, improving diagnostic robustness under data scarcity and long-tailed class imbalance. Our code and data are available at https://github.com/zpforlove/Resp-Agent.

Related papers

A Diffusion-Driven Fine-Grained Nodule Synthesis Framework for Enhanced Lung Nodule Detection from Chest Radiographs [2.45811518457038]
Early detection of lung cancer in chest radiographs (CXRs) is crucial for improving patient outcomes.<n>No nodule detection remains challenging due to their subtle appearance and variability in radiological characteristics.<n>This paper proposes a novel diffusion-based framework with low-rank adaptation (LoRA) adapters for characteristic controlled nodule synthesis on CXRs.
arXiv Detail & Related papers (2026-03-02T09:43:58Z)
Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening [0.7136933021609076]
This study presents a fast, non-invasive multimodal deep learning framework for automatic binary stroke screening based on data collected during the F.A.S.T. assessment.<n>The proposed approach integrates complementary information from facial expressions, speech signals, and upper-body movements to enhance diagnostic robustness.
arXiv Detail & Related papers (2026-01-17T03:35:39Z)
Semi-Supervised Diseased Detection from Speech Dialogues with Multi-Level Data Modeling [27.224093715611534]
We propose a novel framework for learning to detect medical conditions from speech acoustics.<n>Our end-to-end approach dynamically aggregates multi-granularity features and generates high-quality pseudo-labels.<n>This work provides a principled approach to learning from weak, far-end supervision in medical speech analysis.
arXiv Detail & Related papers (2026-01-08T09:10:16Z)
Explainable Multi-Modal Deep Learning for Automatic Detection of Lung Diseases from Respiratory Audio Signals [0.49581497240446293]
This study presents an explainable multimodal deep learning framework for automatic lung-disease detection using respiratory audio signals.<n>The framework incorporates Grad-CAM, Integrated Gradients, and SHAP, generating interpretable spectral, temporal, and feature-level explanations.<n>The findings demonstrate the framework's potential for telemedicine, point-of-care diagnostics, and real-world respiratory screening.
arXiv Detail & Related papers (2025-11-29T17:15:58Z)
Zero-Training Task-Specific Model Synthesis for Few-Shot Medical Image Classification [5.59515535487396]
Deep learning models have achieved remarkable success in medical image analysis but are constrained by the requirement for large-scale, meticulously annotated datasets.<n>We propose a novel paradigm: Zero-Training Task-Specific Model Synthesis (ZS-TMS)<n>Instead of adapting a pre-existing model or training a new one, our approach leverages a large-scale, pre-trained generative engine to directly synthesize the entire set of parameters for a task-specific classifier.
arXiv Detail & Related papers (2025-11-18T03:12:01Z)
Hierarchical Self-Supervised Representation Learning for Depression Detection from Speech [51.14752758616364]
Speech-based depression detection (SDD) is a promising, non-invasive alternative to traditional clinical assessments.<n>We propose HAREN-CTC, a novel architecture that integrates multi-layer SSL features using cross-attention within a multitask learning framework.<n>The model achieves state-of-the-art macro F1-scores of 0.81 on DAIC-WOZ and 0.82 on MODMA, outperforming prior methods across both evaluation scenarios.
arXiv Detail & Related papers (2025-10-05T09:32:12Z)
Informed Deep Abstaining Classifier: Investigating noise-robust training for diagnostic decision support systems [0.7497462432118391]
Deep learning can be used to optimize image-based diagnostic decision support systems. The Informed Deep Abstaining (IDAC) system enhances the noise-robust Deep Abstaining (DAC) loss by incorporating noise level estimations during training. These findings are reproduced on an in-house noisy data set, where labels were extracted from the clinical systems of the University Hospital Bonn by a text-based transformer.
arXiv Detail & Related papers (2024-10-28T13:36:57Z)
Domain Adaptive Synapse Detection with Weak Point Annotations [63.97144211520869]
We present AdaSyn, a framework for domain adaptive synapse detection with weak point annotations. In the WASPSYN challenge at I SBI 2023, our method ranks the 1st place.
arXiv Detail & Related papers (2023-08-31T05:05:53Z)
Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area. We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions. We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z)
Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection [69.53626024091076]
Alzheimer's disease (AD) is particularly prominent in older adults. Recent advances in pre-trained models motivate AD detection modeling to shift from low-level features to high-level representations. This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.
arXiv Detail & Related papers (2023-03-14T16:03:28Z)
Spotting adversarial samples for speaker verification by neural vocoders [102.1486475058963]
We adopt neural vocoders to spot adversarial samples for automatic speaker verification (ASV) We find that the difference between the ASV scores for the original and re-synthesize audio is a good indicator for discrimination between genuine and adversarial samples. Our codes will be made open-source for future works to do comparison.
arXiv Detail & Related papers (2021-07-01T08:58:16Z)
SUOD: Accelerating Large-Scale Unsupervised Heterogeneous Outlier Detection [63.253850875265115]
Outlier detection (OD) is a key machine learning (ML) task for identifying abnormal objects from general samples. We propose a modular acceleration system, called SUOD, to address it.
arXiv Detail & Related papers (2020-03-11T00:22:50Z)

This list is automatically generated from the titles and abstracts of the papers in this site.