Training-Free Intelligibility-Guided Observation Addition for Noisy ASR
- URL: http://arxiv.org/abs/2602.20967v1
- Date: Tue, 24 Feb 2026 14:46:54 GMT
- Title: Training-Free Intelligibility-Guided Observation Addition for Noisy ASR
- Authors: Haoyang Li, Changsong Liu, Wei Rao, Hao Shi, Sakriani Sakti, Eng Siong Chng
- Abstract summary: This paper proposes an intelligibility-guided observation addition (OA) method to improve speech recognition in noisy environments. Experiments across diverse SE-ASR combinations and datasets demonstrate strong robustness and improvements over existing OA baselines.
- Score: 57.74127683005929
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic speech recognition (ASR) degrades severely in noisy environments. Although speech enhancement (SE) front-ends effectively suppress background noise, they often introduce artifacts that harm recognition. Observation addition (OA) addresses this issue by fusing noisy and SE-enhanced speech, improving recognition without modifying the parameters of the SE or ASR models. This paper proposes an intelligibility-guided OA method in which fusion weights are derived from intelligibility estimates obtained directly from the backend ASR. Unlike prior OA methods based on trained neural predictors, the proposed method is training-free, which reduces complexity and enhances generalization. Extensive experiments across diverse SE-ASR combinations and datasets demonstrate strong robustness and improvements over existing OA baselines. Additional analyses of intelligibility-guided switching-based alternatives and of frame-level versus utterance-level OA further validate the proposed design.
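The core fusion the abstract describes can be sketched as a convex combination of the noisy and SE-enhanced waveforms, weighted by an intelligibility estimate. The sketch below is a minimal illustration, not the paper's implementation: the function name `observation_addition`, the use of a single scalar score, and deriving that score from backend-ASR confidence are all assumptions, since the abstract does not specify the exact estimator or fusion granularity (it analyzes both frame- and utterance-level variants).

```python
import numpy as np

def observation_addition(noisy, enhanced, intelligibility):
    """Fuse noisy and SE-enhanced speech via a convex combination.

    `intelligibility` is assumed to be a score in [0, 1] for the
    enhanced signal, e.g. obtained from backend-ASR confidence
    (a stand-in; the paper's actual estimator is not given in the
    abstract). Higher score -> trust the enhanced signal more.
    """
    noisy = np.asarray(noisy, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    alpha = float(np.clip(intelligibility, 0.0, 1.0))
    # Weighted sum; alpha = 1 returns the enhanced signal unchanged,
    # alpha = 0 falls back to the raw noisy observation.
    return alpha * enhanced + (1.0 - alpha) * noisy

# Toy usage with synthetic signals (1 s at 16 kHz).
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.linspace(0.0, 1.0, 16000))
noisy = clean + 0.3 * rng.standard_normal(16000)
enhanced = clean + 0.05 * rng.standard_normal(16000)  # SE output with mild artifacts
fused = observation_addition(noisy, enhanced, intelligibility=0.8)
```

Because no model parameters are updated, such a fusion step stays training-free: only the rule mapping the intelligibility estimate to the weight `alpha` needs to be chosen.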
Related papers
- Boosting ASR Robustness via Test-Time Reinforcement Learning with Audio-Text Semantic Rewards [8.109014000578766]
We present ASR-TRA, a novel Test-time Reinforcement Adaptation framework inspired by causal intervention. Our method achieves higher accuracy while maintaining lower latency than existing TTA baselines. Our approach provides a practical and robust solution for deploying ASR systems in challenging real-world conditions.
arXiv Detail & Related papers (2026-03-05T14:43:15Z) - ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval [19.94287753279928]
The dominant paradigm for Audio-Text Retrieval (ATR) relies on mini-batch-based contrastive learning. The Gradient Locality Bottleneck (GLB) structurally prevents models from leveraging out-of-batch knowledge. The Representation-Drift Mismatch (RDM) is where a static knowledge base becomes progressively misaligned with the evolving model, turning guidance into noise.
arXiv Detail & Related papers (2025-12-11T14:48:30Z) - AURORA: Augmented Understanding via Structured Reasoning and Reinforcement Learning for Reference Audio-Visual Segmentation [113.75682363364004]
AURORA is a framework designed to enhance genuine reasoning and language comprehension in reference audio-visual segmentation. AURORA achieves state-of-the-art performance on Ref-AVS benchmarks and generalizes effectively to unreferenced segmentation.
arXiv Detail & Related papers (2025-08-04T07:47:38Z) - EKPC: Elastic Knowledge Preservation and Compensation for Class-Incremental Learning [53.88000987041739]
Class-Incremental Learning (CIL) aims to enable AI models to continuously learn from sequentially arriving data of different classes over time. We propose the Elastic Knowledge Preservation and Compensation (EKPC) method, integrating Importance-aware Parameter Regularization (IPR) and Trainable Semantic Drift Compensation (TSDC) for CIL.
arXiv Detail & Related papers (2025-06-14T05:19:58Z) - AS-ASR: A Lightweight Framework for Aphasia-Specific Automatic Speech Recognition [5.916484958997203]
AS-ASR is a lightweight aphasia-specific speech recognition framework based on Whisper-tiny. Our approach systematically combines standard and aphasic speech at varying ratios, enabling robust generalization.
arXiv Detail & Related papers (2025-06-06T22:38:53Z) - Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding [26.98755758066905]
We train SLU models to withstand ASR errors by exposing them to noises commonly observed in ASR systems.
We propose a novel, less biased augmentation method that introduces noises plausible to any ASR system.
arXiv Detail & Related papers (2024-10-21T03:13:22Z) - DASA: Difficulty-Aware Semantic Augmentation for Speaker Verification [55.306583814017046]
We present a novel difficulty-aware semantic augmentation (DASA) approach for speaker verification.
DASA generates diversified training samples in speaker embedding space with negligible extra computing cost.
The best result achieves a 14.6% relative reduction in EER metric on CN-Celeb evaluation set.
arXiv Detail & Related papers (2023-10-18T17:07:05Z) - Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z) - An Approach to Improve Robustness of NLP Systems against ASR Errors [39.57253455717825]
Speech-enabled systems typically first convert audio to text through an automatic speech recognition model and then feed the text to downstream natural language processing modules.
The errors of the ASR system can seriously downgrade the performance of the NLP modules.
Previous work has shown it is effective to employ data augmentation methods to solve this problem by injecting ASR noise during the training process.
arXiv Detail & Related papers (2021-03-25T05:15:43Z) - Improving noise robust automatic speech recognition with single-channel time-domain enhancement network [100.1041336974175]
We show that a single-channel time-domain denoising approach can significantly improve ASR performance, and that single-channel noise reduction can still improve recognition.
arXiv Detail & Related papers (2020-03-09T09:36:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.