Related papers: Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition


URL: http://arxiv.org/abs/2510.08047v1
Date: Thu, 09 Oct 2025 10:31:47 GMT
Title: Pseudo2Real: Task Arithmetic for Pseudo-Label Correction in Automatic Speech Recognition
Authors: Yi-Cheng Lin, Yu-Hsuan Li Liang, Hsuan Su, Tzu-Quan Lin, Shang-Tse Chen, Yun-Nung Chen, Hung-yi Lee,
Abstract summary: Real-world systems encounter unseen accents and domains with limited labeled data. Pseudo-labeling often introduces systematic, accent-specific errors that filtering fails to fix. We propose a simple parameter-space correction that removes these recurring biases without target ground truth.
Score: 61.712328155788434
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Robust ASR under domain shift is crucial because real-world systems encounter unseen accents and domains with limited labeled data. Although pseudo-labeling offers a practical workaround, it often introduces systematic, accent-specific errors that filtering fails to fix. We ask: How can we correct these recurring biases without target ground truth? We propose a simple parameter-space correction: in a source domain containing both real and pseudo-labeled data, two ASR models are fine-tuned from the same initialization, one on ground-truth labels and the other on pseudo-labels, and their weight difference forms a correction vector that captures pseudo-label biases. When applied to a pseudo-labeled target model, this vector enhances recognition, achieving up to a 35% relative Word Error Rate (WER) reduction on AfriSpeech-200 across ten African accents with the Whisper tiny model.
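The correction described in the abstract is a form of task arithmetic: the correction vector is the element-wise difference between the weights fine-tuned on ground-truth labels and those fine-tuned on pseudo-labels (same initialization), and it is added to the pseudo-labeled target model. The sketch below is illustrative only and assumes each model's weights are available as a dict of NumPy arrays with matching names and shapes. The `scale` coefficient is a hypothetical knob not mentioned in the abstract (`scale=1.0` is plain addition). The helper for relative WER reduction shows how a figure such as "35% relative WER reduction" is computed.

```python
import numpy as np

# Hypothetical representation: parameter name -> weight array.
Params = dict[str, np.ndarray]


def correction_vector(theta_real: Params, theta_pseudo: Params) -> Params:
    """tau = theta_real - theta_pseudo, computed per parameter tensor.

    theta_real: source model fine-tuned on ground-truth labels.
    theta_pseudo: source model fine-tuned on pseudo-labels (same init).
    """
    if theta_real.keys() != theta_pseudo.keys():
        raise ValueError("models must share the same parameter names")
    return {name: theta_real[name] - theta_pseudo[name] for name in theta_real}


def apply_correction(theta_target: Params, tau: Params, scale: float = 1.0) -> Params:
    """theta_corrected = theta_target + scale * tau.

    theta_target: target-domain model fine-tuned on pseudo-labels.
    scale: hypothetical scaling coefficient (assumption, not from the abstract).
    """
    if theta_target.keys() != tau.keys():
        raise ValueError("correction vector does not match target parameters")
    return {name: theta_target[name] + scale * tau[name] for name in theta_target}


def relative_wer_reduction(wer_before: float, wer_after: float) -> float:
    """Relative WER reduction = (WER_before - WER_after) / WER_before."""
    return (wer_before - wer_after) / wer_before


if __name__ == "__main__":
    # Toy one-tensor "models" with made-up values, for illustration only.
    real = {"w": np.array([1.0, 2.0])}
    pseudo = {"w": np.array([0.5, 2.5])}
    target = {"w": np.array([3.0, 3.0])}

    tau = correction_vector(real, pseudo)
    corrected = apply_correction(target, tau)
    print(tau["w"], corrected["w"])
    # Hypothetical WERs: 0.40 -> 0.26 is a 35% relative reduction.
    print(relative_wer_reduction(0.40, 0.26))
```

With PyTorch models, `model.state_dict()` returns a name-to-tensor mapping, so the same element-wise arithmetic could be applied to its floating-point entries; this is a sketch under the stated assumptions, not the authors' released implementation.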

Related papers

ReHear: Iterative Pseudo-Label Refinement for Semi-Supervised Speech Recognition via Audio Large Language Models [12.527207210862151]
ReHear is a framework for iterative pseudo-label refinement in automatic speech recognition. It integrates an instruction-tuned, audio-aware large language model into the self-training loop. We show that ReHear effectively mitigates error propagation, consistently outperforming both supervised and pseudo-labeling baselines.
arXiv Detail & Related papers (2026-02-21T05:04:22Z)
Retrieval-Augmented Self-Taught Reasoning Model with Adaptive Chain-of-Thought for ASR Named Entity Correction [12.483998165719981]
We propose a retrieval-augmented generation framework for correcting named entity errors in automatic speech recognition (ASR). Our approach consists of two key components: (1) a rephrasing language model (RLM) for named entity recognition, followed by candidate retrieval using a phonetic-level edit distance; and (2) a novel self-taught reasoning model with adaptive chain-of-thought (A-STAR) that dynamically adjusts the depth of its reasoning based on task difficulty.
arXiv Detail & Related papers (2026-01-21T15:05:39Z)
Towards Micro-Action Recognition with Limited Annotations: An Asynchronous Pseudo Labeling and Training Approach [35.32024173141412]
We introduce the setting of Semi-Supervised MAR (SSMAR), where only some of the samples are labeled. Traditional Semi-Supervised Learning (SSL) methods tend to overfit on inaccurate pseudo-labels, leading to error accumulation and degraded performance. We propose Asynchronous Pseudo Labeling and Training (APLT), which explicitly separates the pseudo-labeling process from model training.
arXiv Detail & Related papers (2025-04-10T14:22:15Z)
Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method. A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses. The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z)
Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition [49.42732949233184]
When labeled data is insufficient, semi-supervised learning with the pseudo-labeling technique can significantly improve the performance of automatic speech recognition. Taking noisy labels as ground-truth in the loss function results in suboptimal performance. We propose a novel framework named alternative pseudo-labeling to tackle the issue of noisy pseudo-labels.
arXiv Detail & Related papers (2023-08-12T12:13:52Z)
Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment [30.407534668054286]
We propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA). UPL-EA explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. Our results and in-depth analyses demonstrate the superiority of UPL-EA over 15 competitive baselines.
arXiv Detail & Related papers (2023-07-05T07:32:34Z)
Robust Target Training for Multi-Source Domain Adaptation [110.77704026569499]
We propose a novel Bi-level Optimization based Robust Target Training (BORT²) method for MSDA. Our proposed method achieves state-of-the-art performance on three MSDA benchmarks, including the large-scale DomainNet dataset.
arXiv Detail & Related papers (2022-10-04T15:20:01Z)
FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment. FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model. It outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.
arXiv Detail & Related papers (2021-05-09T05:35:36Z)
Cross-domain Speech Recognition with Unsupervised Character-level Distribution Matching [60.8427677151492]
We propose CMatch, a Character-level distribution matching method to perform fine-grained adaptation between each character in two domains. Experiments on the Libri-Adapt dataset show that our proposed approach achieves 14.39% and 16.50% relative Word Error Rate (WER) reduction on cross-device and cross-environment ASR, respectively.
arXiv Detail & Related papers (2021-04-15T14:36:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.