Listen, Adapt, Better WER: Source-free Single-utterance Test-time
Adaptation for Automatic Speech Recognition
- URL: http://arxiv.org/abs/2203.14222v1
- Date: Sun, 27 Mar 2022 06:38:39 GMT
- Title: Listen, Adapt, Better WER: Source-free Single-utterance Test-time
Adaptation for Automatic Speech Recognition
- Authors: Guan-Ting Lin, Shang-Wen Li, Hung-yi Lee
- Abstract summary: Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
- Score: 65.84978547406753
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although deep learning-based end-to-end Automatic Speech Recognition (ASR)
has shown remarkable performance in recent years, it suffers severe
performance degradation on test samples drawn from data distributions that
differ from the training data. Test-time Adaptation (TTA), previously
explored in computer vision, aims to adapt a model trained on source
domains to yield better predictions for test samples, often out-of-domain,
without accessing the source data. Here, we propose the Single-Utterance
Test-time Adaptation (SUTA) framework for ASR, which, to the best of our
knowledge, is the first TTA study in the speech area. Single-utterance TTA
is a more realistic setting: it does not assume that test data are sampled
from an identical distribution, and it does not delay on-demand inference
by pre-collecting a batch of adaptation data. SUTA combines unsupervised
objectives with an efficient adaptation strategy. Empirical results
demonstrate that SUTA effectively improves the performance of the source
ASR model on multiple out-of-domain target corpora as well as on in-domain
test samples.
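The abstract describes SUTA only as "unsupervised objectives with an efficient adaptation strategy". A minimal sketch of one such objective, per-utterance entropy minimization, is given below in PyTorch. The model choice (a Hugging Face wav2vec 2.0 CTC checkpoint), the learning rate, the step count, and the decision to adapt only LayerNorm parameters are all illustrative assumptions, not the paper's exact recipe.

# Minimal sketch of single-utterance TTA via entropy minimization.
# Model choice, learning rate, step count, and adapting only LayerNorm
# parameters are illustrative assumptions, not the paper's exact recipe.
import copy
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
source_model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

def adapt_and_transcribe(waveform, steps=10, lr=2e-4):
    # Fresh copy per utterance: adaptation never leaks across utterances.
    model = copy.deepcopy(source_model)
    model.eval()  # keep eval mode (no dropout); gradients still flow
    params = [p for n, p in model.named_parameters() if "layer_norm" in n]
    optimizer = torch.optim.AdamW(params, lr=lr)

    inputs = processor(waveform, sampling_rate=16000, return_tensors="pt")
    for _ in range(steps):
        logits = model(inputs.input_values).logits        # (1, T, vocab)
        probs = torch.softmax(logits, dim=-1)
        # Frame-averaged Shannon entropy; minimizing it sharpens the
        # frame-level predictions on this single utterance.
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()

    with torch.no_grad():
        ids = model(inputs.input_values).logits.argmax(dim=-1)
    return processor.batch_decode(ids)[0]

Because each utterance gets its own short adaptation loop on a fresh copy of the source model, inference never waits for a batch of adaptation data to accumulate.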
Related papers
- BoostAdapter: Improving Vision-Language Test-Time Adaptation via Regional Bootstrapping [64.8477128397529]
We propose a test-time adaptation framework that bridges training-required and training-free methods.
We maintain a light-weight key-value memory for feature retrieval from instance-agnostic historical samples and instance-aware boosting samples (a minimal sketch of such a memory follows this entry).
We theoretically justify the rationality behind our method and empirically verify its effectiveness on both the out-of-distribution and the cross-domain datasets.
arXiv Detail & Related papers (2024-10-20T15:58:43Z)
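The key-value memory described above can be illustrated generically. The sketch below is a hypothetical simplification, not BoostAdapter's exact algorithm: the capacity, neighbor count, entropy threshold, and blending weight are all assumptions.

# Hypothetical sketch of a light-weight key-value memory for test-time
# adaptation: low-entropy historical samples are cached as (feature,
# pseudo-label) pairs, and retrieved neighbors refine new predictions.
import torch
import torch.nn.functional as F

class KeyValueMemory:
    def __init__(self, capacity=512, k=8, blend=0.5):
        self.keys, self.values = [], []   # features and pseudo-labels
        self.capacity, self.k, self.blend = capacity, k, blend

    def maybe_add(self, feature, probs, entropy_threshold=1.0):
        # Cache only confident (low-entropy) samples.
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum()
        if entropy < entropy_threshold:
            self.keys.append(F.normalize(feature, dim=-1))
            self.values.append(probs)
            if len(self.keys) > self.capacity:   # FIFO eviction
                self.keys.pop(0)
                self.values.pop(0)

    def refine(self, feature, probs):
        # Blend the model's prediction with retrieved neighbors' labels.
        if len(self.keys) < self.k:
            return probs
        keys = torch.stack(self.keys)                 # (N, D)
        sims = keys @ F.normalize(feature, dim=-1)    # cosine similarity
        top = sims.topk(self.k)
        weights = torch.softmax(top.values, dim=0)
        retrieved = (weights[:, None]
                     * torch.stack(self.values)[top.indices]).sum(0)
        return (1 - self.blend) * probs + self.blend * retrieved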
- Distribution Alignment for Fully Test-Time Adaptation with Dynamic Online Data Streams [19.921480334048756]
Test-Time Adaptation (TTA) enables adaptation and inference in test data streams with domain shifts from the source.
We propose a novel Distribution Alignment loss for TTA.
We surpass existing methods in non-i.i.d. scenarios and maintain competitive performance under the ideal i.i.d. assumption.
arXiv Detail & Related papers (2024-07-16T19:33:23Z)
- SGEM: Test-Time Adaptation for Automatic Speech Recognition via Sequential-Level Generalized Entropy Minimization [30.61075178799518]
A test-time adaptation (TTA) method has recently been proposed to adapt the pre-trained ASR model on unlabeled test instances without source data.
We propose a novel TTA framework, dubbed SGEM, for general ASR models.
SGEM achieves state-of-the-art performance for three mainstream ASR models under various domain shifts (a sketch of the generalized entropy objective follows this entry).
arXiv Detail & Related papers (2023-06-03T02:27:08Z)
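SGEM's name points to a generalized entropy objective. As a hedged illustration, the snippet below implements the Renyi entropy family, which reduces to Shannon entropy as alpha approaches 1; the alpha value and the frame-level averaging are assumptions, and SGEM's sequence-level scoring of beam-search candidates is omitted.

# Hypothetical sketch of a generalized (Renyi) entropy objective, the
# family SGEM's sequential-level objective belongs to. The alpha value
# and frame-level averaging are illustrative assumptions.
import torch

def renyi_entropy(logits, alpha=0.33):
    # logits: (T, vocab) frame-level outputs of an ASR model.
    probs = torch.softmax(logits, dim=-1)
    if abs(alpha - 1.0) < 1e-6:
        # alpha -> 1 recovers the usual Shannon entropy.
        return -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
    return (1.0 / (1.0 - alpha)) * probs.pow(alpha).sum(-1).log().mean()

# One adaptation step: minimize the generalized entropy with respect to
# a small set of trainable parameters (e.g., normalization layers):
# loss = renyi_entropy(model(inputs).logits.squeeze(0)); loss.backward()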
- Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN), which corrects normalization statistics for out-of-distribution samples, and (b) Prediction-Balanced Reservoir Sampling (PBRS), which simulates an i.i.d. data stream from a non-i.i.d. stream in a class-balanced manner (a sketch of PBRS follows this entry).
arXiv Detail & Related papers (2022-08-10T03:05:46Z)
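The PBRS component described above lends itself to a compact sketch. The following is a hypothetical simplification: the capacity and eviction rules are assumptions, but it shows how a memory can stay balanced across predicted classes while a reservoir rule keeps acceptance fair within a class.

# Hypothetical sketch of prediction-balanced reservoir sampling: keep a
# small memory that stays class-balanced with respect to *predicted*
# labels so later updates see an approximately i.i.d., balanced stream.
import random
from collections import defaultdict

class PredictionBalancedMemory:
    def __init__(self, capacity=64):
        self.capacity = capacity
        self.slots = defaultdict(list)   # predicted class -> samples
        self.seen = defaultdict(int)     # per-class arrival counts

    def __len__(self):
        return sum(len(v) for v in self.slots.values())

    def add(self, sample, predicted_class):
        self.seen[predicted_class] += 1
        if len(self) < self.capacity:
            self.slots[predicted_class].append(sample)
            return
        majority = max(self.slots, key=lambda c: len(self.slots[c]))
        if predicted_class == majority:
            # Within the majority class, fall back to reservoir sampling:
            # accept with probability slots_c / seen_c, replace at random.
            cap = len(self.slots[majority])
            if random.random() < cap / self.seen[predicted_class]:
                self.slots[majority][random.randrange(cap)] = sample
        else:
            # Evict from the most-represented class to make room.
            victim = random.randrange(len(self.slots[majority]))
            self.slots[majority].pop(victim)
            self.slots[predicted_class].append(sample)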
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner (a sketch follows this entry).
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
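A class-aware alignment loss of this kind can be sketched as pulling target features toward per-class source centroids selected by pseudo-labels. This is a hypothetical simplification of the idea, not CAFA's actual formulation, which the summary does not specify.

# Hypothetical sketch of a class-aware feature alignment loss: target
# features are pulled toward the source-domain centroid of their
# pseudo-label's class, encouraging class-discriminative adaptation.
# Plain Euclidean distance to precomputed centroids is an assumption.
import torch

def class_aware_alignment_loss(features, logits, source_centroids):
    # features: (B, D); logits: (B, C); source_centroids: (C, D),
    # class means of source features collected before deployment.
    pseudo = logits.argmax(dim=-1)         # (B,) pseudo-labels
    targets = source_centroids[pseudo]     # (B, D) matched centroids
    return ((features - targets) ** 2).sum(-1).mean()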
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer that keeps important model parameters from changing drastically (a sketch follows this entry).
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
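The Fisher regularizer mentioned above is a standard anti-forgetting device: parameters that mattered on the source task are anchored to their pre-adaptation values. The sketch below is a generic illustration; how the paper estimates Fisher importance and weights the penalty is not given in the summary, so both are assumptions here.

# Hypothetical sketch of a Fisher-weighted anti-forgetting regularizer:
# parameters with large Fisher values (important on the source task) are
# anchored to their pre-adaptation values during test-time updates.
import torch

def estimate_fisher_diagonal(model, calibration_batches, loss_fn):
    # Squared-gradient estimate of the diagonal Fisher information.
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()
              if p.requires_grad}
    for inputs in calibration_batches:
        model.zero_grad()
        loss_fn(model, inputs).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(calibration_batches), 1)
            for n, f in fisher.items()}

def fisher_penalty(model, source_params, fisher, weight=2000.0):
    # weight * sum_i F_i * (theta_i - theta_i^source)^2, added to the
    # TTA loss; source_params holds detached pre-adaptation copies.
    penalty = 0.0
    for n, p in model.named_parameters():
        if n in fisher:
            penalty = penalty + (fisher[n]
                                 * (p - source_params[n]) ** 2).sum()
    return weight * penalty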
- Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition [6.450618373898492]
We consider the task of identifying an optimal subset of data for efficient fine-tuning in self-supervised speech models for ASR.
We present the COWERAGE algorithm for representative subset selection in self-supervised ASR.
arXiv Detail & Related papers (2022-03-18T10:12:24Z)
- Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as the source domain and TED-LIUM 3 as well as SWITCHBOARD as target domains show that up to 80% of the performance of a system trained on ground-truth data can be recovered (a sketch of the uncertainty filtering step follows this entry).
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
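The uncertainty-driven self-training recipe summarized above can be sketched as pseudo-labeling plus confidence filtering. The snippet below is a hypothetical simplification reusing the wav2vec 2.0 interfaces from the earlier sketch; the uncertainty measure (mean frame entropy) and the threshold are assumptions, not the paper's criterion.

# Hypothetical sketch of uncertainty-driven self-training for ASR domain
# adaptation: pseudo-label unlabeled target audio, keep only utterances
# the model is confident about, and fine-tune on the filtered set.
import torch

def filter_pseudo_labels(model, processor, utterances, threshold=0.5):
    kept = []
    model.eval()
    for waveform in utterances:
        inputs = processor(waveform, sampling_rate=16000,
                           return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits.squeeze(0)
        probs = torch.softmax(logits, dim=-1)
        # Mean frame entropy as the uncertainty score (an assumption).
        uncertainty = -(probs * probs.clamp_min(1e-8).log()).sum(-1).mean()
        if uncertainty.item() < threshold:
            transcript = processor.batch_decode(
                logits.argmax(dim=-1).unsqueeze(0))[0]
            kept.append((waveform, transcript))
    # `kept` holds (audio, pseudo-transcript) pairs for fine-tuning with
    # the standard supervised (e.g., CTC) objective.
    return kept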
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.