Decoupled Structure for Improved Adaptability of End-to-End Models
- URL: http://arxiv.org/abs/2308.13345v1
- Date: Fri, 25 Aug 2023 12:31:12 GMT
- Title: Decoupled Structure for Improved Adaptability of End-to-End Models
- Authors: Keqi Deng, Philip C. Woodland
- Abstract summary: This paper proposes decoupled structures for attention-based encoder-decoder (Decoupled-AED) and neural transducer (Decoupled-Transducer) models.
The acoustic and linguistic parts of the E2E model decoder (or prediction network) are decoupled, making the linguistic component replaceable.
Experiments for E2E ASR models trained on the LibriSpeech-100h corpus showed that the proposed decoupled structure gave 15.1% and 17.2% relative word error rate reductions on TED-LIUM 2 and AESRC2020 respectively.
- Score: 16.195423291103975
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although end-to-end (E2E) trainable automatic speech recognition (ASR) has
shown great success by jointly learning acoustic and linguistic information, it
still suffers from the effect of domain shifts, thus limiting potential
applications. The E2E ASR model implicitly learns an internal language model
(LM) which characterises the training distribution of the source domain, and
the E2E trainable nature makes the internal LM difficult to adapt to the target
domain with text-only data. To solve this problem, this paper proposes decoupled
structures for attention-based encoder-decoder (Decoupled-AED) and neural
transducer (Decoupled-Transducer) models, which can achieve flexible domain
adaptation in both offline and online scenarios while maintaining robust
intra-domain performance. To this end, the acoustic and linguistic parts of the
E2E model decoder (or prediction network) are decoupled, making the linguistic
component (i.e. internal LM) replaceable. When encountering a domain shift, the
internal LM can be directly replaced during inference by a target-domain LM,
without re-training or using domain-specific paired speech-text data.
Experiments for E2E ASR models trained on the LibriSpeech-100h corpus showed
that the proposed decoupled structure gave 15.1% and 17.2% relative word error
rate reductions on the TED-LIUM 2 and AESRC2020 corpora while still maintaining
performance on intra-domain data.
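As a rough illustration of the decoupling described above, the sketch below models one decoding step whose linguistic component (internal LM) is a separate, swappable module, so a target-domain LM can be dropped in at inference time. The class and attribute names are illustrative and are not taken from the paper.

```python
import torch

class DecoupledDecoderStep(torch.nn.Module):
    """Conceptual sketch: one decoding step whose linguistic component
    (internal LM) is a standalone, replaceable module. Names and the
    simple additive combination are illustrative assumptions."""

    def __init__(self, acoustic_part, internal_lm, lm_weight=1.0):
        super().__init__()
        self.acoustic_part = acoustic_part  # attends to encoder output
        self.internal_lm = internal_lm      # text-only module, replaceable
        self.lm_weight = lm_weight

    def forward(self, prev_tokens, encoder_out):
        # Acoustic and linguistic contributions combined in log-prob space.
        acoustic_logp = self.acoustic_part(prev_tokens, encoder_out)
        lm_logp = self.internal_lm(prev_tokens)
        return acoustic_logp + self.lm_weight * lm_logp

# Text-only domain adaptation then amounts to swapping the module at
# inference, without re-training or paired target-domain speech-text data:
# decoder_step.internal_lm = target_domain_lm
```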
Related papers
- Mask The Bias: Improving Domain-Adaptive Generalization of CTC-based ASR
with Internal Language Model Estimation [14.840612036671734]
Internal language model estimation (ILME) has been proposed to mitigate the internal LM bias towards the training domain for autoregressive models.
We propose a novel ILME technique for CTC-based ASR models.
Our method iteratively masks the audio timesteps to estimate a pseudo log-likelihood of the internal LM.
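A very rough reading of this masking idea is sketched below, assuming that replacing acoustic frames with a mask value pushes the CTC model towards its internal label prior; the iterative masking schedule used in the paper is not reproduced, and every detail here is an assumption.

```python
import torch

def masked_pseudo_ilm_logprob(ctc_model, feats, token_ids, mask_value=0.0):
    """Hedged sketch: mask the audio timesteps so the model output reflects
    its internal (label) prior rather than the acoustics, then score the
    token sequence against that prior. The single-pass, all-frames masking
    below is an illustrative simplification, not the paper's recipe."""
    masked = torch.full_like(feats, mask_value)          # hide the acoustics
    log_probs = ctc_model(masked).log_softmax(dim=-1)    # (T, vocab) assumed
    avg_logp = log_probs.mean(dim=0)                     # frame-averaged prior
    return avg_logp[token_ids].sum()                     # pseudo log-likelihood
```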
arXiv Detail & Related papers (2023-05-05T20:35:42Z)
- Non-Parametric Domain Adaptation for End-to-End Speech Translation [72.37869362559212]
End-to-End Speech Translation (E2E-ST) has received increasing attention due to its potential for less error propagation, lower latency, and fewer parameters.
We propose a novel non-parametric method that leverages domain-specific text translation corpus to achieve domain adaptation for the E2E-ST system.
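One common non-parametric recipe that fits this description is kNN retrieval over a datastore built offline from the domain-specific text translation corpus; the sketch below follows the generic kNN interpolation and is only an assumption about how the paper's method is constructed.

```python
import numpy as np

def knn_augmented_next_token_probs(model_probs, query_vec, datastore_keys,
                                   datastore_vals, k=8, temperature=10.0,
                                   lam=0.5):
    """Hedged sketch of non-parametric adaptation at decoding time.
    datastore_keys/vals are assumed to map decoder hidden states to next
    tokens, built from the domain text corpus; the interpolation is the
    generic kNN recipe, not necessarily the paper's exact formulation."""
    dists = np.linalg.norm(datastore_keys - query_vec, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()

    knn_probs = np.zeros_like(model_probs)
    for w, idx in zip(weights, nearest):
        knn_probs[datastore_vals[idx]] += w

    # Interpolate the parametric model's distribution with the retrieved one.
    return (1.0 - lam) * model_probs + lam * knn_probs
```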
arXiv Detail & Related papers (2022-05-23T11:41:02Z)
- A Likelihood Ratio based Domain Adaptation Method for E2E Models [10.510472957585646]
End-to-end (E2E) automatic speech recognition models like the Recurrent Neural Network Transducer (RNN-T) are becoming a popular choice for streaming ASR applications like voice assistants.
While E2E models are very effective at learning representations of the data they are trained on, their accuracy on unseen domains remains a challenging problem.
In this work, we explore a contextual biasing approach using a likelihood ratio that leverages text data sources to adapt the RNN-T model to new domains and entities.
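A minimal sketch of the likelihood-ratio idea, assuming each beam-search hypothesis score is shifted by the log-ratio of a target-domain LM to a source-domain (training-data) LM; the weighting scheme is illustrative, not the paper's exact formulation.

```python
def biased_hypothesis_score(asr_logp, target_lm_logp, source_lm_logp,
                            lm_weight=0.3):
    """Hedged sketch: bias an RNN-T hypothesis score with the log likelihood
    ratio of a target-domain LM over a source-domain LM. The weight is an
    illustrative value."""
    return asr_logp + lm_weight * (target_lm_logp - source_lm_logp)

# Example: a hypothesis favoured by the target-domain LM gets boosted.
print(biased_hypothesis_score(-4.2, target_lm_logp=-1.0, source_lm_logp=-3.0))
```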
arXiv Detail & Related papers (2022-01-10T21:22:39Z)
- Internal Language Model Adaptation with Text-Only Data for End-to-End
Speech Recognition [80.32546870220979]
We propose an internal LM adaptation (ILMA) of the E2E model using text-only data.
ILMA enables a fast text-only adaptation of the E2E model without increasing the run-time computational cost.
In experiments with transformer transducer models trained on 30K hours of data, ILMA achieves up to a 34.9% relative word error rate reduction.
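A hedged sketch of what such text-only adaptation can look like: cross-entropy of the internal LM on target-domain text plus a regulariser that keeps the adapted internal LM close to the original; the exact loss and hyper-parameters used in the paper are assumptions.

```python
import torch.nn.functional as F

def ilma_text_only_loss(adapted_ilm_logits, frozen_ilm_logits, target_ids,
                        kl_weight=0.5):
    """Hedged sketch of internal LM adaptation on text-only data.
    adapted_ilm_logits / frozen_ilm_logits: (batch, time, vocab) logits of
    the adapted and original internal LMs; target_ids: (batch, time)."""
    ce = F.cross_entropy(adapted_ilm_logits.transpose(1, 2), target_ids)
    kl = F.kl_div(F.log_softmax(adapted_ilm_logits, dim=-1),
                  F.softmax(frozen_ilm_logits, dim=-1),
                  reduction="batchmean")
    return ce + kl_weight * kl  # kl_weight is an illustrative value
```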
arXiv Detail & Related papers (2021-10-06T23:03:29Z)
- Stagewise Unsupervised Domain Adaptation with Adversarial Self-Training
for Road Segmentation of Remote Sensing Images [93.50240389540252]
Road segmentation from remote sensing images is a challenging task with a wide range of potential applications.
We propose a novel stagewise domain adaptation model called RoadDA to address the domain shift (DS) issue in this field.
Experiment results on two benchmarks demonstrate that RoadDA can efficiently reduce the domain gap and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-28T09:29:14Z)
- Internal Language Model Training for Domain-Adaptive End-to-End Speech
Recognition [83.739317674302]
The internal language model estimation (ILME) method can be used to improve integration between external language models and automatic speech recognition systems.
We propose an internal LM training (ILMT) method to minimize an additional internal LM loss.
ILMT encourages the E2E model to form a standalone LM inside its existing components, without sacrificing ASR accuracy.
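The additional loss can be summarised as in the sketch below, where the internal LM loss is the cross-entropy obtained from the decoder (or prediction network) alone, with the encoder contribution removed; the weight is an illustrative value, not the paper's setting.

```python
def ilmt_total_loss(asr_loss, internal_lm_loss, alpha=0.5):
    """Hedged sketch of internal LM training (ILMT): the standard E2E ASR
    loss is augmented with a cross-entropy term computed by the linguistic
    component alone, so a standalone LM forms inside the model."""
    return asr_loss + alpha * internal_lm_loss
```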
arXiv Detail & Related papers (2021-02-02T08:15:02Z)
- Internal Language Model Estimation for Domain-Adaptive End-to-End Speech
Recognition [56.27081731553829]
Internal language model (LM) integration is a challenging task for end-to-end (E2E) automatic speech recognition.
We propose an internal LM estimation (ILME) method to facilitate a more effective integration of the external LM with all pre-existing E2E models.
ILME can alleviate the domain mismatch between training and testing, or improve the multi-domain E2E ASR.
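ILME-based decoding is commonly written as shallow fusion with an estimate of the internal LM score subtracted; the sketch below shows that score combination with illustrative weights, not the paper's settings.

```python
def ilme_fusion_score(e2e_logp, ext_lm_logp, internal_lm_logp,
                      ext_weight=0.6, ilm_weight=0.3):
    """Hedged sketch: add the external LM score (shallow fusion) and subtract
    the estimated internal LM score, approximating a domain-independent
    acoustic likelihood. Weights are illustrative."""
    return e2e_logp + ext_weight * ext_lm_logp - ilm_weight * internal_lm_logp
```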
arXiv Detail & Related papers (2020-11-03T20:11:04Z)
- Iterative Domain-Repaired Back-Translation [50.32925322697343]
In this paper, we focus on the domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.
We propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair model to refine translations in synthetic bilingual data.
Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach.
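A hedged pseudocode rendering of the iterative loop described in this abstract is given below; all function and argument names, and the fixed number of rounds, are illustrative assumptions rather than the paper's actual procedure.

```python
def iterative_domain_repaired_bt(reverse_model, in_domain_target_sents,
                                 train_domain_repair, train_nmt, rounds=2):
    """Hedged pseudocode: back-translate in-domain target-side text into
    synthetic sources, train a Domain-Repair model to refine the synthetic
    side, and retrain the NMT model on the repaired pairs."""
    nmt = None
    for _ in range(rounds):
        # Synthetic bilingual data from back-translation.
        synthetic = [(reverse_model.translate(t), t)
                     for t in in_domain_target_sents]
        # Domain-Repair model refines translations in the synthetic data.
        repair = train_domain_repair(synthetic)
        repaired = [(repair(src), tgt) for src, tgt in synthetic]
        nmt = train_nmt(repaired)
    return nmt
```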
arXiv Detail & Related papers (2020-10-06T04:38:09Z)