Multi-Accent Adaptation based on Gate Mechanism
- URL: http://arxiv.org/abs/2011.02774v1
- Date: Thu, 5 Nov 2020 11:58:36 GMT
- Title: Multi-Accent Adaptation based on Gate Mechanism
- Authors: Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan
- Abstract summary: We propose using accent-specific top layer with gate mechanism (AST-G) to realize multi-accent adaptation.
In real-world applications, the accent category label is not available in advance at inference time.
Because the predicted accent label can be inaccurate, joint training with an accent classifier performs worse than accent-specific adaptation.
- Score: 35.76889921807408
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When only a limited amount of accented speech data is available, to promote
multi-accent speech recognition performance, the conventional approach is
accent-specific adaptation, which adapts the baseline model to multiple target
accents independently. To simplify the adaptation procedure, we explore
adapting the baseline model to multiple target accents simultaneously with
multi-accent mixed data. Thus, we propose using accent-specific top layer with
gate mechanism (AST-G) to realize multi-accent adaptation. Compared with the
baseline model and accent-specific adaptation, AST-G achieves 9.8% and 1.9%
average relative WER reduction respectively. However, in real-world
applications, we cannot obtain the accent category label in advance for
inference. Therefore, we employ an accent classifier to predict the accent
label. To jointly train the acoustic model and the accent classifier, we
propose multi-task learning with gate mechanism (MTL-G). Because the predicted
accent label can be inaccurate, MTL-G performs worse than accent-specific
adaptation. Nevertheless, in comparison with the baseline model, MTL-G
achieves 5.1% average relative WER reduction.
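The core idea of AST-G is a shared bottom network with one accent-specific top layer per accent, combined through a gate. A minimal NumPy sketch of that gating step is below; the layer sizes, the softmax gate formulation, and all variable names are illustrative assumptions, not the authors' exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACCENTS = 3    # number of target accents (assumed)
HID, OUT = 8, 5  # hidden/output sizes (assumed)

# Shared bottom layer and one accent-specific top layer per accent.
W_shared = rng.normal(size=(HID, HID))
W_top = [rng.normal(size=(OUT, HID)) for _ in range(N_ACCENTS)]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ast_g_forward(features, gate_logits):
    """Mix accent-specific top-layer outputs with a softmax gate.

    During adaptation the gate logits can encode the known accent label
    (near one-hot); at inference they would come from an accent
    classifier, as in the MTL-G variant described in the abstract.
    """
    h = np.tanh(W_shared @ features)            # shared representation
    gate = softmax(gate_logits)                 # accent mixing weights
    outs = np.stack([W @ h for W in W_top])     # (N_ACCENTS, OUT)
    return gate @ outs                          # gated combination

x = rng.normal(size=HID)
one_hot = np.array([10.0, -10.0, -10.0])        # accent 0 effectively selected
y_known = ast_g_forward(x, one_hot)             # known accent label
y_pred = ast_g_forward(x, np.array([1.0, 0.5, 0.2]))  # soft predicted gate
print(y_known.shape, y_pred.shape)              # (5,) (5,)
```

With a near one-hot gate this reduces to accent-specific adaptation (only one top layer contributes); a soft gate from an imperfect classifier blends the top layers, which is consistent with MTL-G trailing accent-specific adaptation in the reported results.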
Related papers
- Accent conversion using discrete units with parallel data synthesized from controllable accented TTS [56.18382038512251]
The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity.
Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems that could only be trained for each non-native accent.
This paper presents a promising AC model that can convert many accents into native to overcome these issues.
arXiv Detail & Related papers (2024-09-30T19:52:10Z)
- GE2E-AC: Generalized End-to-End Loss Training for Accent Classification [13.266765406714942]
We propose a GE2E-AC, in which we train a model to extract accent embedding or AE of an input utterance.
We experimentally show the effectiveness of the proposed GE2E-AC, compared to the baseline model trained with the conventional cross-entropy-based loss.
arXiv Detail & Related papers (2024-07-19T04:44:16Z)
- Improving Self-supervised Pre-training using Accent-Specific Codebooks [48.409296549372414]
We propose an accent-aware adaptation technique for self-supervised learning.
On the Mozilla Common Voice dataset, our proposed approach outperforms all other accent-adaptation approaches.
arXiv Detail & Related papers (2024-07-04T08:33:52Z)
- Accented Speech Recognition With Accent-specific Codebooks [53.288874858671576]
Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems.
Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR.
We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
arXiv Detail & Related papers (2023-10-24T16:10:58Z)
- Don't Stop Self-Supervision: Accent Adaptation of Speech Representations via Residual Adapters [14.645374377673148]
Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been adapted successfully toward several downstream tasks.
We propose and investigate self-supervised adaptation of speech representations to such populations in a parameter-efficient way via training accent-specific adapters.
We obtain strong word error rate reductions (WERR) over HuBERT-large for all 4 accents, with a mean WERR of 22.7% with accent-specific adapters and a mean WERR of 25.1% if the entire encoder is accent-adapted.
arXiv Detail & Related papers (2023-07-02T02:21:29Z)
- Modelling low-resource accents without accent-specific TTS frontend [4.185844990558149]
This work focuses on modelling a speaker's accent that does not have a dedicated text-to-speech (TTS)
We propose an approach whereby we first augment the target accent data to sound like the donor voice via voice conversion.
We then train a multi-speaker multi-accent TTS model on the combination of recordings and synthetic data, to generate the target accent.
arXiv Detail & Related papers (2023-01-11T18:00:29Z)
- Low-resource Accent Classification in Geographically-proximate Settings: A Forensic and Sociophonetics Perspective [8.002498051045228]
Accented speech recognition and accent classification are relatively under-explored research areas in speech technology.
Recent deep learning-based methods and Transformer-based pretrained models have achieved superb performances in both areas.
In this paper, we explored three main accent modelling methods combined with two different classifiers based on 105 speaker recordings retrieved from five urban varieties in Northern England.
arXiv Detail & Related papers (2022-06-26T01:25:17Z)
- A Highly Adaptive Acoustic Model for Accurate Multi-Dialect Speech Recognition [80.87085897419982]
We propose a novel acoustic modeling technique for accurate multi-dialect speech recognition with a single AM.
Our proposed AM is dynamically adapted based on both dialect information and its internal representation, which results in a highly adaptive AM for handling multiple dialects simultaneously.
The experimental results on large scale speech datasets show that the proposed AM outperforms all the previous ones, reducing word error rates (WERs) by 8.11% relative compared to a single all-dialects AM and by 7.31% relative compared to dialect-specific AMs.
arXiv Detail & Related papers (2022-05-06T06:07:09Z)
- Black-box Adaptation of ASR for Accented Speech [52.63060669715216]
We introduce the problem of adapting a black-box, cloud-based ASR system to speech from a target accent.
We propose a novel coupling of an open-source accent-tuned local model with the black-box service.
Our fine-grained merging algorithm is better at fixing accent errors than existing word-level combination strategies.
arXiv Detail & Related papers (2020-06-24T07:07:49Z)
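Several entries above report relative WER reduction (WERR). For readers unfamiliar with the metric, a minimal sketch of how it is computed follows; the input values are illustrative only, not figures from any listed paper.

```python
def relative_werr(baseline_wer, adapted_wer):
    """Relative word error rate reduction, in percent.

    A positive value means the adapted model lowered WER relative to
    the baseline; e.g. WER dropping from 20.0 to 18.04 is a 9.8% WERR.
    """
    return 100.0 * (baseline_wer - adapted_wer) / baseline_wer

# Illustrative values only.
print(round(relative_werr(20.0, 18.04), 1))  # 9.8
```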
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.