Black-box Adaptation of ASR for Accented Speech
- URL: http://arxiv.org/abs/2006.13519v1
- Date: Wed, 24 Jun 2020 07:07:49 GMT
- Title: Black-box Adaptation of ASR for Accented Speech
- Authors: Kartik Khandelwal, Preethi Jyothi, Abhijeet Awasthi, Sunita Sarawagi
- Abstract summary: We introduce the problem of adapting a black-box, cloud-based ASR system to speech from a target accent.
We propose a novel coupling of an open-source accent-tuned local model with the black-box service.
Our fine-grained merging algorithm is better at fixing accent errors than existing word-level combination strategies.
- Score: 52.63060669715216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce the problem of adapting a black-box, cloud-based ASR system to
speech from a target accent. While leading online ASR services obtain
impressive performance on mainstream accents, they perform poorly on
sub-populations: we observed that the word error rate (WER) achieved by
Google's ASR API on Indian accents is almost twice the WER on US accents.
Existing adaptation methods either require access to model parameters or
overlay an error-correcting module on output transcripts. We highlight the need
for correlating outputs with the original speech to fix accent errors.
Accordingly, we propose a novel coupling of an open-source accent-tuned local
model with the black-box service where the output from the service guides
frame-level inference in the local model. Our fine-grained merging algorithm is
better at fixing accent errors than existing word-level combination strategies.
Experiments on Indian and Australian accents, with three leading ASR models as
the service, show that we achieve as much as a 28% relative reduction in WER over
both the local and service models.
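The coupling described above can be illustrated with a toy sketch (this is not the authors' actual algorithm; the vocabulary, frame distributions, and bias weight below are all hypothetical): during beam search over the local model's per-frame output distributions, hypotheses that agree with the black-box service's transcript receive a score bonus, so the service output guides frame-level inference.

```python
import math

# Toy illustration of service-guided frame-level decoding.
# VOCAB, the frame distributions, and the bias weight are all invented.
VOCAB = ["a", "b", "c"]

def biased_decode(frame_logprobs, service_text, bias=2.0, beam=4):
    """Beam search over frames; hypotheses whose text is a prefix of
    the service transcript get a log-score bonus at every frame."""
    beams = [("", 0.0)]
    for logp in frame_logprobs:          # one {token: logprob} dict per frame
        cand = []
        for hyp, score in beams:
            for tok in VOCAB:
                new = hyp + tok
                s = score + logp[tok]
                # frame-level coupling: reward agreement with the service output
                if service_text.startswith(new):
                    s += bias
                cand.append((new, s))
        beams = sorted(cand, key=lambda x: -x[1])[:beam]
    return beams[0][0]

# The local model slightly prefers "b" at frame 2, but the service heard "aa".
frames = [
    {"a": math.log(0.6), "b": math.log(0.3), "c": math.log(0.1)},
    {"a": math.log(0.4), "b": math.log(0.5), "c": math.log(0.1)},
]
print(biased_decode(frames, service_text="aa"))  # prints "aa"
```

With `bias=0.0` the same call returns the unbiased local decode "ab"; the bias term is what lets the service transcript pull frame-level inference back toward its hypothesis.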
Related papers
- Accent conversion using discrete units with parallel data synthesized from controllable accented TTS [56.18382038512251]
The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity.
Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems that could only be trained for each non-native accent.
This paper presents a promising AC model that can convert many accents into native speech to overcome these issues.
arXiv Detail & Related papers (2024-09-30T19:52:10Z)
- Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision [16.21891840664049]
The scarcity of parallel data is the key challenge in the accent conversion problem.
We propose a two-stage generative framework, "convert-and-speak", in which conversion operates only at the semantic token level.
The framework achieves state-of-the-art performance in accent similarity, speech quality, and speaker maintenance with only 15 minutes of weakly parallel data.
arXiv Detail & Related papers (2024-08-19T15:33:59Z)
- Improving Self-supervised Pre-training using Accent-Specific Codebooks [48.409296549372414]
We propose an accent-aware adaptation technique for self-supervised learning.
On the Mozilla Common Voice dataset, our proposed approach outperforms all other accent-adaptation approaches.
arXiv Detail & Related papers (2024-07-04T08:33:52Z) - Accented Speech Recognition With Accent-specific Codebooks [53.288874858671576]
Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems.
Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR.
We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
arXiv Detail & Related papers (2023-10-24T16:10:58Z) - Don't Stop Self-Supervision: Accent Adaptation of Speech Representations
via Residual Adapters [14.645374377673148]
Speech representations learned in a self-supervised fashion from massive unlabeled speech corpora have been successfully adapted to several downstream tasks.
We propose and investigate self-supervised adaptation of speech representations to accented speaker populations in a parameter-efficient way, by training accent-specific adapters.
We obtain strong word error rate reductions (WERR) over HuBERT-large for all 4 accents, with a mean WERR of 22.7% with accent-specific adapters and a mean WERR of 25.1% if the entire encoder is accent-adapted.
arXiv Detail & Related papers (2023-07-02T02:21:29Z) - Synthetic Cross-accent Data Augmentation for Automatic Speech
Recognition [18.154258453839066]
We improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation.
We include phonetic knowledge in the ACM training to provide accurate feedback about how well certain pronunciation patterns were recovered in the synthesized waveform.
We evaluate our approach on native and non-native English datasets and find that synthetically accented data helps the ASR to better understand speech from seen accents.
arXiv Detail & Related papers (2023-03-01T20:05:19Z) - Sequence-level self-learning with multiple hypotheses [53.04725240411895]
We develop new self-learning techniques with an attention-based sequence-to-sequence (seq2seq) model for automatic speech recognition (ASR).
In contrast to conventional unsupervised learning approaches, we adopt the multi-task learning (MTL) framework.
Our experimental results show that our method reduces the WER on British speech data from 14.55% to 10.36%, compared to the baseline model trained only on US English data.
arXiv Detail & Related papers (2021-12-10T20:47:58Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and subsequent LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
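Several results above are quoted as relative WER reductions (e.g., the 28% figure in the headline paper, or the drop from 14.55% to 10.36% on British speech). As a reminder of the arithmetic, here is a minimal sketch of WER (word-level edit distance divided by reference length) and relative WER reduction; the example sentences are invented for illustration:

```python
def wer(ref, hyp):
    """Word error rate: word-level Levenshtein distance / reference length."""
    r, h = ref.split(), hyp.split()
    d = list(range(len(h) + 1))          # DP row over hypothesis positions
    for i, rw in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hw in enumerate(h, 1):
            cur = min(d[j] + 1,            # deletion
                      d[j - 1] + 1,        # insertion
                      prev + (rw != hw))   # substitution or match
            prev, d[j] = d[j], cur
    return d[len(h)] / len(r)

def werr(baseline, adapted):
    """Relative WER reduction of an adapted model over a baseline."""
    return (baseline - adapted) / baseline

print(round(wer("the cat sat", "the bat sat"), 3))  # prints 0.333
print(round(werr(14.55, 10.36), 3))                 # prints 0.288
```

The second print shows that the 14.55% to 10.36% improvement quoted above corresponds to roughly a 28.8% relative reduction, which is how such "relative WERR" percentages are derived from absolute WER pairs.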
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.