Deep Discriminative Feature Learning for Accent Recognition
- URL: http://arxiv.org/abs/2011.12461v4
- Date: Wed, 25 Aug 2021 09:18:32 GMT
- Title: Deep Discriminative Feature Learning for Accent Recognition
- Authors: Wei Wang, Chao Zhang, Xiaopei Wu
- Abstract summary: We adopt a Convolutional Recurrent Neural Network as the front-end encoder and integrate local features using a Recurrent Neural Network to form an utterance-level accent representation.
We show that our proposed network with the discriminative training method significantly outperforms the baseline system on the accent classification track of the Accented English Speech Recognition Challenge 2020.
- Score: 14.024346215923972
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accent recognition with a deep learning framework is similar to deep
speaker identification: both are expected to give the input speech an
identifiable representation.
Compared with the individual-level features learned by a speaker identification
network, deep accent recognition poses a more challenging problem: forging
group-level accent features that are shared across speakers.
In this paper, we borrow and improve the deep speaker identification
framework to recognize accents: we adopt a Convolutional Recurrent Neural
Network as the front-end encoder and integrate its local features with a
Recurrent Neural Network to form an utterance-level accent representation.
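A minimal PyTorch sketch of an encoder along these lines is given below; the layer sizes, the GRU choice, and the mean-pooling step are illustrative assumptions, not the authors' exact configuration. It shows a convolutional front-end over log-Mel features, a recurrent layer that integrates the local features over time, and pooling into a fixed-size utterance-level accent embedding.

```python
import torch
import torch.nn as nn

class CRNNAccentEncoder(nn.Module):
    def __init__(self, n_mels: int = 80, embed_dim: int = 256, n_accents: int = 8):
        super().__init__()
        # Convolutional front-end over log-Mel features: extracts local
        # time-frequency patterns and downsamples both axes by roughly 4x.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        # Recurrent layer integrates the local features across time.
        self.rnn = nn.GRU(64 * (n_mels // 4), embed_dim,
                          batch_first=True, bidirectional=True)
        # Projection to a fixed-size utterance-level accent embedding.
        self.proj = nn.Linear(2 * embed_dim, embed_dim)
        self.classifier = nn.Linear(embed_dim, n_accents)

    def forward(self, feats: torch.Tensor):
        # feats: (batch, frames, n_mels) log-Mel filterbank features.
        x = self.conv(feats.unsqueeze(1))               # (B, 64, ~T/4, n_mels/4)
        b, c, t, f = x.shape
        x = x.permute(0, 2, 1, 3).reshape(b, t, c * f)  # back to a frame sequence
        seq, _ = self.rnn(x)                            # (B, ~T/4, 2 * embed_dim)
        emb = self.proj(seq.mean(dim=1))                # mean-pool over time
        return emb, self.classifier(emb)                # embedding + accent logits
```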
To address overfitting, we simply add a Connectionist Temporal Classification
based speech recognition auxiliary task during training, and to handle
ambiguous accent discrimination, we introduce powerful discriminative loss
functions from face recognition work to enhance the discriminative power of the
accent features.
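A minimal sketch of how such a multi-task objective could be wired up, assuming the encoder also carries a frame-level CTC head. The auxiliary weight, blank index, and tensor shapes are illustrative assumptions, and the accent term here uses plain cross-entropy; a discriminative alternative, Circle Loss, is sketched after the abstract.

```python
import torch.nn as nn
import torch.nn.functional as F

def multitask_loss(accent_logits, accent_labels,
                   asr_log_probs, transcripts, input_lengths, target_lengths,
                   aux_weight: float = 0.3):
    # accent_logits: (batch, n_accents) utterance-level scores from the encoder.
    # asr_log_probs: (frames, batch, vocab) log-probabilities from a CTC head.
    accent_loss = F.cross_entropy(accent_logits, accent_labels)
    ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)(
        asr_log_probs, transcripts, input_lengths, target_lengths)
    # The auxiliary ASR task regularizes the shared encoder against overfitting.
    return accent_loss + aux_weight * ctc_loss
```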
We show that our proposed network with the discriminative training method
(without data augmentation) significantly outperforms the baseline system on
the accent classification track of the Accented English Speech Recognition
Challenge 2020, where the Circle-Loss function achieves the best
discriminative optimization of the accent representation.
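Since Circle-Loss is singled out as the best-performing discriminative objective, below is a hedged sketch of its class-level form applied to accent embeddings; it could replace the cross-entropy term in the multi-task loss above. The margin m, scale gamma, and initialization are common defaults assumed for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CircleLoss(nn.Module):
    """Class-level Circle Loss over accent classes (Sun et al., CVPR 2020)."""
    def __init__(self, n_classes: int, embed_dim: int,
                 m: float = 0.25, gamma: float = 64.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(n_classes, embed_dim))
        nn.init.xavier_uniform_(self.weight)
        self.m, self.gamma = m, gamma

    def forward(self, emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between embeddings and the learned class centres.
        sim = F.linear(F.normalize(emb), F.normalize(self.weight))
        pos = F.one_hot(labels, sim.size(1)).bool()
        sp = sim[pos]                                         # within-class similarity
        sn = sim[~pos].view(sim.size(0), -1)                  # between-class similarities
        ap = torch.clamp_min(1 + self.m - sp.detach(), 0.0)   # adaptive positive weight
        an = torch.clamp_min(sn.detach() + self.m, 0.0)       # adaptive negative weights
        logit_p = -self.gamma * ap * (sp - (1 - self.m))
        logit_n = self.gamma * an * (sn - self.m)
        # log(1 + exp(logit_p) * sum_j exp(logit_n_j)), averaged over the batch.
        return F.softplus(torch.logsumexp(logit_n, dim=1) + logit_p).mean()
```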
Related papers
- Accent conversion using discrete units with parallel data synthesized from controllable accented TTS [56.18382038512251]
The goal of accent conversion (AC) is to convert speech accents while preserving content and speaker identity.
Previous methods either required reference utterances during inference, did not preserve speaker identity well, or used one-to-one systems that could only be trained for each non-native accent.
This paper presents a promising AC model that can convert many accents into a native accent to overcome these issues.
arXiv Detail & Related papers (2024-09-30T19:52:10Z) - Explaining Spectrograms in Machine Learning: A Study on Neural Networks for Speech Classification [2.4472308031704073]
This study investigates discriminative patterns learned by neural networks for accurate speech classification.
By examining the activations and features of neural networks for vowel classification, we gain insights into what the networks "see" in spectrograms.
arXiv Detail & Related papers (2024-07-10T07:37:18Z) - Accented Speech Recognition With Accent-specific Codebooks [53.288874858671576]
Speech accents pose a significant challenge to state-of-the-art automatic speech recognition (ASR) systems.
Degradation in performance across underrepresented accents is a severe deterrent to the inclusive adoption of ASR.
We propose a novel accent adaptation approach for end-to-end ASR systems using cross-attention with a trainable set of codebooks.
arXiv Detail & Related papers (2023-10-24T16:10:58Z) - Deep Neural Convolutive Matrix Factorization for Articulatory
Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z) - Analysis of French Phonetic Idiosyncrasies for Accent Recognition [0.8602553195689513]
Differences in pronunciation, accent, and intonation of speech in general create one of the most common problems in speech recognition.
We use traditional machine learning techniques and convolutional neural networks, and show that the classical techniques are not sufficiently efficient to solve this problem.
In this paper, we focus our attention on the French accent. We also identify its limitations by examining the impact of French idiosyncrasies on its spectrograms.
arXiv Detail & Related papers (2021-10-18T10:50:50Z) - Accented Speech Recognition Inspired by Human Perception [0.0]
This paper explores methods that are inspired by human perception to evaluate possible performance improvements for recognition of accented speech.
We explore four methodologies: pre-exposure to multiple accents, grapheme and phoneme-based pronunciations, dropout, and the identification of the layers in the neural network that can specifically be associated with accent modeling.
Our results indicate that methods based on human perception are promising in reducing WER and understanding how accented speech is modeled in neural networks for novel accents.
arXiv Detail & Related papers (2021-04-09T22:35:09Z) - Speaker De-identification System using Autoencoders and Adversarial
Training [58.720142291102135]
We propose a speaker de-identification system based on adversarial training and autoencoders.
Experimental results show that combining adversarial learning and autoencoders increases the equal error rate of a speaker verification system.
arXiv Detail & Related papers (2020-11-09T19:22:05Z) - AccentDB: A Database of Non-Native English Accents to Assist Neural
Speech Recognition [3.028098724882708]
We first spell out the key requirements for creating a well-curated database of speech samples in non-native accents for training and testing robust ASR systems.
We then introduce AccentDB, one such database that contains samples of 4 Indian-English accents collected by us.
We present several accent classification models and evaluate them thoroughly against human-labelled accent classes.
arXiv Detail & Related papers (2020-05-16T12:38:30Z) - Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z) - Improving speaker discrimination of target speech extraction with
time-domain SpeakerBeam [100.95498268200777]
SpeakerBeam exploits an adaptation utterance of the target speaker to extract his/her voice characteristics.
SpeakerBeam sometimes fails when speakers have similar voice characteristics, such as in same-gender mixtures.
We show experimentally that these strategies greatly improve speech extraction performance, especially for same-gender mixtures.
arXiv Detail & Related papers (2020-01-23T05:36:06Z)