Adaptation Algorithms for Neural Network-Based Speech Recognition: An
Overview
- URL: http://arxiv.org/abs/2008.06580v2
- Date: Sun, 28 Feb 2021 19:41:45 GMT
- Title: Adaptation Algorithms for Neural Network-Based Speech Recognition: An
Overview
- Authors: Peter Bell, Joachim Fainberg, Ondrej Klejch, Jinyu Li, Steve Renals,
Pawel Swietojanski
- Abstract summary: We present a structured overview of adaptation algorithms for neural network-based speech recognition.
The overview characterizes adaptation algorithms as based on embeddings, model parameter adaptation, or data augmentation.
We present a meta-analysis of the performance of speech recognition adaptation algorithms, based on relative error rate reductions as reported in the literature.
- Score: 43.12352697785169
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a structured overview of adaptation algorithms for neural
network-based speech recognition, considering both hybrid hidden Markov model /
neural network systems and end-to-end neural network systems, with a focus on
speaker adaptation, domain adaptation, and accent adaptation. The overview
characterizes adaptation algorithms as based on embeddings, model parameter
adaptation, or data augmentation. We present a meta-analysis of the performance
of speech recognition adaptation algorithms, based on relative error rate
reductions as reported in the literature.
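To make the embedding-based family concrete, here is a minimal sketch in which an utterance-level speaker embedding (such as an i-vector or x-vector) is concatenated to every acoustic frame before the acoustic model. The model, layer sizes, and tensor shapes are illustrative assumptions, not any specific system from the overview.
```python
import torch
import torch.nn as nn

class EmbeddingAdaptedAcousticModel(nn.Module):
    """Acoustic model conditioned on a fixed-length speaker embedding,
    one of the three adaptation families the overview describes.
    All dimensions are illustrative assumptions."""
    def __init__(self, feat_dim=80, spk_dim=100, hidden=512, n_states=2000):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + spk_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_states),
        )

    def forward(self, feats, spk_emb):
        # feats: (batch, frames, feat_dim); spk_emb: (batch, spk_dim)
        # Broadcast the utterance-level embedding to every frame.
        emb = spk_emb.unsqueeze(1).expand(-1, feats.size(1), -1)
        return self.net(torch.cat([feats, emb], dim=-1))

model = EmbeddingAdaptedAcousticModel()
logits = model(torch.randn(4, 200, 80), torch.randn(4, 100))
print(logits.shape)  # torch.Size([4, 200, 2000])
```
The other two families would instead update a subset of the model's parameters on adaptation data, or augment the training data to cover the target speaker, domain, or accent.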
Related papers
- Neural Speech and Audio Coding [19.437080345021105]
The paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems.
It introduces a neural network-based signal enhancer designed to post-process existing codecs' output.
The paper examines the use of psychoacoustically calibrated loss functions to train end-to-end neural audio codecs.
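As a rough illustration of the post-processing idea, the sketch below applies a small convolutional network to a codec's decoded waveform and predicts a residual correction. The architecture and signal dimensions are assumptions for illustration, not the paper's enhancer.
```python
import torch
import torch.nn as nn

class PostFilter(nn.Module):
    """Toy neural enhancer applied after an existing codec's decoder.
    Layer sizes are illustrative assumptions."""
    def __init__(self, channels=32):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=9, padding=4),
        )

    def forward(self, decoded):
        # decoded: (batch, 1, samples); predict a residual correction
        return decoded + self.body(decoded)

codec_output = torch.randn(2, 1, 16000)  # assumed 1 s of 16 kHz audio
enhanced = PostFilter()(codec_output)
```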
arXiv Detail & Related papers (2024-08-13T15:13:21Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
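One plausible reading of "computational graphs of parameters" is sketched below: an MLP's neurons become graph nodes and its weights become edge features, which a graph neural network could then process. The encoding details here are assumptions, not the paper's construction.
```python
import torch
import torch.nn as nn

def mlp_to_graph(mlp):
    # Nodes are neurons (indexed layer by layer); edge features are the
    # connecting weights. Biases and activations are omitted for brevity.
    layers = [m for m in mlp if isinstance(m, nn.Linear)]
    sizes = [layers[0].in_features] + [l.out_features for l in layers]
    offsets = [sum(sizes[:i]) for i in range(len(sizes))]
    edges, edge_feats = [], []
    for li, layer in enumerate(layers):
        w = layer.weight.detach()  # shape: (out_features, in_features)
        for o in range(w.size(0)):
            for i in range(w.size(1)):
                edges.append((offsets[li] + i, offsets[li + 1] + o))
                edge_feats.append(w[o, i].item())
    return edges, torch.tensor(edge_feats)

mlp = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
edges, feats = mlp_to_graph(mlp)
print(len(edges), feats.shape)  # 18 torch.Size([18])
```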
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
- Reparameterization through Spatial Gradient Scaling [69.27487006953852]
Reparameterization aims to improve the generalization of deep neural networks by transforming convolutional layers into equivalent multi-branched structures during training.
We present a novel spatial gradient scaling method to redistribute learning focus among weights in convolutional networks.
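The mechanism can be illustrated with a gradient hook that rescales a convolution kernel's gradients per spatial position. The scaling mask below is an arbitrary assumption; the paper derives its scalings in a principled way.
```python
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Assumed mask: emphasize the kernel centre over the borders. This only
# illustrates the mechanism of per-position gradient rescaling.
scale = torch.tensor([[0.5, 0.5, 0.5],
                      [0.5, 2.0, 0.5],
                      [0.5, 0.5, 0.5]])

# Broadcasts over the (out_channels, in_channels, 3, 3) weight gradient.
conv.weight.register_hook(lambda g: g * scale)

out = conv(torch.randn(1, 3, 16, 16)).sum()
out.backward()  # gradients now carry the spatial rescaling
```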
arXiv Detail & Related papers (2023-03-05T17:57:33Z)
- Training neural networks with structured noise improves classification and generalization [0.0]
We show how adding structure to noisy training data can substantially improve the algorithm's performance.
We also prove that the so-called Hebbian Unlearning rule coincides with the training-with-noise algorithm when noise is maximal.
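For context, the classical setting is a Hopfield-style network. The sketch below shows Hebbian learning followed by one unlearning step (an anti-Hebbian update at a relaxed state), which per the summary coincides with training-with-noise in the maximal-noise limit; the network size and learning rate are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 5
patterns = rng.choice([-1, 1], size=(P, N))

# Hebbian learning of the stored patterns.
W = (patterns.T @ patterns) / N
np.fill_diagonal(W, 0.0)

# One unlearning step: relax from a random state toward an attractor,
# then apply an anti-Hebbian update there.
s = rng.choice([-1, 1], size=N).astype(float)
for _ in range(50):  # synchronous relaxation
    s = np.sign(W @ s + 1e-12)
eps = 0.01 / N
W -= eps * np.outer(s, s)
np.fill_diagonal(W, 0.0)
```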
arXiv Detail & Related papers (2023-02-26T22:10:23Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
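The contrast the paper draws can be mimicked with a toy data model: locally correlated inputs that are either Gaussian or made non-Gaussian by a pointwise nonlinearity. The covariance and nonlinearity below are assumptions for illustration, not the paper's exact data models.
```python
import numpy as np

rng = np.random.default_rng(1)
D = 64
# Covariance with local correlations (exponential decay, length scale 4).
idx = np.arange(D)
C = np.exp(-np.abs(idx[:, None] - idx[None, :]) / 4.0)
L = np.linalg.cholesky(C + 1e-6 * np.eye(D))

def sample(n, gaussian=True):
    z = L @ rng.standard_normal((D, n))  # locally correlated Gaussian inputs
    # A pointwise nonlinearity makes the inputs non-Gaussian while keeping
    # their local correlation structure.
    return z if gaussian else np.sign(z)

x_gauss = sample(1000)            # control: Gaussian inputs
x_nongauss = sample(1000, False)  # non-Gaussian, higher-order local structure
```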
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Full-Reference Speech Quality Estimation with Attentional Siamese Neural Networks [0.0]
We present a full-reference speech quality prediction model with a deep learning approach.
The model determines a feature representation of the reference and the degraded signal through a siamese recurrent convolutional network.
The resulting features are then used to align the signals with an attention mechanism and are finally combined to estimate the overall speech quality.
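A minimal sketch of the described pipeline: a shared (siamese) encoder for both signals, attention-based alignment of reference frames to degraded frames, and a pooled quality regressor. The paper uses a recurrent convolutional encoder; this sketch substitutes a plain LSTM, and all sizes are assumptions.
```python
import torch
import torch.nn as nn

class SiameseQualityModel(nn.Module):
    """Sketch: shared encoder, attention alignment, quality head.
    All layer sizes are illustrative assumptions."""
    def __init__(self, n_mels=40, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, batch_first=True)
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, ref, deg):
        # ref, deg: (batch, frames, n_mels); one encoder for both (siamese)
        r, _ = self.encoder(ref)
        d, _ = self.encoder(deg)
        # Align reference frames to degraded frames with attention.
        aligned, _ = self.attn(query=d, key=r, value=r)
        pooled = torch.cat([d, aligned], dim=-1).mean(dim=1)
        return self.head(pooled)  # predicted overall quality score

score = SiameseQualityModel()(torch.randn(2, 120, 40), torch.randn(2, 130, 40))
```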
arXiv Detail & Related papers (2021-05-03T12:38:25Z)
- Sparse Mixture of Local Experts for Efficient Speech Enhancement [19.645016575334786]
We investigate a deep learning approach for speech denoising through an efficient ensemble of specialist neural networks.
By splitting up the speech denoising task into non-overlapping subproblems, we are able to improve denoising performance while also reducing computational complexity.
Our findings demonstrate that a fine-tuned ensemble network is able to exceed the speech denoising capabilities of a generalist network.
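The ensemble idea can be sketched as hard routing: a gating network picks one specialist per input, so only that expert runs. The expert and gate architectures below are assumptions, and the paper's split into subproblems may differ.
```python
import torch
import torch.nn as nn

class SparseLocalExperts(nn.Module):
    """Sketch of a sparse ensemble of specialist denoisers.
    Expert and gate architectures are illustrative assumptions."""
    def __init__(self, feat_dim=257, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(feat_dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                          nn.Linear(256, feat_dim), nn.Sigmoid())
            for _ in range(n_experts))

    def forward(self, noisy):
        # noisy: (batch, feat_dim) spectral frames
        choice = self.gate(noisy).argmax(dim=-1)  # hard, sparse routing
        mask = torch.stack([self.experts[int(c)](x)
                            for x, c in zip(noisy, choice)])
        return noisy * mask                       # masked enhancement

out = SparseLocalExperts()(torch.rand(8, 257))
```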
arXiv Detail & Related papers (2020-05-16T23:23:22Z)
- AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech.
Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times.
Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while enjoying lower model complexity.
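The second stage, deriving a CNN by stacking the discovered cell, might look like the sketch below. The fixed cell stands in for whatever operation combination the search finds, and the channel count and speaker count (VoxCeleb1's 1,251 speakers is assumed here) are illustrative.
```python
import torch
import torch.nn as nn

class Cell(nn.Module):
    """Stand-in for a searched cell; the real cell's operation
    combination comes from the search, this one is fixed for illustration."""
    def __init__(self, channels):
        super().__init__()
        self.op = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU())

    def forward(self, x):
        return x + self.op(x)  # residual combination of the cell's ops

def derive_cnn(channels=32, n_cells=8, n_speakers=1251):
    """Stack the discovered cell repeatedly to obtain the final CNN."""
    return nn.Sequential(
        nn.Conv2d(1, channels, 3, padding=1),
        *[Cell(channels) for _ in range(n_cells)],
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(channels, n_speakers))

logits = derive_cnn()(torch.randn(2, 1, 64, 64))  # spectrogram-like input
```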
arXiv Detail & Related papers (2020-05-07T02:53:47Z)
- Parallelization Techniques for Verifying Neural Networks [52.917845265248744]
We introduce an algorithm that iteratively splits the verification problem and explore two partitioning strategies.
We also introduce a highly parallelizable pre-processing algorithm that uses the neuron activation phases to simplify the neural network verification problems.
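An input-domain partitioning scheme in this spirit can be sketched as recursive bisection with a cheap per-subproblem bound. The interval bound propagation below stands in for the actual verifier, and the checked property is an arbitrary assumption; the sub-boxes could be verified in parallel.
```python
import numpy as np

def interval_bounds(W1, b1, W2, b2, lo, hi):
    """Crude interval bound propagation through a 2-layer ReLU net,
    standing in for the per-subproblem verifier."""
    c, r = (lo + hi) / 2, (hi - lo) / 2
    c1, r1 = W1 @ c + b1, np.abs(W1) @ r
    lo1, hi1 = np.maximum(c1 - r1, 0), np.maximum(c1 + r1, 0)  # ReLU bounds
    c2, r2 = (lo1 + hi1) / 2, (hi1 - lo1) / 2
    return W2 @ c2 + b2 - np.abs(W2) @ r2, W2 @ c2 + b2 + np.abs(W2) @ r2

def verify_by_splitting(W1, b1, W2, b2, lo, hi, depth=4):
    """Bisect the widest input dimension and verify sub-boxes
    independently (each branch is trivially parallelizable)."""
    out_lo, _ = interval_bounds(W1, b1, W2, b2, lo, hi)
    if out_lo[0] > 0:   # assumed property: first output stays positive
        return True
    if depth == 0:
        return False    # inconclusive at this granularity
    d = int(np.argmax(hi - lo))
    mid = (lo[d] + hi[d]) / 2
    lo2, hi1 = lo.copy(), hi.copy()
    hi1[d], lo2[d] = mid, mid
    return (verify_by_splitting(W1, b1, W2, b2, lo, hi1, depth - 1) and
            verify_by_splitting(W1, b1, W2, b2, lo2, hi, depth - 1))

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 4)), np.zeros(8)
W2, b2 = rng.standard_normal((2, 8)), np.array([5.0, 0.0])
print(verify_by_splitting(W1, b1, W2, b2, -0.1 * np.ones(4), 0.1 * np.ones(4)))
```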
arXiv Detail & Related papers (2020-04-17T20:21:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.