Automatic Learning of Subword Dependent Model Scales
- URL: http://arxiv.org/abs/2110.09324v1
- Date: Mon, 18 Oct 2021 13:48:28 GMT
- Title: Automatic Learning of Subword Dependent Model Scales
- Authors: Felix Meyer and Wilfried Michel and Mohammad Zeineldeen and Ralf Schlüter and Hermann Ney
- Abstract summary: We show that the model scales for a combination of an attention-based encoder-decoder acoustic model and a language model can be learned as effectively as with manual tuning.
We extend this approach to subword-dependent model scales, which could not be tuned manually, leading to a 7% improvement on LBS and 3% on SWB.
- Score: 50.105894487730545
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To improve the performance of state-of-the-art automatic speech recognition
systems it is common practice to include external knowledge sources such as
language models or prior corrections. This is usually done via log-linear model
combination using separate scaling parameters for each model. Typically these
parameters are manually optimized on some held-out data.
In this work we propose to optimize these scaling parameters via automatic
differentiation and stochastic gradient descent, similarly to the neural network
model parameters. We show on the LibriSpeech (LBS) and Switchboard (SWB)
corpora that the model scales for a combination of attention-based
encoder-decoder acoustic model and language model can be learned as effectively
as with manual tuning. We further extend this approach to subword-dependent
model scales, which could not be tuned manually, leading to a 7% improvement on
LBS and 3% on SWB. We also show that joint training of scales and model
parameters is possible and gives an additional 6% improvement on LBS.
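As a worked illustration of the log-linear combination described in the abstract, the recognized word sequence is obtained from scaled acoustic model (AM) and language model (LM) log-scores; the notation below is illustrative and not taken from the paper:

```latex
% Log-linear combination of an acoustic model (AM) and a language model (LM),
% each weighted by its own scaling parameter \lambda (illustrative notation).
\hat{w}_1^N = \operatorname*{argmax}_{w_1^N} \Big[
    \lambda_{\mathrm{AM}} \, \log p_{\mathrm{AM}}(w_1^N \mid x_1^T)
  + \lambda_{\mathrm{LM}} \, \log p_{\mathrm{LM}}(w_1^N) \Big]
```

In the subword-dependent variant discussed in the abstract, each scale becomes a vector with one entry per subword unit rather than a single scalar.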
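A minimal sketch of how such scales can be learned by automatic differentiation and SGD, in the spirit of the abstract. This is hypothetical PyTorch-style code with assumed shapes and names, not the authors' implementation:

```python
import torch

# Hypothetical per-model log-scores for a batch of hypotheses, e.g. produced
# by a frozen acoustic model and language model (shapes are illustrative).
log_p_am = torch.randn(8, 50)   # (batch, vocab)
log_p_lm = torch.randn(8, 50)   # (batch, vocab)
targets = torch.randint(0, 50, (8,))

# The scales are ordinary trainable parameters. Replacing the scalars with
# vectors of size `vocab` would correspond to the subword-dependent variant.
scale_am = torch.nn.Parameter(torch.ones(()))
scale_lm = torch.nn.Parameter(torch.ones(()))

optimizer = torch.optim.SGD([scale_am, scale_lm], lr=0.1)

for _ in range(100):
    # Log-linear combination; cross_entropy renormalizes the combined scores.
    combined = scale_am * log_p_am + scale_lm * log_p_lm
    loss = torch.nn.functional.cross_entropy(combined, targets)
    optimizer.zero_grad()
    loss.backward()          # automatic differentiation w.r.t. the scales
    optimizer.step()
```

Joint training of scales and model parameters, as mentioned in the abstract, would amount to handing the network parameters to the same optimizer alongside the scales.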
Related papers
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets [106.7760874400261]
This paper presents ML-SUPERB 2.0, which is a new benchmark for evaluating pre-trained SSL and supervised speech models.
We find performance improvements over the setup of ML-SUPERB, but performance depends on the downstream model design.
Also, we find large performance differences between languages and datasets, suggesting the need for more targeted approaches.
arXiv Detail & Related papers (2024-06-12T21:01:26Z)
- LLM-based speaker diarization correction: A generalizable approach [0.0]
We investigate the use of large language models (LLMs) for diarization correction as a post-processing step.
We measure the ability of the models to improve diarization accuracy on a held-out dataset from the Fisher corpus as well as on an independent dataset.
arXiv Detail & Related papers (2024-06-07T13:33:22Z)
- Observational Scaling Laws and the Predictability of Language Model Performance [51.2336010244645]
We propose an observational approach that bypasses model training and instead builds scaling laws from 100 publicly available models.
We show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models.
We show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
arXiv Detail & Related papers (2024-05-17T17:49:44Z)
- Automating Model Comparison in Factor Graphs [3.119859292303397]
This paper efficiently automates Bayesian model averaging, selection, and combination by message passing on a Forney-style factor graph with a custom mixture node.
This approach shortens the model design cycle and allows for a straightforward extension to hierarchical and temporal model priors to accommodate the modeling of complicated time-varying processes.
arXiv Detail & Related papers (2023-06-09T15:33:30Z)
- Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture [68.13678918660872]
We design a more capable parameter-sharing architecture based on the matrix product operator (MPO).
MPO decomposition can reorganize and factorize the information of a parameter matrix into two parts.
Our architecture shares the central tensor across all layers for reducing the model size.
arXiv Detail & Related papers (2023-03-27T02:34:09Z)
- Continual Learning for On-Device Speech Recognition using Disentangled Conformers [54.32320258055716]
We introduce a continual learning benchmark for speaker-specific domain adaptation derived from LibriVox audiobooks.
We propose a novel compute-efficient continual learning algorithm called DisentangledCL.
Our experiments show that the DisConformer models significantly outperform baselines on general ASR.
arXiv Detail & Related papers (2022-12-02T18:58:51Z)
- Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition [0.3007949058551534]
Self-supervised learning (SSL) based models have been shown to generate powerful representations that can be used to improve the performance of downstream speech tasks.
This paper proposes using an ensemble of such SSL representations and models, which exploits the complementary nature of the features extracted by the various pretrained models.
arXiv Detail & Related papers (2022-06-11T12:43:00Z)
- Dynamically-Scaled Deep Canonical Correlation Analysis [77.34726150561087]
Canonical Correlation Analysis (CCA) is a method for feature extraction of two views by finding maximally correlated linear projections of them.
We introduce a novel dynamic scaling method for training an input-dependent canonical correlation model.
arXiv Detail & Related papers (2022-03-23T12:52:49Z)
- The Power of Scale for Parameter-Efficient Prompt Tuning [4.481348281462904]
"prompt tuning" is a simple mechanism for learning "soft prompts" to condition frozen language models to perform specific downstream tasks.
Our end-to-end learned approach outperforms GPT-3's "few-shot" learning by a large margin.
arXiv Detail & Related papers (2021-04-18T03:19:26Z)