Impact of Dataset on Acoustic Models for Automatic Speech Recognition
- URL: http://arxiv.org/abs/2203.13590v1
- Date: Fri, 25 Mar 2022 11:41:49 GMT
- Title: Impact of Dataset on Acoustic Models for Automatic Speech Recognition
- Authors: Siddhesh Singh
- Abstract summary: In Automatic Speech Recognition, GMM-HMM has been widely used for acoustic modelling.
The GMM models are widely used to create the alignments of the training data for the hybrid deep neural network model.
This work aims to investigate the impact of dataset size variations on the performance of various GMM-HMM Acoustic Models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Automatic Speech Recognition, GMM-HMM has been widely used for acoustic
modelling. With the current advancement of deep learning, the Gaussian Mixture
Model (GMM) component of acoustic models has been replaced with a Deep Neural
Network, yielding DNN-HMM Acoustic Models. GMM models are still widely used to
create the alignments of the training data for the hybrid deep neural network
model, making it important to produce accurate alignments. Many factors, such as
training dataset size, training data augmentation, and model hyperparameters,
affect model learning. Traditionally in machine learning, models trained on
larger datasets tend to perform better, while smaller datasets tend to trigger
over-fitting. Collecting speech data and accurate transcriptions is a
significant challenge that varies across languages, and in most cases access may
be limited to large organizations. Moreover, even when large datasets are
available, training a model on them requires additional time and computing
resources, which may not be available. While the accuracy of state-of-the-art
ASR models on open-source datasets is published, studies of the impact of
dataset size on acoustic models are not readily available. This work
investigates the impact of dataset size variations on the performance of
various GMM-HMM Acoustic Models and their respective computational costs.
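The role of the GMM in producing frame-level alignments can be sketched in a few lines of NumPy. The example below is a minimal, hypothetical illustration: it uses synthetic two-dimensional "acoustic features" and hand-set single-Gaussian models for two phones "A" and "B" (in a real system such as the paper's GMM-HMM pipeline, the means and variances would come from EM training over many utterances, alignments would be constrained by the HMM transcript, and features would be MFCCs).

```python
import numpy as np

def diag_gaussian_loglik(X, mean, var):
    # Log-likelihood of each row of X under a diagonal-covariance Gaussian.
    return -0.5 * np.sum(
        np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1
    )

rng = np.random.default_rng(0)
# Toy "acoustic features": 10 frames from phone A, then 10 from phone B.
frames = np.vstack([
    rng.normal(0.0, 1.0, size=(10, 2)),    # phone A region
    rng.normal(10.0, 1.0, size=(10, 2)),   # phone B region
])

# Hypothetical per-phone models (mean, variance); hand-set for illustration.
models = {
    "A": (np.zeros(2), np.ones(2)),
    "B": (np.full(2, 10.0), np.ones(2)),
}

# Hard alignment: assign each frame to the phone with the highest likelihood.
logliks = np.stack(
    [diag_gaussian_loglik(frames, m, v) for m, v in models.values()]
)
alignment = np.array(list(models))[np.argmax(logliks, axis=0)]
print("".join(alignment))
```

These per-frame phone labels are the kind of alignment a hybrid DNN-HMM uses as training targets, which is why the abstract stresses that the quality of the GMM stage matters.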
Related papers
- Scaling Retrieval-Based Language Models with a Trillion-Token Datastore [85.4310806466002]
We find that increasing the size of the datastore used by a retrieval-based LM monotonically improves language modeling and several downstream tasks without obvious saturation.
By plotting compute-optimal scaling curves with varied datastore, model, and pretraining data sizes, we show that using larger datastores can significantly improve model performance for the same training compute budget.
arXiv Detail & Related papers (2024-07-09T08:27:27Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- A Systematic Approach to Robustness Modelling for Deep Convolutional Neural Networks [0.294944680995069]
Recent work raises questions about the ability for even larger models to generalize to data outside of the controlled train and test sets.
We provide a method that uses induced failures to model the probability of failure as a function of time.
We examine the various trade-offs between cost, robustness, latency, and reliability to find that larger models do not significantly aid in adversarial robustness.
arXiv Detail & Related papers (2024-01-24T19:12:37Z)
- MADS: Modulated Auto-Decoding SIREN for time series imputation [9.673093148930874]
We propose MADS, a novel auto-decoding framework for time series imputation, built upon implicit neural representations.
We evaluate our model on two real-world datasets, and show that it outperforms state-of-the-art methods for time series imputation.
arXiv Detail & Related papers (2023-07-03T09:08:47Z)
- To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
Second, we examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z)
- CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z)
- Bayesian Active Learning for Discrete Latent Variable Models [19.852463786440122]
Active learning seeks to reduce the amount of data required to fit the parameters of a model.
Latent variable models play a vital role in neuroscience, psychology, and a variety of other engineering and scientific disciplines.
arXiv Detail & Related papers (2022-02-27T19:07:12Z)
- BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition [126.5605160882849]
We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency.
We report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks.
arXiv Detail & Related papers (2021-09-27T17:59:19Z)
- NODE-GAM: Neural Generalized Additive Model for Interpretable Deep Learning [16.15084484295732]
Generalized Additive Models (GAMs) have a long history of use in high-risk domains.
We propose a neural GAM (NODE-GAM) and a neural GA$2$M (NODE-GA$2$M).
We show that our proposed models have comparable accuracy to other non-interpretable models, and outperform other GAMs on large datasets.
arXiv Detail & Related papers (2021-06-03T06:20:18Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, which is used for QA datasets like QAMR or SQuAD2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Forecasting Industrial Aging Processes with Machine Learning Methods [0.0]
We evaluate a wider range of data-driven models, comparing some traditional stateless models to more complex recurrent neural networks.
Our results show that recurrent models produce near perfect predictions when trained on larger datasets.
arXiv Detail & Related papers (2020-02-05T13:06:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.