Impact of Dataset on Acoustic Models for Automatic Speech Recognition
- URL: http://arxiv.org/abs/2203.13590v1
- Date: Fri, 25 Mar 2022 11:41:49 GMT
- Title: Impact of Dataset on Acoustic Models for Automatic Speech Recognition
- Authors: Siddhesh Singh
- Abstract summary: In Automatic Speech Recognition, GMM-HMM models have long been used for acoustic modelling.
GMM models are widely used to create alignments of the training data for hybrid deep neural network models.
This work investigates the impact of dataset-size variations on the performance of various GMM-HMM acoustic models.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In Automatic Speech Recognition, GMM-HMM models have long been used
for acoustic modelling. With the current advances in deep learning, the Gaussian
Mixture Model (GMM) in the acoustic model has been replaced with a Deep Neural
Network, yielding DNN-HMM acoustic models. GMM models are still widely used to
create the alignments of the training data for the hybrid deep neural network
model, which makes producing accurate alignments an important task. Many
factors, such as training dataset size, training data augmentation, and model
hyperparameters, affect model learning. Traditionally in machine learning,
larger datasets tend to yield better performance, while smaller datasets tend to
trigger over-fitting. Collecting speech data and accurate transcriptions is a
significant challenge that varies across languages, and in most cases such data
may be limited to large organizations. Moreover, even when large datasets are
available, training a model on them requires additional time and computing
resources that may not be available. While accuracy figures for state-of-the-art
ASR models on open-source datasets are published, studies of the impact of
dataset size on acoustic models are not readily available. This work
investigates the impact of dataset-size variations on the performance of various
GMM-HMM acoustic models and their respective computational costs.
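The GMM side of a GMM-HMM acoustic model can be illustrated with a minimal sketch: one Gaussian mixture fitted per phone over MFCC-like feature frames, with classification by maximum log-likelihood. The phone labels, feature dimensionality, and mixture sizes below are hypothetical, and a real system (e.g. Kaldi) would additionally model temporal structure with HMM state transitions; this is an illustration, not the paper's setup.

```python
# Illustrative monophone-style GMM classifier over synthetic
# "MFCC-like" frames; all labels and numbers are made up.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
PHONES = ["aa", "iy", "s"]   # hypothetical phone set
DIM = 13                     # typical MFCC dimensionality

# Synthetic training frames: one well-separated cluster per phone.
train = {p: rng.normal(loc=i * 3.0, scale=1.0, size=(200, DIM))
         for i, p in enumerate(PHONES)}

# Fit one diagonal-covariance GMM per phone (the "acoustic model").
models = {p: GaussianMixture(n_components=2, covariance_type="diag",
                             random_state=0).fit(x)
          for p, x in train.items()}

def classify(frame):
    """Pick the phone whose GMM assigns the highest log-likelihood."""
    scores = {p: m.score_samples(frame[None, :])[0]
              for p, m in models.items()}
    return max(scores, key=scores.get)

# A frame drawn near the "iy" cluster should be labelled "iy".
test_frame = rng.normal(loc=3.0, scale=1.0, size=DIM)
print(classify(test_frame))
```

In a full pipeline, frame-level likelihoods like these feed Viterbi alignment, which is exactly the alignment step the hybrid DNN-HMM models depend on.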
Related papers
- CoDi: Conversational Distillation for Grounded Question Answering [10.265241619616676]
We introduce a novel data distillation framework named CoDi.
CoDi allows us to synthesize large-scale, assistant-style datasets in a steerable and diverse manner.
We show that SLMs trained with CoDi-synthesized data achieve performance comparable to models trained on human-annotated data in standard metrics.
arXiv Detail & Related papers (2024-08-20T22:35:47Z) - Scaling Retrieval-Based Language Models with a Trillion-Token Datastore [85.4310806466002]
We find that increasing the size of the datastore used by a retrieval-based LM monotonically improves language modeling and several downstream tasks without obvious saturation.
By plotting compute-optimal scaling curves with varied datastore, model, and pretraining data sizes, we show that using larger datastores can significantly improve model performance for the same training compute budget.
arXiv Detail & Related papers (2024-07-09T08:27:27Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
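The scaling-law relationship the paper critiques is commonly modelled as a power law, loss ≈ a·N^(−b) in dataset size N, fitted by linear regression in log-log space. A minimal sketch on synthetic points (all numbers invented for illustration):

```python
# Fit loss = a * N**(-b) to synthetic (dataset size, loss) points
# via ordinary least squares on the logs. Data are made up.
import numpy as np

sizes = np.array([1e4, 1e5, 1e6, 1e7, 1e8])   # training-set sizes N
losses = 5.0 * sizes ** -0.1                  # synthetic: a=5, b=0.1

# log(loss) = log(a) - b * log(N)
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
b_hat, a_hat = -slope, np.exp(intercept)
print(round(b_hat, 3), round(a_hat, 3))
```

The paper's point is precisely that a clean fit like this can hide what the chosen loss metric fails to measure for different communities.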
arXiv Detail & Related papers (2023-07-05T15:32:21Z) - MADS: Modulated Auto-Decoding SIREN for time series imputation [9.673093148930874]
We propose MADS, a novel auto-decoding framework for time series imputation, built upon implicit neural representations.
We evaluate our model on two real-world datasets, and show that it outperforms state-of-the-art methods for time series imputation.
arXiv Detail & Related papers (2023-07-03T09:08:47Z) - To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis [50.31589712761807]
Large language models (LLMs) are notoriously token-hungry during pre-training, and high-quality text data on the web is approaching its scaling limit for LLMs.
We investigate the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting.
We then examine the key factors contributing to multi-epoch degradation, finding that significant factors include dataset size, model parameters, and training objectives.
arXiv Detail & Related papers (2023-05-22T17:02:15Z) - CHALLENGER: Training with Attribution Maps [63.736435657236505]
We show that utilizing attribution maps for training neural networks can improve regularization of models and thus increase performance.
In particular, we show that our generic domain-independent approach yields state-of-the-art results in vision, natural language processing and on time series tasks.
arXiv Detail & Related papers (2022-05-30T13:34:46Z) - Bayesian Active Learning for Discrete Latent Variable Models [19.852463786440122]
Active learning seeks to reduce the amount of data required to fit the parameters of a model.
Latent variable models play a vital role in neuroscience, psychology, and a variety of other scientific and engineering disciplines.
arXiv Detail & Related papers (2022-02-27T19:07:12Z) - NODE-GAM: Neural Generalized Additive Model for Interpretable Deep Learning [16.15084484295732]
Generalized Additive Models (GAMs) have a long history of use in high-risk domains.
We propose a neural GAM (NODE-GAM) and a neural GA$2$M (NODE-GA$2$M).
We show that our proposed models have comparable accuracy to other non-interpretable models, and outperform other GAMs on large datasets.
arXiv Detail & Related papers (2021-06-03T06:20:18Z) - Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span selection task format, used in QA datasets such as QAMR and SQuAD2.0, is effective in differentiating between strong and weak models.
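Item Response Theory, as referenced above, models the probability that a test-taker (here, a model) answers an item correctly as a function of ability and item difficulty. A minimal Rasch (1PL) sketch with invented numbers, not the paper's exact model, shows why harder items discriminate better among strong models:

```python
# Minimal Rasch (1PL) IRT sketch; abilities and difficulties
# below are hypothetical, not taken from the paper.
import math

def p_correct(theta, beta):
    """1PL item response function: sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

strong, weak = 2.0, -1.0   # model "abilities"
easy, hard = -2.0, 1.5     # item "difficulties"

# The hard item separates the two models more than the easy one.
gap_hard = p_correct(strong, hard) - p_correct(weak, hard)
gap_easy = p_correct(strong, easy) - p_correct(weak, easy)
print(gap_hard > gap_easy)
```

This is the intuition behind the finding that only some datasets (e.g. HellaSwag) still distinguish among state-of-the-art models: easy items saturate, so every strong model answers them correctly.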
arXiv Detail & Related papers (2021-06-01T22:33:53Z) - Forecasting Industrial Aging Processes with Machine Learning Methods [0.0]
We evaluate a wider range of data-driven models, comparing some traditional stateless models to more complex recurrent neural networks.
Our results show that recurrent models produce near-perfect predictions when trained on larger datasets.
arXiv Detail & Related papers (2020-02-05T13:06:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated list (including all information) and is not responsible for any consequences of its use.