Efficient Training of Self-Supervised Speech Foundation Models on a
Compute Budget
- URL: http://arxiv.org/abs/2409.16295v1
- Date: Mon, 9 Sep 2024 10:36:42 GMT
- Title: Efficient Training of Self-Supervised Speech Foundation Models on a
Compute Budget
- Authors: Andy T. Liu, Yi-Cheng Lin, Haibin Wu, Stefan Winkler, Hung-yi Lee
- Abstract summary: This paper investigates how to efficiently train speech foundation models with self-supervised learning (SSL) under a limited compute budget.
We examine critical factors in SSL that impact the budget, including model architecture, model size, and data size.
- Score: 57.807614181024114
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite their impressive success, training foundation models remains
computationally costly. This paper investigates how to efficiently train speech
foundation models with self-supervised learning (SSL) under a limited compute
budget. We examine critical factors in SSL that impact the budget, including
model architecture, model size, and data size. Our goal is to make analytical
steps toward understanding the training dynamics of speech foundation models.
We benchmark SSL objectives in an entirely comparable setting and find that
other factors contribute more significantly to the success of SSL. Our results
show that slimmer model architectures outperform common small architectures
under the same compute and parameter budget. We demonstrate that the size of
the pre-training data remains crucial, even with data augmentation during SSL
training, as performance suffers when iterating over limited data. Finally, we
identify a trade-off between model size and data size, highlighting an optimal
model size for a given compute budget.
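The model-size versus data-size trade-off can be made concrete with the common rule of thumb that pre-training cost is roughly C ≈ 6·N·D FLOPs (N parameters, D tokens or frames seen). The constant and the numbers below are illustrative assumptions, not values from the paper; this is a minimal sketch of how a fixed budget caps the data a larger model can see:

```python
# Sketch: model-size vs. data-size trade-off under a fixed compute budget.
# Uses the common approximation C ~ 6 * N * D FLOPs; the budget and the
# candidate model sizes are hypothetical, not taken from the paper.

BUDGET_FLOPS = 1e20  # hypothetical SSL pre-training budget

candidate_params = [25e6, 95e6, 300e6, 1e9]  # hypothetical model sizes

for n_params in candidate_params:
    max_data = BUDGET_FLOPS / (6 * n_params)  # tokens/frames the budget allows
    print(f"{n_params / 1e6:6.0f}M params -> ~{max_data:.2e} tokens/frames within budget")

# Larger models see less data before the budget is exhausted, which is why the
# paper reports an optimal model size for a given budget rather than
# "bigger is always better".
```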
Related papers
- Order of Magnitude Speedups for LLM Membership Inference [5.124111136127848]
Large Language Models (LLMs) promise to revolutionize computing broadly, but their complexity and extensive training data also expose privacy vulnerabilities.
One of the simplest privacy risks associated with LLMs is their susceptibility to membership inference attacks (MIAs).
We propose a low-cost MIA that leverages an ensemble of small quantile regression models to determine if a document belongs to the model's training set or not.
arXiv Detail & Related papers (2024-09-22T16:18:14Z) - Data Shapley in One Training Run [88.59484417202454]
Data Shapley provides a principled framework for attributing data's contribution within machine learning contexts.
Existing approaches require re-training models on different data subsets, which is computationally intensive (the classical retraining-based formulation is recalled after this list).
This paper introduces In-Run Data Shapley, which addresses these limitations by offering scalable data attribution for a target model of interest.
arXiv Detail & Related papers (2024-06-16T17:09:24Z) - More Compute Is What You Need [3.184416958830696]
We propose a new scaling law suggesting that, for transformer-based models, performance depends mostly on the total amount of compute spent on training.
We predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.
arXiv Detail & Related papers (2024-04-30T12:05:48Z) - An Analysis of Initial Training Strategies for Exemplar-Free
Class-Incremental Learning [36.619804184427245]
Class-Incremental Learning (CIL) aims to build classification models from data streams.
Due to catastrophic forgetting, CIL is particularly challenging when examples from past classes cannot be stored.
The use of models pre-trained in a self-supervised way on large amounts of data has recently gained momentum.
arXiv Detail & Related papers (2023-08-22T14:06:40Z) - Pushing the Limits of Unsupervised Unit Discovery for SSL Speech
Representation [12.506633315768832]
HuBERT is a successful example that utilizes offline clustering to convert speech features into discrete units for a masked language modeling pretext task (a minimal sketch of this clustering step appears after this list).
We present an unsupervised method to improve SSL targets.
Two models are proposed, MonoBERT and PolyBERT, which leverage context-independent and context-dependent phoneme-based units for pre-training.
arXiv Detail & Related papers (2023-06-15T07:45:12Z) - MiniSUPERB: Lightweight Benchmark for Self-supervised Speech Models [90.99663022952498]
SUPERB was proposed to evaluate the generalizability of self-supervised learning (SSL) speech models across various tasks.
SUPERB incurs high computational costs due to its large datasets and diverse tasks.
We introduce MiniSUPERB, a lightweight benchmark that efficiently evaluates SSL speech models, achieving results comparable to SUPERB at significantly lower computational cost.
arXiv Detail & Related papers (2023-05-30T13:07:33Z) - Towards Sustainable Self-supervised Learning [193.78876000005366]
We propose a Target-Enhanced Conditional (TEC) scheme which introduces two components to the existing mask-reconstruction based SSL.
First, we propose patch-relation enhanced targets, which enhance the targets given by the base model and encourage the new model to learn semantic-relation knowledge from the base model.
Second, we introduce a conditional adapter that adaptively adjusts the new model's predictions to align with the targets of different base models.
arXiv Detail & Related papers (2022-10-20T04:49:56Z) - Measuring Causal Effects of Data Statistics on Language Model's
`Factual' Predictions [59.284907093349425]
Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models.
We provide a language for describing how training data influences predictions, through a causal framework.
Our framework bypasses the need to retrain expensive models and allows us to estimate causal effects based on observational data alone.
arXiv Detail & Related papers (2022-07-28T17:36:24Z) - Feeding What You Need by Understanding What You Learned [54.400455868448695]
Machine Reading Comprehension (MRC) requires the ability to understand a given text passage and answer questions based on it.
Existing research in MRC relies heavily on large models and corpora to improve performance as measured by metrics such as Exact Match.
We argue that a deep understanding of model capabilities and data properties can help us feed a model with appropriate training data.
arXiv Detail & Related papers (2022-03-05T14:15:59Z) - Scaling Laws for Neural Language Models [14.472857826717613]
We study scaling laws for language model performance on the cross-entropy loss.
The loss scales as a power law with model size, dataset size, and the amount of compute used for training (the general form is reproduced after this list).
arXiv Detail & Related papers (2020-01-23T03:59:20Z)
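For context on the Data Shapley entry above: the classical, retraining-based Data Shapley value of a training point i averages its marginal contribution over all subsets S of the remaining training data D, where v(S) denotes the performance of a model trained on S (this is the standard definition, not the In-Run variant):

```latex
\phi_i \;=\; \sum_{S \subseteq D \setminus \{i\}}
  \frac{|S|!\,\bigl(|D|-|S|-1\bigr)!}{|D|!}
  \Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr)
```

Each term requires training a model on a different subset, which is the computational bottleneck that In-Run Data Shapley avoids by estimating contributions within a single training run.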
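As referenced in the unit-discovery entry above, HuBERT's offline clustering step can be sketched as k-means over frame-level acoustic features; this mirrors HuBERT's first pre-training iteration, not the MonoBERT/PolyBERT methods proposed in that paper, and the feature choice (MFCC) and number of clusters below are illustrative assumptions:

```python
# Sketch of HuBERT-style offline clustering: acoustic frames -> k-means ->
# discrete units used as targets for the masked-prediction pretext task.
# MFCC features and 100 clusters are illustrative choices, not the paper's.
import numpy as np
import librosa
from sklearn.cluster import KMeans

def discrete_units(wav_path: str, n_units: int = 100) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # shape (13, num_frames)
    frames = mfcc.T                                      # one row per frame
    return KMeans(n_clusters=n_units, n_init=10).fit_predict(frames)  # cluster id per frame
```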
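The Scaling Laws entry above reports that, when the other factors are not bottlenecks, cross-entropy loss follows a power law in model size N, dataset size D, and training compute C; schematically (the constants N_c, D_c, C_c and the exponents are empirical fits reported in that paper and omitted here):

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) = \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) = \left(\frac{C_c}{C}\right)^{\alpha_C}
```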