Match to Win: Analysing Sequences Lengths for Efficient Self-supervised
Learning in Speech and Audio
- URL: http://arxiv.org/abs/2209.15575v2
- Date: Mon, 3 Oct 2022 20:15:26 GMT
- Title: Match to Win: Analysing Sequences Lengths for Efficient Self-supervised
Learning in Speech and Audio
- Authors: Yan Gao, Javier Fernandez-Marques, Titouan Parcollet, Pedro P. B. de
Gusmao, Nicholas D. Lane
- Abstract summary: Self-supervised learning has proven vital in speech and audio-related applications.
This paper provides the first empirical study of SSL pre-training for different specified sequence lengths.
We find that training on short sequences can dramatically reduce resource costs while retaining a satisfactory performance for all tasks.
- Score: 19.865050806327147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) has proven vital in speech and audio-related
applications. The paradigm trains a general model on unlabeled data that can
later be used to solve specific downstream tasks. This type of model is costly
to train as it requires manipulating long input sequences that can only be
handled by powerful centralised servers. Surprisingly, despite many attempts to
increase training efficiency through model compression, the effects of
truncating input sequence lengths to reduce computation have not been studied.
In this paper, we provide the first empirical study of SSL pre-training for
different specified sequence lengths and link this to various downstream tasks.
We find that training on short sequences can dramatically reduce resource costs
while retaining a satisfactory performance for all tasks. This simple one-line
change would promote the migration of SSL training from data centres to
user-end edge devices for more realistic and personalised applications.
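The "simple one-line change" the abstract refers to amounts to capping the length of the audio sequences fed to the encoder during pre-training. Below is a minimal sketch of that idea, assuming a Python/NumPy pipeline with 16 kHz audio and a hypothetical 4-second budget; the crop length and the function name `crop_waveform` are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): cap input sequence length before
# self-supervised pre-training. The 16 kHz rate, the 4-second budget and the
# helper name are illustrative assumptions.
import numpy as np

SAMPLE_RATE = 16_000                      # assumed 16 kHz speech audio
MAX_SECONDS = 4.0                         # hypothetical short-sequence budget
MAX_SAMPLES = int(SAMPLE_RATE * MAX_SECONDS)


def crop_waveform(wav, max_samples=MAX_SAMPLES, rng=None):
    """Randomly crop a 1-D waveform to at most `max_samples` samples."""
    rng = rng or np.random.default_rng()
    if len(wav) <= max_samples:
        return wav                        # short utterances pass through unchanged
    start = int(rng.integers(0, len(wav) - max_samples + 1))
    return wav[start:start + max_samples]


# Example: a 12-second utterance becomes a 4-second pre-training sequence.
utterance = np.random.randn(12 * SAMPLE_RATE).astype(np.float32)
print(crop_waveform(utterance).shape)     # (64000,)
```

Because Transformer self-attention scales quadratically with sequence length, cropping a 12-second utterance to 4 seconds cuts the attention cost of each encoder layer by roughly a factor of nine, which is where the resource savings in short-sequence pre-training come from.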
Related papers
- DailyMAE: Towards Pretraining Masked Autoencoders in One Day [37.206816999538496]
Masked image modeling (MIM) has drawn attention for its effectiveness in learning data representation from unlabeled data.
In this study, we propose efficient training recipes for MIM-based SSL that focus on mitigating data loading bottlenecks.
Our library enables the training of a MAE-Base/16 model on the ImageNet 1K dataset for 800 epochs within just 18 hours.
arXiv Detail & Related papers (2024-03-31T00:59:10Z)
- How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z)
- Pre-training with Synthetic Data Helps Offline Reinforcement Learning [4.531082205797088]
We show that language is not essential for improved performance.
We then consider pre-training Conservative Q-Learning (CQL), a popular offline DRL algorithm.
Surprisingly, pre-training with simple synthetic data for a small number of updates can also improve CQL.
arXiv Detail & Related papers (2023-10-01T19:32:14Z)
- Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening [51.34904967046097]
We present Selective Synaptic Dampening (SSD), a novel two-step, post hoc, retrain-free approach to machine unlearning that is fast, performant, and does not require long-term storage of the training data.
arXiv Detail & Related papers (2023-08-15T11:30:45Z)
- Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training [20.98770732015944]
Few-shot intent detection involves training a deep learning model to classify utterances based on their underlying intents using only a small amount of labeled data.
We show that continual pre-training may not be essential, since the overfitting problem of PLMs on this task may not be as serious as expected.
To maximize the utilization of the limited available data, we propose a context augmentation method and leverage sequential self-distillation to boost performance.
arXiv Detail & Related papers (2023-06-08T15:26:52Z)
- SLICER: Learning universal audio representations using low-resource self-supervised pre-training [53.06337011259031]
We present a new Self-Supervised Learning approach to pre-train encoders on unlabeled audio data.
Our primary aim is to learn audio representations that can generalize across a large variety of speech and non-speech tasks.
arXiv Detail & Related papers (2022-11-02T23:45:33Z)
- Exploring Efficient-tuning Methods in Self-supervised Speech Models [53.633222197712875]
Self-supervised learning can learn powerful representations for different speech tasks.
In downstream tasks, the parameters of SSL models are frozen, and only the adapters are trained.
We show that performance parity can be achieved with over 90% parameter reduction.
arXiv Detail & Related papers (2022-10-10T11:08:12Z)
- Task-Customized Self-Supervised Pre-training with Scalable Dynamic Routing [76.78772372631623]
A common practice for self-supervised pre-training is to use as much data as possible.
For a specific downstream task, however, including irrelevant data in pre-training may degrade downstream performance.
It is burdensome and infeasible to use different downstream-task-customized datasets in pre-training for different tasks.
arXiv Detail & Related papers (2022-05-26T10:49:43Z)
- DATA: Domain-Aware and Task-Aware Pre-training [94.62676913928831]
We present DATA, a simple yet effective NAS approach specialized for self-supervised learning (SSL).
Our method achieves promising results across a wide range of computation costs on downstream tasks, including image classification, object detection and semantic segmentation.
arXiv Detail & Related papers (2022-03-17T02:38:49Z)
- Transfer Learning or Self-supervised Learning? A Tale of Two Pretraining Paradigms [36.04356511882304]
Self-supervised learning (SSL) has demonstrated promising results on a wide range of applications.
There has not been a clear understanding of which properties of data and tasks make one approach outperform the other.
arXiv Detail & Related papers (2020-06-19T05:21:00Z)