Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with
Academic Compute
- URL: http://arxiv.org/abs/2306.06672v1
- Date: Sun, 11 Jun 2023 12:53:46 GMT
- Title: Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with
Academic Compute
- Authors: William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti,
Shinji Watanabe
- Abstract summary: Self-supervised learning (SSL) has led to great strides in speech processing.
However, the resources needed to train these models have become prohibitively large.
In this work, we optimize HuBERT SSL to fit within academic constraints.
- Score: 40.6786244647298
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) has led to great strides in speech processing.
However, the resources needed to train these models have become prohibitively
large as they continue to scale. Currently, only a few groups with substantial
resources are capable of creating SSL models, which harms reproducibility. In
this work, we optimize HuBERT SSL to fit within academic constraints. We reproduce
HuBERT independently from the original implementation, with no performance
loss. Our code and training optimizations make SSL feasible with only 8 GPUs,
instead of the 32 used in the original work. We also explore a semi-supervised
route, using an ASR model to skip the first pre-training iteration. Within one
iteration of pre-training, our models improve over HuBERT on several tasks.
Furthermore, our HuBERT Large variant requires only 8 GPUs, achieving similar
performance to the original trained on 128. As our contribution to the
community, all models, configurations, and code are made open-source in ESPnet.
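The two training ideas the abstract highlights, masked prediction of discrete units and fitting the effective batch size onto fewer GPUs, can be sketched in a few lines of PyTorch. The sketch below is not the released ESPnet recipe: the toy encoder size, the 500-cluster target vocabulary, the masking rate, the synthetic dataloader, and the accumulation factor of 4 are all illustrative assumptions. It only shows the shape of the objective (cross-entropy over masked frames against k-means cluster targets, or ASR-derived targets in the semi-supervised route) and how gradient accumulation lets 8 GPUs approximate the per-update batch of a 32-GPU run.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyHuBERT(nn.Module):
    """Toy HuBERT-style encoder: the CNN waveform frontend is omitted and the
    model is far smaller than HuBERT Base; all sizes here are illustrative."""

    def __init__(self, feat_dim=80, hidden=256, n_clusters=500):
        super().__init__()
        self.proj_in = nn.Linear(feat_dim, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.mask_emb = nn.Parameter(torch.zeros(hidden))  # learned mask-frame embedding
        self.head = nn.Linear(hidden, n_clusters)          # predicts discrete unit IDs

    def forward(self, feats, mask):
        x = self.proj_in(feats)
        # Replace masked frames with the learned mask embedding before encoding.
        x = torch.where(mask.unsqueeze(-1), self.mask_emb.expand_as(x), x)
        return self.head(self.encoder(x))


def masked_prediction_loss(logits, targets, mask):
    # HuBERT-style objective: cross-entropy over masked frames only, against
    # discrete targets (k-means clusters, or units derived from an ASR model
    # in the semi-supervised route that skips the first pre-training iteration).
    return F.cross_entropy(logits[mask], targets[mask])


def dummy_loader(n_batches=8, B=4, T=200, feat_dim=80, n_clusters=500, p_mask=0.08):
    # Synthetic stand-in for a real dataloader of frame features and frame-level
    # targets, only so the sketch runs end to end.
    for _ in range(n_batches):
        feats = torch.randn(B, T, feat_dim)
        targets = torch.randint(0, n_clusters, (B, T))
        mask = torch.rand(B, T) < p_mask
        yield feats, targets, mask


model = TinyHuBERT()
opt = torch.optim.AdamW(model.parameters(), lr=5e-4)
ACCUM_STEPS = 4  # e.g. emulate a 4x larger per-update batch on 4x fewer GPUs

opt.zero_grad()
for step, (feats, targets, mask) in enumerate(dummy_loader()):
    loss = masked_prediction_loss(model(feats, mask), targets, mask) / ACCUM_STEPS
    loss.backward()                      # gradients accumulate across micro-batches
    if (step + 1) % ACCUM_STEPS == 0:
        opt.step()                       # one optimizer update per ACCUM_STEPS batches
        opt.zero_grad()
```

In this framing, the cost of the smaller cluster is wall-clock time rather than update quality: each optimizer step sees roughly the same amount of data as a larger-GPU run, but requires several sequential micro-batches per GPU instead of one.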
Related papers
- Joint Prediction and Denoising for Large-scale Multilingual
Self-supervised Learning [69.77973092264338]
We show that more powerful techniques can lead to more efficient pre-training, opening SSL to more research groups.
We propose WavLabLM, which extends WavLM's joint prediction and denoising to 40k hours of data across 136 languages.
We show that further efficiency can be achieved with a vanilla HuBERT Base model, which can maintain 94% of XLS-R's performance with only 3% of the data.
arXiv Detail & Related papers (2023-09-26T23:55:57Z)
- Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation [12.506633315768832]
HuBERT is a successful example that utilizes offline clustering to convert speech features into discrete units for a masked language modeling pretext task.
We present an unsupervised method to improve SSL targets.
Two models are proposed, MonoBERT and PolyBERT, which leverage context-independent and context-dependent phoneme-based units for pre-training.
arXiv Detail & Related papers (2023-06-15T07:45:12Z)
- DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models [34.464301065191336]
Self-supervised learning (SSL) has achieved notable success in many speech processing tasks, but the large model size and heavy computational cost hinder the deployment.
We propose DPHuBERT, a novel task-agnostic compression method for speech SSL based on joint distillation and pruning.
arXiv Detail & Related papers (2023-05-28T07:09:33Z)
- MelHuBERT: A simplified HuBERT on Mel spectrograms [55.608981341747246]
We revisit the training of HuBERT, a highly successful self-supervised model.
We improve and simplify several key components, including the loss function, input representation, and training in multiple stages.
Our model, MelHuBERT, is able to achieve favorable performance on phone recognition, speaker identification, and automatic speech recognition.
arXiv Detail & Related papers (2022-11-17T23:38:29Z)
- Match to Win: Analysing Sequences Lengths for Efficient Self-supervised Learning in Speech and Audio [19.865050806327147]
Self-supervised learning has proven vital in speech and audio-related applications.
This paper provides the first empirical study of SSL pre-training for different specified sequence lengths.
We find that training on short sequences can dramatically reduce resource costs while retaining satisfactory performance on all tasks.
arXiv Detail & Related papers (2022-09-30T16:35:42Z)
- DSPNet: Towards Slimmable Pretrained Networks based on Discriminative Self-supervised Learning [43.45674911425684]
We propose Discriminative-SSL-based Slimmable Pretrained Networks (DSPNet).
DSPNet can be trained at once and then slimmed to multiple sub-networks of various sizes.
We show that DSPNet achieves comparable or better performance on ImageNet than individually pretrained networks.
arXiv Detail & Related papers (2022-07-13T09:32:54Z)
- bert2BERT: Towards Reusable Pretrained Language Models [51.078081486422896]
We propose bert2BERT, which can effectively transfer the knowledge of an existing smaller pre-trained model to a large model.
bert2BERT saves about 45% and 47% of the computational cost of pre-training BERT_BASE and GPT_BASE, respectively, by reusing models of almost half their size.
arXiv Detail & Related papers (2021-10-14T04:05:25Z)
- Meta-Learned Attribute Self-Gating for Continual Generalized Zero-Shot Learning [82.07273754143547]
We propose a meta-continual zero-shot learning (MCZSL) approach to generalizing a model to categories unseen during training.
By pairing self-gating of attributes and scaled class normalization with meta-learning based training, we are able to outperform state-of-the-art results.
arXiv Detail & Related papers (2021-02-23T18:36:14Z)
- EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets [106.79387235014379]
EarlyBERT is a general computationally-efficient training algorithm applicable to both pre-training and fine-tuning of large-scale language models.
We are the first to identify structured winning tickets in the early stage of BERT training, and use them for efficient training.
EarlyBERT easily achieves comparable performance to standard BERT with 35-45% less training time.
arXiv Detail & Related papers (2020-12-31T20:38:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.