Structured Pruning of Self-Supervised Pre-trained Models for Speech
Recognition and Understanding
- URL: http://arxiv.org/abs/2302.14132v1
- Date: Mon, 27 Feb 2023 20:39:54 GMT
- Title: Structured Pruning of Self-Supervised Pre-trained Models for Speech
Recognition and Understanding
- Authors: Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe
- Abstract summary: Self-supervised speech representation learning (SSL) has been shown to be effective in various downstream tasks, but SSL models are usually large and slow.
We propose three task-specific structured pruning methods to deal with such heterogeneous networks.
Experiments on LibriSpeech and SLURP show that the proposed method is more accurate than the original wav2vec2-base with 10% to 30% less computation, and is able to reduce the computation by 40% to 50% without any degradation.
- Score: 43.68557263195205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised speech representation learning (SSL) has been shown to be
effective in various downstream tasks, but SSL models are usually large and
slow. Model compression techniques such as pruning aim to reduce the model size
and computation without degradation in accuracy. Prior studies focus on the
pruning of Transformers; however, speech models not only utilize a stack of
Transformer blocks, but also combine a frontend network based on multiple
convolutional layers for low-level feature representation learning. This
frontend has a small size but a heavy computational cost. In this work, we
propose three task-specific structured pruning methods to deal with such
heterogeneous networks. Experiments on LibriSpeech and SLURP show that the
proposed method is more accurate than the original wav2vec2-base with 10% to
30% less computation, and is able to reduce the computation by 40% to 50%
without any degradation.
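As a rough illustration of how structured pruning with learnable gates can be applied to such a heterogeneous network (a convolutional frontend plus a stack of Transformer blocks), the sketch below attaches an L0-style HardConcrete gate to the frontend's output channels; the module names, gate placement, and hyperparameters are illustrative assumptions, not the paper's actual implementation.
```python
# Minimal sketch of structured pruning with learnable gates (an L0-style
# HardConcrete relaxation). All names and hyperparameters are illustrative
# assumptions, not the paper's actual implementation.
import math

import torch
import torch.nn as nn


class HardConcreteGate(nn.Module):
    """One gate per prunable structure (e.g. a conv channel or attention head)."""

    def __init__(self, num_gates, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(num_gates))
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:  # stochastic gates during training
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((u.log() - (1 - u).log() + self.log_alpha) / self.beta)
        else:              # deterministic gates at inference
            s = torch.sigmoid(self.log_alpha)
        s = s * (self.zeta - self.gamma) + self.gamma
        return s.clamp(0.0, 1.0)  # gates near 0 mark structures to remove

    def expected_l0(self):
        """Expected number of open gates, used as a sparsity penalty."""
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()


class GatedConvFrontend(nn.Module):
    """Toy convolutional frontend whose output channels are gated for pruning."""

    def __init__(self, channels=512):
        super().__init__()
        self.conv = nn.Conv1d(1, channels, kernel_size=10, stride=5)
        self.gate = HardConcreteGate(channels)

    def forward(self, wav):  # wav: (batch, 1, samples)
        return torch.relu(self.conv(wav)) * self.gate().view(1, -1, 1)


frontend = GatedConvFrontend()
feats = frontend(torch.randn(2, 1, 16000))
penalty = frontend.gate.expected_l0()  # add lam * penalty to the task loss
```
In such a scheme, the sparsity penalty pushes gates toward zero during fine-tuning, and channels (or attention heads) whose gates end up at zero can then be removed outright to realize the computation savings.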
Related papers
- Quantized Transformer Language Model Implementations on Edge Devices [1.2979415757860164]
Large-scale transformer-based models like Bidirectional Encoder Representations from Transformers (BERT) are widely used for Natural Language Processing (NLP) applications.
These models are initially pre-trained with a large corpus with millions of parameters and then fine-tuned for a downstream NLP task.
One of the major limitations of these large-scale models is that they cannot be deployed on resource-constrained devices due to their large model size and increased inference latency.
arXiv Detail & Related papers (2023-10-06T01:59:19Z)
- Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation [32.97898981684483]
Transformer-based speech self-supervised learning (SSL) models, such as HuBERT, show surprising performance in various speech processing tasks.
The huge number of parameters in speech SSL models necessitates compression to a more compact model for wider use in academia and at small companies.
arXiv Detail & Related papers (2023-05-19T14:07:43Z)
- CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models [62.60723685118747]
Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data.
We propose an efficient tuning method specifically designed for SSL speech models, applying CNN adapters at the feature extractor.
We empirically find that adding CNN adapters to the feature extractor can help adaptation on emotion and speaker tasks.
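A minimal sketch of the general idea of adding small CNN adapters to a frozen feature extractor is given below; the bottleneck size, placement, and module names are assumptions for illustration, not CHAPTER's actual design.
```python
# Illustrative CNN adapter added after a frozen conv layer of an SSL feature
# extractor. The bottleneck size and placement are assumptions, not CHAPTER's
# actual design.
import torch
import torch.nn as nn


class ConvAdapter(nn.Module):
    """Small bottleneck adapter with a residual connection."""

    def __init__(self, channels, bottleneck=32):
        super().__init__()
        self.down = nn.Conv1d(channels, bottleneck, kernel_size=1)
        self.up = nn.Conv1d(bottleneck, channels, kernel_size=1)

    def forward(self, x):  # x: (batch, channels, time)
        return x + self.up(torch.relu(self.down(x)))


class AdaptedFrontend(nn.Module):
    """Frozen pre-trained conv layer followed by a trainable adapter."""

    def __init__(self, pretrained_conv: nn.Conv1d):
        super().__init__()
        self.conv = pretrained_conv
        for p in self.conv.parameters():
            p.requires_grad = False  # only the adapter is tuned
        self.adapter = ConvAdapter(pretrained_conv.out_channels)

    def forward(self, wav):  # wav: (batch, 1, samples)
        return self.adapter(torch.relu(self.conv(wav)))


frontend = AdaptedFrontend(nn.Conv1d(1, 512, kernel_size=10, stride=5))
feats = frontend(torch.randn(2, 1, 16000))  # (2, 512, frames)
```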
arXiv Detail & Related papers (2022-12-01T08:50:12Z)
- ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
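Below is a minimal sketch of this clustered-attention idea: keys are grouped with a few k-means steps, values are averaged per cluster, and queries attend only to the much shorter clustered sequence. It is a simplified stand-in under stated assumptions, not ClusTR's exact algorithm.
```python
# Sketch of attention over clustered keys/values. A simplified stand-in,
# not ClusTR's exact algorithm.
import torch
import torch.nn.functional as F


def cluster_kv(k, v, num_clusters=16, iters=3):
    # k, v: (batch, tokens, dim)
    b, n, d = k.shape
    init = torch.randint(0, n, (b, num_clusters), device=k.device)
    centroids = torch.gather(k, 1, init.unsqueeze(-1).expand(-1, -1, d))
    for _ in range(iters):
        assign = torch.cdist(k, centroids).argmin(dim=-1)      # (b, n)
        onehot = F.one_hot(assign, num_clusters).float()       # (b, n, c)
        counts = onehot.sum(dim=1).clamp(min=1).unsqueeze(-1)  # (b, c, 1)
        centroids = onehot.transpose(1, 2) @ k / counts        # mean key per cluster
    v_clustered = onehot.transpose(1, 2) @ v / counts          # mean value per cluster
    return centroids, v_clustered


def clustered_attention(q, k, v, num_clusters=16):
    kc, vc = cluster_kv(k, v, num_clusters)
    scores = q @ kc.transpose(1, 2) / q.shape[-1] ** 0.5  # (batch, tokens, clusters)
    return F.softmax(scores, dim=-1) @ vc                 # cost scales with clusters


q = k = v = torch.randn(2, 1024, 64)
out = clustered_attention(q, k, v)  # torch.Size([2, 1024, 64])
```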
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
- Ultra Fast Speech Separation Model with Teacher Student Learning [44.71171732510265]
An ultra-fast Transformer model is proposed to achieve better performance and efficiency with teacher-student learning (T-S learning).
Compared with the small Transformer model trained from scratch, the proposed T-S learning method reduces the word error rate (WER) by more than 5% for both multi-channel and single-channel speech separation.
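A generic teacher-student distillation objective of the kind referred to here is sketched below: the small student is trained on a mix of the task loss and a soft-target term derived from the frozen teacher. The loss form, weighting, and temperature are illustrative assumptions, not the paper's settings.
```python
# Generic teacher-student (T-S) distillation objective. Weights and
# temperature are illustrative, not the paper's settings.
import torch
import torch.nn.functional as F


def ts_loss(student_logits, teacher_logits, targets, alpha=0.5, temperature=2.0):
    t = temperature
    # Soft-target term: KL divergence between temperature-smoothed distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    # Hard-target term: ordinary task loss against the labels.
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1 - alpha) * hard


student_logits = torch.randn(8, 100, requires_grad=True)
teacher_logits = torch.randn(8, 100)  # produced by the frozen teacher
targets = torch.randint(0, 100, (8,))
ts_loss(student_logits, teacher_logits, targets).backward()
```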
arXiv Detail & Related papers (2022-04-27T09:02:45Z)
- Primer: Searching for Efficient Transformers for Language Modeling [79.2677566332444]
Training and inference costs of large Transformer models have grown rapidly and become expensive.
Here we aim to reduce the costs of Transformers by searching for a more efficient variant.
We identify an architecture, named Primer, that has a smaller training cost than the original Transformer.
arXiv Detail & Related papers (2021-09-17T17:50:39Z)
- TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech [63.03318307254081]
TERA stands for Transformer Encoder Representations from Alteration.
We use alteration along three axes to pre-train Transformers on a large amount of unlabeled speech.
TERA can be used for speech representations extraction or fine-tuning with downstream models.
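The three alteration axes are not spelled out in this summary; as a hedged sketch of the general recipe (corrupt the input features, then train the encoder to reconstruct the clean frames), the snippet below zeroes random time and frequency spans of spectrogram-like features and uses an L1 reconstruction loss. The specific alterations, loss, and model sizes are assumptions, not necessarily TERA's configuration.
```python
# Hedged sketch of alteration-based pre-training. The specific alterations,
# loss, and model sizes are assumptions, not necessarily TERA's configuration.
import torch
import torch.nn as nn


def alter(features, time_span=7, freq_span=8):
    # features: (batch, time, freq); returns a corrupted copy.
    x = features.clone()
    _, t, f = x.shape
    t0 = torch.randint(0, t - time_span, (1,)).item()
    f0 = torch.randint(0, f - freq_span, (1,)).item()
    x[:, t0 : t0 + time_span, :] = 0.0  # time alteration (masked frames)
    x[:, :, f0 : f0 + freq_span] = 0.0  # frequency/channel alteration
    return x


encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=80, nhead=4, batch_first=True),
    num_layers=3,
)
head = nn.Linear(80, 80)  # predicts the clean frames

clean = torch.randn(4, 200, 80)  # e.g. log-mel features
loss = nn.functional.l1_loss(head(encoder(alter(clean))), clean)
loss.backward()
```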
arXiv Detail & Related papers (2020-07-12T16:19:00Z)
- Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers [94.43313684188819]
We study the impact of model size in this setting, focusing on Transformer models for NLP tasks that are limited by compute.
We first show that even though smaller Transformer models execute faster per iteration, wider and deeper models converge in significantly fewer steps.
This leads to an apparent trade-off between the training efficiency of large Transformer models and the inference efficiency of small Transformer models.
arXiv Detail & Related papers (2020-02-26T21:17:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.