HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition
- URL: http://arxiv.org/abs/2204.06328v2
- Date: Wed, 19 Jun 2024 16:39:15 GMT
- Title: HuBERT-EE: Early Exiting HuBERT for Efficient Speech Recognition
- Authors: Ji Won Yoon, Beom Jun Woo, Nam Soo Kim
- Abstract summary: We introduce an early exit scheme for ASR, namely HuBERT-EE, that allows the model to stop inference dynamically.
Experimental results on LibriSpeech show that HuBERT-EE accelerates HuBERT inference while balancing the trade-off between performance and latency.
- Score: 11.243855639847514
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Pre-training with self-supervised models, such as Hidden-unit BERT (HuBERT) and wav2vec 2.0, has brought significant improvements in automatic speech recognition (ASR). However, these models usually incur a high computational cost to achieve outstanding performance, which slows down inference. To improve model efficiency, we introduce an early exit scheme for ASR, namely HuBERT-EE, that allows the model to stop inference dynamically. In HuBERT-EE, multiple early exit branches are added at the intermediate layers. When the intermediate prediction of an early exit branch is confident, the model stops inference and the corresponding result is returned early. We investigate suitable early exiting criteria and fine-tuning strategies to perform early exiting effectively. Experimental results on LibriSpeech show that HuBERT-EE accelerates HuBERT inference while balancing the trade-off between performance and latency.
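To make the mechanism concrete, below is a minimal sketch of confidence-based early exiting for a CTC encoder, in the spirit of HuBERT-EE. The exit placement, the frame-averaged max-posterior confidence measure, and the 0.9 threshold are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


class EarlyExitEncoder(nn.Module):
    """Transformer encoder with CTC early-exit branches at intermediate layers."""

    def __init__(self, dim=768, vocab=32, n_layers=12, exit_every=4, threshold=0.9):
        super().__init__()
        self.layers = nn.ModuleList(
            [nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
             for _ in range(n_layers)]
        )
        # A lightweight CTC head at every `exit_every`-th layer (incl. the last).
        self.exits = nn.ModuleDict(
            {str(i): nn.Linear(dim, vocab)
             for i in range(exit_every - 1, n_layers, exit_every)}
        )
        self.threshold = threshold

    @torch.no_grad()
    def forward(self, x):
        log_probs = None
        for i, layer in enumerate(self.layers):
            x = layer(x)
            if str(i) not in self.exits:
                continue
            log_probs = self.exits[str(i)](x).log_softmax(dim=-1)
            # Confidence: max posterior per frame, averaged over the utterance.
            confidence = log_probs.exp().max(dim=-1).values.mean()
            if confidence >= self.threshold:
                return log_probs, i + 1  # confident: return this branch's output
        return log_probs, len(self.layers)  # never confident: use the final head


model = EarlyExitEncoder().eval()
out, layers_used = model(torch.randn(1, 100, 768))  # (batch, frames, dim)
print(f"exited after {layers_used} of 12 layers")
```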
Related papers
- MS-HuBERT: Mitigating Pre-training and Inference Mismatch in Masked Language Modelling methods for learning Speech Representations [43.479279052047985]
MS-HuBERT is an end-to-end self-supervised pre-training method for learning robust speech representations.
It beats vanilla HuBERT on the LibriSpeech ASR benchmark by an average margin of 5% across different fine-tuning splits.
arXiv Detail & Related papers (2024-06-09T06:30:28Z)
- DAISY: Data Adaptive Self-Supervised Early Exit for Speech Representation Models [55.608981341747246]
We introduce Data Adaptive Self-Supervised Early Exit (DAISY), an approach that decides when to exit based on the self-supervised loss.
Our analysis of DAISY's adaptivity shows that the model exits early (using fewer layers) on clean data and exits late (using more layers) on noisy data (the loss-based criterion is sketched below).
arXiv Detail & Related papers (2024-06-08T12:58:13Z)
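A minimal sketch of a loss-based exit criterion, under assumptions: each exit point carries a small probe trained to predict the model's self-supervised targets, and the model exits once that proxy loss drops below a per-layer threshold calibrated offline. The feature-reconstruction proxy here is illustrative, not DAISY's exact loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def daisy_style_forward(layers, probes, thresholds, x, ssl_targets):
    """Exit as soon as a self-supervised proxy loss looks 'easy' enough."""
    for i, (layer, probe, tau) in enumerate(zip(layers, probes, thresholds)):
        x = layer(x)
        # Proxy SSL loss: how well this layer predicts the SSL targets.
        proxy_loss = F.mse_loss(probe(x), ssl_targets)
        if proxy_loss.item() < tau:  # clean/easy input: exit early
            return x, i + 1
    return x, len(layers)  # noisy/hard input: run the full stack


dim = 256
layers = nn.ModuleList(
    [nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True) for _ in range(6)]
)
probes = nn.ModuleList([nn.Linear(dim, dim) for _ in range(6)])
x = torch.randn(1, 50, dim)
with torch.no_grad():
    # Placeholder targets; in practice these come from the SSL objective.
    _, layers_used = daisy_style_forward(layers, probes, [0.5] * 6, x, ssl_targets=x)
print(layers_used)
```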
- CEEBERT: Cross-Domain Inference in Early Exit BERT [5.402030962296633]
CeeBERT learns optimal exit thresholds on the fly from the domain-specific confidence observed at intermediate layers (a bandit-style sketch follows below).
CeeBERT can speed up the BERT/ALBERT models by $2\times$ to $3.5\times$ with minimal drop in accuracy.
arXiv Detail & Related papers (2024-05-23T20:36:10Z)
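A hedged sketch of what "learning thresholds on the fly" can look like: treat a few candidate thresholds as bandit arms and keep a running reward that trades exit confidence against layers spent. The epsilon-greedy rule and reward shape are illustrative assumptions; CeeBERT's exact online formulation may differ.

```python
import random


class ThresholdBandit:
    def __init__(self, candidates=(0.7, 0.8, 0.9, 0.95), lam=0.05, eps=0.1):
        self.candidates = list(candidates)
        self.lam = lam  # latency penalty per layer used
        self.eps = eps  # exploration rate
        self.counts = [0] * len(self.candidates)
        self.values = [0.0] * len(self.candidates)

    def pick(self):
        if random.random() < self.eps:
            return random.randrange(len(self.candidates))
        return max(range(len(self.candidates)), key=lambda i: self.values[i])

    def update(self, arm, exit_confidence, layers_used):
        reward = exit_confidence - self.lam * layers_used
        self.counts[arm] += 1
        # Running average of the reward observed for this threshold.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


bandit = ThresholdBandit()
arm = bandit.pick()
tau = bandit.candidates[arm]
# ... run the early-exit model with threshold tau, observe confidence/layers ...
bandit.update(arm, exit_confidence=0.92, layers_used=6)
```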
- oBERTa: Improving Sparse Transfer Learning via improved initialization, distillation, and pruning regimes [82.99830498937729]
oBERTa is an easy-to-use set of language models for Natural Language Processing.
It allows NLP practitioners to obtain models that are between 3.8 and 24.3 times faster without expertise in model compression.
We explore the use of oBERTa on seven representative NLP tasks.
arXiv Detail & Related papers (2023-03-30T01:37:19Z)
- ELBERT: Fast Albert with Confidence-Window Based Early Exit [8.956309416589232]
Large pre-trained language models like BERT are not well-suited for resource-constrained or real-time applications.
We propose ELBERT, which significantly improves average inference speed over ALBERT thanks to a confidence-window based early exit mechanism (one plausible reading is sketched below).
arXiv Detail & Related papers (2021-07-01T02:02:39Z)
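One plausible reading of a confidence-window rule, sketched for illustration: exit only when top-class confidence has stayed above a threshold for several consecutive layers, which filters out one-off confidence spikes. The threshold, window size, and the rule itself are assumptions; ELBERT's exact criterion may differ.

```python
import torch


def window_exit(per_layer_logits, threshold=0.9, window=2):
    streak = 0
    for i, logits in enumerate(per_layer_logits):
        conf = logits.softmax(dim=-1).max().item()
        streak = streak + 1 if conf >= threshold else 0
        if streak >= window:  # confident for `window` layers in a row
            return logits.argmax().item(), i + 1
    # Window condition never met: fall back to the final layer's prediction.
    return per_layer_logits[-1].argmax().item(), len(per_layer_logits)


pred, layers_used = window_exit([torch.randn(4) for _ in range(12)])
```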
- BinaryBERT: Pushing the Limit of BERT Quantization [74.65543496761553]
We propose BinaryBERT, which pushes BERT quantization to the limit with weight binarization.
We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape; the basic binarization step is sketched below.
Empirical results show that BinaryBERT has negligible performance drop compared to the full-precision BERT-base.
arXiv Detail & Related papers (2020-12-31T16:34:54Z)
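For reference, the basic scaled-sign binarization step common to binary networks, which BinaryBERT builds on: W ≈ α·sign(W) with α = mean(|W|). BinaryBERT's actual recipe additionally initializes the binary model by splitting a trained ternary one (ternary weight splitting), which this sketch does not cover.

```python
import torch


def binarize(weight: torch.Tensor) -> torch.Tensor:
    """Scaled-sign binarization: approximate W with alpha * sign(W)."""
    alpha = weight.abs().mean()   # per-tensor scaling factor
    return alpha * weight.sign()  # values in {-alpha, 0, +alpha}; zeros are rare


w = torch.randn(768, 768)
w_bin = binarize(w)
print((w - w_bin).abs().mean())  # average quantization error
```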
- BERT Loses Patience: Fast and Robust Inference with Early Exit [91.26199404912019]
We propose Patience-based Early Exit as a plug-and-play technique to improve the efficiency and robustness of a pretrained language model.
Our approach improves inference efficiency as it allows the model to make a prediction with fewer layers.
arXiv Detail & Related papers (2020-06-07T13:38:32Z)
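The patience criterion itself is simple to sketch: rather than thresholding confidence, stop once `patience` consecutive internal classifiers agree on the same prediction. Random logits stand in for real classifier heads below.

```python
import torch


def pabee_exit(per_layer_logits, patience=3):
    prev_pred, streak = None, 0
    for i, logits in enumerate(per_layer_logits):
        pred = logits.argmax().item()
        streak = streak + 1 if pred == prev_pred else 1
        prev_pred = pred
        if streak >= patience:  # predictions stable: stop here
            return pred, i + 1  # prediction and number of layers used
    return prev_pred, len(per_layer_logits)  # never stable: use the last layer


pred, layers_used = pabee_exit([torch.randn(5) for _ in range(12)])
```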
- DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference [69.93692147242284]
Large-scale pre-trained language models such as BERT have brought significant improvements to NLP applications.
We propose a simple but effective method, DeeBERT, to accelerate BERT inference.
Experiments show that DeeBERT is able to save up to 40% inference time with minimal degradation in model quality.
arXiv Detail & Related papers (2020-04-27T17:58:05Z)
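DeeBERT attaches "off-ramp" classifiers to intermediate layers and exits at the first ramp whose prediction entropy falls below a threshold. A minimal sketch of that entropy criterion (the 0.4 threshold is an illustrative choice):

```python
import torch


def entropy(logits: torch.Tensor) -> float:
    p = logits.softmax(dim=-1)
    return -(p * p.log()).sum().item()


def deebert_exit(per_ramp_logits, threshold=0.4):
    for i, logits in enumerate(per_ramp_logits):
        if entropy(logits) < threshold:  # confident enough: take the off-ramp
            return logits.argmax().item(), i + 1
    # No ramp was confident: fall back to the final classifier.
    return per_ramp_logits[-1].argmax().item(), len(per_ramp_logits)


pred, layers_used = deebert_exit([torch.randn(5) for _ in range(12)])
```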
- Deliberation Model Based Two-Pass End-to-End Speech Recognition [52.45841282906516]
A two-pass model has been proposed to rescore streamed hypotheses using the non-streaming Listen, Attend and Spell (LAS) model.
The model attends to acoustics to rescore hypotheses, as opposed to a class of neural correction models that use only first-pass text hypotheses.
A bidirectional encoder is used to extract context information from first-pass hypotheses.
arXiv Detail & Related papers (2020-03-17T22:01:12Z)
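A rough structural sketch of the second pass, with illustrative dimensions and module choices: the first-pass hypothesis is re-encoded bidirectionally, and the deliberation decoder attends to both acoustic frames and that hypothesis encoding before rescoring. The paper's LAS-based decoder is considerably more involved.

```python
import torch
import torch.nn as nn

dim = 256
hyp_encoder = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
attend_acoustic = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
attend_hypothesis = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

acoustic = torch.randn(1, 200, dim)   # acoustic encoder frames
first_pass = torch.randn(1, 20, dim)  # embedded first-pass hypothesis tokens
query = torch.randn(1, 1, dim)        # current decoder state

hyp_enc, _ = hyp_encoder(first_pass)  # bidirectional context from first-pass text
ctx_a, _ = attend_acoustic(query, acoustic, acoustic)
ctx_h, _ = attend_hypothesis(query, hyp_enc, hyp_enc)
fused = torch.cat([ctx_a, ctx_h], dim=-1)  # fed to the rescoring decoder
```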