Fine-tuning Strategies for Faster Inference using Speech Self-Supervised
Models: A Comparative Study
- URL: http://arxiv.org/abs/2303.06740v1
- Date: Sun, 12 Mar 2023 19:52:34 GMT
- Title: Fine-tuning Strategies for Faster Inference using Speech Self-Supervised
Models: A Comparative Study
- Authors: Salah Zaiem, Robin Algayres, Titouan Parcollet, Slim Essid and Mirco
Ravanelli
- Abstract summary: Self-supervised learning (SSL) has allowed substantial progress in Automatic Speech Recognition (ASR) performance in low-resource settings.
This article explores different approaches that may be deployed during fine-tuning to reduce the computations needed in the SSL encoder.
- Score: 25.58608455210458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) has allowed substantial progress in Automatic
Speech Recognition (ASR) performance in low-resource settings. In this context,
it has been demonstrated that larger self-supervised feature extractors are
crucial for achieving lower downstream ASR error rates. Thus, better
performance may come at the cost of longer inference times. This article explores
different approaches that may be deployed during fine-tuning to reduce the
computations needed in the SSL encoder, leading to faster inference. We adapt
a number of existing techniques to common ASR settings and benchmark them,
displaying performance drops and gains in inference times. Interestingly, we
found that given enough downstream data, a simple downsampling of the input
sequences outperforms the other methods with both low performance drops and
high computational savings, reducing computations by 61.3% with a WER increase
of only 0.81. Finally, we analyze the robustness of the comparison to changes
in dataset conditions, revealing sensitivity to dataset size.
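For concreteness, below is a minimal sketch of the downsampling strategy highlighted in the abstract: the raw waveform is resampled to a lower rate before it enters the SSL encoder, so every encoder layer processes proportionally fewer frames. The checkpoint name, audio file, and factor of 2 are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the "downsample the input sequence" strategy:
# the waveform is decimated before it enters the SSL encoder, so the
# encoder produces proportionally fewer frames at fine-tuning and
# inference time. Model name, file name, and factor are illustrative
# assumptions, not necessarily those used in the paper.
import torch
import torchaudio
from transformers import Wav2Vec2Model

encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")  # SSL encoder (assumption)
encoder.eval()

waveform, sample_rate = torchaudio.load("utterance.wav")  # (channels, samples)
waveform = waveform.mean(dim=0, keepdim=True)             # mono, shape (1, samples)

# Downsample by a factor of 2 (e.g. 16 kHz -> 8 kHz); the downstream
# decoder is then fine-tuned on top of these shorter representations.
factor = 2
downsampled = torchaudio.functional.resample(
    waveform, orig_freq=sample_rate, new_freq=sample_rate // factor
)

with torch.no_grad():
    full = encoder(waveform).last_hidden_state       # (1, T, D)
    short = encoder(downsampled).last_hidden_state   # (1, ~T/2, D)

print(full.shape, short.shape)  # roughly half as many frames to process
```

Because the encoder's convolutional front end strides over raw samples at a fixed rate, halving the number of samples roughly halves the number of frames every subsequent transformer layer has to attend over, which is where the reported computational savings come from.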
Related papers
- SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection [53.19618419772467]
Single-frame infrared small target (SIRST) detection aims to recognize small targets from clutter backgrounds.
With the development of Transformers, the scale of SIRST models is constantly increasing.
With a rich diversity of infrared small target data, our algorithm significantly improves the model performance and convergence speed.
arXiv Detail & Related papers (2024-03-08T16:14:54Z)
- Low-Rank Representations Meets Deep Unfolding: A Generalized and Interpretable Network for Hyperspectral Anomaly Detection [41.50904949744355]
Current hyperspectral anomaly detection (HAD) benchmark datasets suffer from low resolution, simple background, and small size of the detection data.
These factors also limit the performance of the well-known low-rank representation (LRR) models in terms of robustness.
We build a new set of HAD benchmark datasets, AIR-HAD for short, to improve the robustness of HAD algorithms in complex scenarios.
arXiv Detail & Related papers (2024-02-23T14:15:58Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named Entity Recognition [10.03246698225533]
We propose Robust Prompt-based Data Augmentation (RoPDA) for low-resource NER.
Based on pre-trained language models (PLMs) with continuous prompt, RoPDA performs entity augmentation and context augmentation.
Experiments on three benchmarks from different domains demonstrate that RoPDA significantly improves upon strong baselines.
arXiv Detail & Related papers (2023-07-11T14:44:14Z)
- An Efficiency Study for SPLADE Models [5.725475501578801]
In this paper, we focus on improving the efficiency of the SPLADE model.
We propose several techniques including L1 regularization for queries, a separation of document/query encoders, a FLOPS-regularized middle-training, and the use of faster query encoders.
arXiv Detail & Related papers (2022-07-08T11:42:05Z)
- Efficient Sharpness-aware Minimization for Improved Training of Neural Networks [146.2011175973769]
This paper proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance.
ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection.
We show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis base optimizers.
arXiv Detail & Related papers (2021-10-07T02:20:37Z)
- Fast Distributionally Robust Learning with Variance Reduced Min-Max Optimization [85.84019017587477]
Distributionally robust supervised learning is emerging as a key paradigm for building reliable machine learning systems for real-world applications.
Existing algorithms for solving Wasserstein DRSL involve solving complex subproblems or fail to make use of gradients.
We revisit Wasserstein DRSL through the lens of min-max optimization and derive scalable and efficiently implementable extra-gradient algorithms.
arXiv Detail & Related papers (2021-04-27T16:56:09Z)
- Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation [11.52842516726486]
We propose a Transformer-based ASR model with a time reduction layer incorporated inside the transformer encoder layers (a minimal sketch of one such layer appears after this list).
We also introduce a fine-tuning approach for pre-trained ASR models using self-knowledge distillation (S-KD) which further improves the performance of our ASR model.
With language model (LM) fusion, we achieve new state-of-the-art word error rate (WER) results for Transformer-based ASR models.
arXiv Detail & Related papers (2021-03-17T21:02:36Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Improving noise robust automatic speech recognition with single-channel time-domain enhancement network [100.1041336974175]
We show that a single-channel time-domain denoising approach can significantly improve ASR performance, demonstrating that single-channel noise reduction can still benefit ASR.
arXiv Detail & Related papers (2020-03-09T09:36:31Z)
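As mentioned in the time-reduction entry above, here is a minimal sketch of one common way to build such a layer: adjacent frames are stacked and projected back to the model dimension, so every subsequent transformer encoder layer sees a shorter sequence. This is an assumed construction for illustration, not necessarily the exact layer used in that paper.

```python
# Minimal sketch of a time-reduction layer that stacks adjacent frames,
# shortening the sequence seen by subsequent transformer encoder layers.
# One common construction, assumed here for illustration only.
import torch
import torch.nn as nn

class TimeReduction(nn.Module):
    def __init__(self, d_model: int, factor: int = 2):
        super().__init__()
        self.factor = factor
        # Project the concatenated frames back to the model dimension.
        self.proj = nn.Linear(d_model * factor, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        batch, time, dim = x.shape
        # Pad so the time axis is divisible by the reduction factor.
        pad = (-time) % self.factor
        if pad:
            x = torch.nn.functional.pad(x, (0, 0, 0, pad))
        # Concatenate each group of `factor` consecutive frames.
        x = x.reshape(batch, (time + pad) // self.factor, dim * self.factor)
        return self.proj(x)

layer = TimeReduction(d_model=768, factor=2)
frames = torch.randn(1, 101, 768)   # e.g. frame-level encoder output
print(layer(frames).shape)          # torch.Size([1, 51, 768])
```

Placed inside the encoder stack, such a layer reduces the quadratic self-attention cost of every layer above it, which is the same compute-versus-accuracy trade-off the main paper studies for SSL encoders.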