Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences
- URL: http://arxiv.org/abs/2309.12712v1
- Date: Fri, 22 Sep 2023 08:50:58 GMT
- Title: Big model only for hard audios: Sample dependent Whisper model selection for efficient inferences
- Authors: Hugo Malard, Salah Zaiem, Robin Algayres
- Abstract summary: Several ASR models exist in various sizes, with different inference costs leading to different performance levels.
We propose to train a decision module that, given an audio sample, selects the smallest sufficient model leading to a good transcription.
By keeping the decision process computationally efficient, we build a decision module that allows substantial computational savings with reduced performance drops.
- Score: 7.592727209806414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress in Automatic Speech Recognition (ASR) has been coupled with a
substantial increase in the model sizes, which may now contain billions of
parameters, leading to slow inferences even with adapted hardware. In this
context, several ASR models exist in various sizes, with different inference
costs leading to different performance levels. Based on the observation that
smaller models perform optimally on large parts of testing corpora, we propose
to train a decision module that, given an audio sample, selects the smallest
sufficient model leading to a good transcription. We apply our
approach to two Whisper models with different sizes. By keeping the decision
process computationally efficient, we build a decision module that allows
substantial computational savings with reduced performance drops.
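For illustration, the sketch below implements the general idea with the openai-whisper package: transcribe with a small model first and fall back to a large one only when a cheap confidence signal flags the audio as hard. Note that the paper trains a dedicated decision module, whereas this sketch substitutes a simpler cascade heuristic (the small model's average segment log-probability); the threshold and model choices are assumptions, not values from the paper.

```python
# Minimal sketch of sample-dependent Whisper model selection, using the
# openai-whisper package (pip install openai-whisper). This is NOT the
# paper's trained decision module: as a stand-in, the decision is made from
# the small model's average segment log-probability, and the threshold below
# is a hypothetical value that would need tuning on a held-out set.
import whisper

CONFIDENCE_THRESHOLD = -0.7  # assumed value, not taken from the paper

small_model = whisper.load_model("tiny")
large_model = None  # loaded lazily, only if some audio turns out to be hard


def transcribe_with_selection(audio_path: str) -> str:
    """Transcribe audio, calling the large model only for hard samples."""
    global large_model

    result = small_model.transcribe(audio_path)
    segments = result["segments"]
    if not segments:
        return result["text"]

    # Cheap decision signal: mean avg_logprob over the small model's segments.
    confidence = sum(s["avg_logprob"] for s in segments) / len(segments)
    if confidence >= CONFIDENCE_THRESHOLD:
        return result["text"]  # the small model is judged sufficient

    # Hard audio: pay the extra inference cost of the large model.
    if large_model is None:
        large_model = whisper.load_model("large")
    return large_model.transcribe(audio_path)["text"]


if __name__ == "__main__":
    print(transcribe_with_selection("example.wav"))
```

Unlike this cascade, which always pays the small-model cost and adds the large-model cost on hard audios, the paper's decision module is run directly on the audio sample and kept computationally cheap, so only one ASR model is invoked per sample.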
Related papers
- Introducing Flexible Monotone Multiple Choice Item Response Theory Models and Bit Scales [0.0]
We present a new model for multiple choice data, the monotone multiple choice (MMC) model, which we fit using autoencoders.
We demonstrate empirically that the MMC model outperforms the traditional nominal response IRT model in terms of fit.
arXiv Detail & Related papers (2024-10-02T12:33:16Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- AutoMix: Automatically Mixing Language Models [62.51238143437967]
Large language models (LLMs) are now available from cloud API providers in various sizes and configurations.
We present Automix, an approach that strategically routes queries to larger LMs, based on the approximate correctness of outputs from a smaller LM.
arXiv Detail & Related papers (2023-10-19T17:57:39Z)
- eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z)
- Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech Recognition Models [47.99478573698432]
We consider methods to reduce the model size of Conformer-based speech recognition models.
Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors.
arXiv Detail & Related papers (2023-03-15T03:21:38Z)
- Who Says Elephants Can't Run: Bringing Large Scale MoE Models into Cloud Scale Production [7.056223012587321]
We introduce a highly efficient inference framework with several optimization approaches to accelerate the computation of sparse models.
We are able to deploy 136x larger models with 27% less cost and significantly better quality compared to the existing solutions.
arXiv Detail & Related papers (2022-11-18T03:43:52Z)
- Model ensemble instead of prompt fusion: a sample-specific knowledge transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Efficient End-to-End Speech Recognition Using Performers in Conformers [74.71219757585841]
We propose to reduce the complexity of model architectures in addition to model sizes.
The proposed model yields competitive performance on the LibriSpeech corpus with 10 million parameters and linear complexity.
arXiv Detail & Related papers (2020-11-09T05:22:57Z)