A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes
- URL: http://arxiv.org/abs/2204.06164v1
- Date: Wed, 13 Apr 2022 04:15:51 GMT
- Title: A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes
- Authors: Shaojin Ding, Weiran Wang, Ding Zhao, Tara N. Sainath, Yanzhang He,
Robert David, Rami Botros, Xin Wang, Rina Panigrahy, Qiao Liang, Dongseong
Hwang, Ian McGraw, Rohit Prabhavalkar, Trevor Strohman
- Abstract summary: We propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios.
The proposed large-medium model has 30% smaller size and reduces power consumption by 33%, compared to the baseline cascaded encoder model.
The triple-size model that unifies the large, medium, and small models achieves 37% total size reduction with minimal quality loss.
- Score: 54.83802872236367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a dynamic cascaded encoder Automatic Speech
Recognition (ASR) model, which unifies models for different deployment
scenarios. Moreover, the model can significantly reduce model size and power
consumption without loss of quality. Namely, with the dynamic cascaded encoder
model, we explore three techniques to maximally boost the performance of each
model size: 1) Use separate decoders for each sub-model while sharing the
encoders; 2) Use funnel-pooling to improve the encoder efficiency; 3) Balance
the size of causal and non-causal encoders to improve quality and fit
deployment constraints. Overall, the proposed large-medium model has 30%
smaller size and reduces power consumption by 33%, compared to the baseline
cascaded encoder model. The triple-size model that unifies the large, medium,
and small models achieves 37% total size reduction with minimal quality loss,
while substantially reducing the engineering efforts of having separate models.
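The first two techniques in the abstract (shared encoders with a separate decoder per sub-model, and funnel-pooling to shorten the sequence) can be illustrated with a toy sketch. This is not the authors' implementation. The class `DynamicCascadedEncoder`, the helpers `funnel_pool` and `scale_layer`, and the assignment of "medium" to the causal encoder alone and "large" to the causal plus non-causal cascade are all assumptions made for illustration. The layers are simple scalings rather than real network layers, and `funnel_pool` is a plain average over adjacent frames standing in for the pooling used in the paper.

```python
from typing import Callable, Dict, List

Frame = List[float]
Frames = List[Frame]
Layer = Callable[[Frames], Frames]


def funnel_pool(frames: Frames, stride: int = 2) -> Frames:
    """Average non-overlapping groups of `stride` frames to shorten the time axis."""
    pooled: Frames = []
    for start in range(0, len(frames), stride):
        group = frames[start:start + stride]
        dim = len(group[0])
        pooled.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return pooled


def scale_layer(factor: float) -> Layer:
    """Toy stand-in for an encoder layer: multiplies every feature by `factor`."""
    return lambda frames: [[factor * x for x in frame] for frame in frames]


class DynamicCascadedEncoder:
    """Hypothetical two-size model: both sizes share the causal encoder."""

    def __init__(self) -> None:
        # Shared by every sub-model; pooling halves the number of frames.
        self.causal_encoder: List[Layer] = [scale_layer(2.0), funnel_pool]
        # Run only for the larger sub-model, on top of the causal output.
        self.noncausal_encoder: List[Layer] = [scale_layer(0.5)]
        # One decoder per sub-model (toy reductions, not real decoders).
        self.decoders: Dict[str, Callable[[Frames], List[float]]] = {
            "medium": lambda frames: [sum(frame) for frame in frames],
            "large": lambda frames: [max(frame) for frame in frames],
        }

    def run(self, frames: Frames, size: str) -> List[float]:
        if size not in self.decoders:
            raise ValueError(f"unknown sub-model size: {size}")
        hidden = frames
        for layer in self.causal_encoder:
            hidden = layer(hidden)
        if size == "large":
            for layer in self.noncausal_encoder:
                hidden = layer(hidden)
        return self.decoders[size](hidden)


audio = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
model = DynamicCascadedEncoder()
medium_out = model.run(audio, "medium")  # causal encoder + "medium" decoder
large_out = model.run(audio, "large")    # full cascade + "large" decoder
print(medium_out, large_out)
```

In this sketch the four input frames are pooled down to two before any decoder runs, and the two sub-models differ only in whether the non-causal stage is applied and in which decoder reads the result.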
Related papers
- Diffusion Product Quantization [18.32568431229839]
We explore the quantization of diffusion models in extreme compression regimes to reduce model size while maintaining performance.
We apply our compression method to the DiT model on ImageNet and consistently outperform other quantization approaches.
arXiv Detail & Related papers (2024-11-19T07:47:37Z)
- 4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders [53.297697898510194]
We propose a joint modeling scheme where four decoders share the same encoder -- we refer to this as 4D modeling.
To efficiently train the 4D model, we introduce a two-stage training strategy that stabilizes multitask learning.
In addition, we propose three novel one-pass beam search algorithms by combining three decoders.
arXiv Detail & Related papers (2024-06-05T05:18:20Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z)
- Enhancing Quantised End-to-End ASR Models via Personalisation [12.971231464928806]
We propose a novel strategy of personalisation for a quantised model (PQM).
PQM uses a 4-bit NormalFloat Quantisation (NF4) approach for model quantisation and low-rank adaptation (LoRA) for speaker adaptive training (SAT).
Experiments have been performed on the LibriSpeech and the TED-LIUM 3 corpora.
arXiv Detail & Related papers (2023-09-17T02:35:21Z)
- Complexity Matters: Rethinking the Latent Space for Generative Modeling [65.64763873078114]
In generative modeling, numerous successful approaches leverage a low-dimensional latent space, e.g., Stable Diffusion.
In this study, we aim to shed light on this under-explored topic by rethinking the latent space from the perspective of model complexity.
arXiv Detail & Related papers (2023-07-17T07:12:29Z)
- 4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders [29.799797974513552]
This paper proposes four-decoder joint modeling (4D) of CTC, attention, RNN-T, and mask-predict.
The four decoders are jointly trained so that they can be easily switched depending on the application scenarios.
The experimental results showed that the proposed model consistently reduced the WER.
arXiv Detail & Related papers (2022-12-21T07:15:59Z)
- Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition [7.450574974954803]
Small memory bandwidth in smart devices prompts development of smaller Automatic Speech Recognition (ASR) models.
Knowledge distillation (KD) is a popular model compression approach that has been shown to achieve smaller model sizes.
We propose a multi-stage progressive approach to compress the conformer transducer model using KD.
arXiv Detail & Related papers (2022-10-01T02:23:00Z)
- Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance.
Distinct models must be trained to reach different points in the rate-distortion (R-D) space.
We make efforts to formulate the essential mathematical functions to describe the R-D behavior of NIC using deep network and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.