TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression
For On-device ASR Models
- URL: http://arxiv.org/abs/2309.01947v2
- Date: Mon, 27 Nov 2023 05:03:31 GMT
- Title: TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression
For On-device ASR Models
- Authors: Yuan Shangguan, Haichuan Yang, Danni Li, Chunyang Wu, Yassir
Fathullah, Dilin Wang, Ayushi Dalmia, Raghuraman Krishnamoorthi, Ozlem
Kalinli, Junteng Jia, Jay Mahadeokar, Xin Lei, Mike Seltzer, Vikas Chandra
- Abstract summary: TODM is a new approach to efficiently train many sizes of hardware-friendly on-device ASR models with comparable GPU-hours to that of a single training job.
We introduce a novel combination of three techniques to improve the outcomes of the TODM Supernet.
Results demonstrate that our TODM Supernet either matches or surpasses the performance of manually tuned models by up to 3% relative in word error rate (WER).
- Score: 30.758876520227666
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic Speech Recognition (ASR) models need to be optimized for specific
hardware before they can be deployed on devices. This can be done by tuning the
model's hyperparameters or exploring variations in its architecture.
Re-training and re-validating models after making these changes can be a
resource-intensive task. This paper presents TODM (Train Once Deploy Many), a
new approach to efficiently train many sizes of hardware-friendly on-device ASR
models with comparable GPU-hours to that of a single training job. TODM
leverages insights from prior work on Supernet, where Recurrent Neural Network
Transducer (RNN-T) models share weights within a Supernet. It reduces layer
sizes and widths of the Supernet to obtain subnetworks, making them smaller
models suitable for all hardware types. We introduce a novel combination of
three techniques to improve the outcomes of the TODM Supernet: adaptive
dropouts, an in-place Alpha-divergence knowledge distillation, and the use of
ScaledAdam optimizer. We validate our approach by comparing Supernet-trained
versus individually tuned Multi-Head State Space Model (MH-SSM) RNN-T using
LibriSpeech. Results demonstrate that our TODM Supernet either matches or
surpasses the performance of manually tuned models, improving word error rate
(WER) by up to 3% relative, while keeping the cost of training many models at a
small constant.
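The abstract names three ingredients on top of weight-sharing Supernet training: adaptive dropouts, in-place Alpha-divergence knowledge distillation, and the ScaledAdam optimizer. Below is a minimal PyTorch sketch of how these pieces could compose in a single optimizer step, assuming a toy feed-forward stack in place of the MH-SSM RNN-T encoder and plain Adam in place of ScaledAdam; the class and function names, the adaptive-dropout schedule, and the subnetwork sizes are illustrative assumptions, not the paper's implementation.
```python
# Hedged sketch of TODM-style supernet training: weight-sharing subnetworks,
# adaptive dropout, and in-place alpha-divergence KD. Toy feed-forward layers
# stand in for the MH-SSM RNN-T encoder; Adam stands in for ScaledAdam.
import random

import torch
import torch.nn as nn
import torch.nn.functional as F


class SupernetEncoder(nn.Module):
    """Weight-sharing stack: subnetworks reuse a prefix of layers and a slice of each layer's width."""

    def __init__(self, input_dim=80, max_width=512, max_layers=8, num_targets=1024):
        super().__init__()
        self.max_width = max_width
        self.layers = nn.ModuleList(
            [nn.Linear(input_dim if i == 0 else max_width, max_width) for i in range(max_layers)]
        )
        self.head = nn.Linear(max_width, num_targets)

    def forward(self, x, num_layers, width, base_dropout):
        for layer in self.layers[:num_layers]:
            # Slice the shared weight matrix down to the sampled width (weight sharing).
            w = layer.weight[:width, : x.size(-1)]
            b = layer.bias[:width]
            x = F.relu(F.linear(x, w, b))
            # Adaptive dropout (assumed schedule): smaller subnetworks get less dropout.
            x = F.dropout(x, p=base_dropout * width / self.max_width, training=self.training)
        # Zero-pad narrow activations so the shared output head can be reused.
        return self.head(F.pad(x, (0, self.max_width - width)))


def alpha_divergence(teacher_logits, student_logits, alpha=0.5):
    """One common parameterization of the alpha-divergence between categorical distributions."""
    p = F.softmax(teacher_logits, dim=-1)
    q = F.softmax(student_logits, dim=-1)
    s = (p.pow(alpha) * q.pow(1.0 - alpha)).sum(dim=-1)
    return ((s - 1.0) / (alpha * (alpha - 1.0))).mean()


def training_step(model, optimizer, feats, targets):
    """Train the largest subnetwork on labels, then distill randomly sampled
    smaller subnetworks from it in place (no separate teacher model)."""
    optimizer.zero_grad()

    # 1) Largest subnetwork: supervised loss (cross-entropy stands in for the RNN-T loss).
    teacher_logits = model(feats, num_layers=8, width=512, base_dropout=0.3)
    loss = F.cross_entropy(teacher_logits, targets)

    # 2) Sampled smaller subnetworks: in-place alpha-divergence knowledge distillation.
    for _ in range(2):
        num_layers = random.choice([4, 6, 8])
        width = random.choice([256, 384, 512])
        student_logits = model(feats, num_layers=num_layers, width=width, base_dropout=0.3)
        loss = loss + alpha_divergence(teacher_logits.detach(), student_logits)

    loss.backward()
    optimizer.step()  # the paper uses the ScaledAdam optimizer; Adam is assumed here
    return loss.item()


model = SupernetEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
feats, targets = torch.randn(8, 80), torch.randint(0, 1024, (8,))
print(training_step(model, optimizer, feats, targets))
```
Because every sampled subnetwork reads gradients into the same shared tensors, training many deployable sizes costs roughly one training job, which is the "train once, deploy many" point of the abstract.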
Related papers
- Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition [24.71497121634708]
Varying-size models are often required to deploy ASR systems under different hardware and/or application constraints.
We present the dynamic encoder size approach, which jointly trains multiple performant models within one supernet from scratch.
arXiv Detail & Related papers (2024-07-10T08:35:21Z)
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z) - A-SDM: Accelerating Stable Diffusion through Redundancy Removal and
Performance Optimization [54.113083217869516]
In this work, we first identify the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z) - Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts [55.470959564665705]
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural architecture search (NAS) frameworks.
The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models.
It excels in NAS for building memory-efficient task-agnostic BERT models.
arXiv Detail & Related papers (2023-06-08T00:35:36Z) - Transfer-Once-For-All: AI Model Optimization for Edge [0.0]
We propose Transfer-Once-For-All (TOFA) for supernet-style training on small data sets with constant computational training cost.
To overcome the challenges arising from small data, TOFA utilizes a unified semi-supervised training loss to simultaneously train all subnetworks within the supernet.
arXiv Detail & Related papers (2023-03-27T04:14:30Z)
- Learning a Dual-Mode Speech Recognition Model via Self-Pruning [18.248552732790852]
This work aims to jointly learn a compact sparse on-device streaming ASR model, and a large dense server non-streaming model, in a single supernet.
We show that performing supernet training on both wav2vec 2.0 self-supervised learning and supervised ASR fine-tuning not only substantially improves the large non-streaming model, as shown in prior work, but also improves the compact sparse streaming model.
arXiv Detail & Related papers (2022-07-25T05:03:13Z)
- Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming E2E ASR via Supernet [24.62661549442265]
We propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes; a generic pruning sketch follows this list.
Our results show large savings in training time and resources, with similar or better accuracy on LibriSpeech compared to individually pruned models.
arXiv Detail & Related papers (2021-10-15T20:28:27Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- Accurate and Lightweight Image Super-Resolution with Model-Guided Deep Unfolding Network [63.69237156340457]
We present and advocate an explainable approach toward SISR named model-guided deep unfolding network (MoG-DUN).
MoG-DUN is accurate (producing fewer aliasing artifacts), computationally efficient (with reduced model parameters), and versatile (capable of handling multiple degradations).
The superiority of the proposed MoG-DUN method over existing state-of-the-art image restoration methods, including RCAN, SRDNF, and SRFBN, is substantiated by extensive experiments on several popular datasets and various degradation scenarios.
arXiv Detail & Related papers (2020-09-14T08:23:37Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
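Several entries above (e.g., Omni-sparsity DNN and the dual-mode self-pruning model) share the theme of deriving many deployable model sizes from a single trained network by pruning. The sketch below, referenced from the Omni-sparsity entry, shows generic layer-wise magnitude pruning at several sparsity levels; it is not code from any of the listed papers, and the helper name prune_to_sparsity and the toy model are assumptions for illustration.
```python
# Generic layer-wise magnitude pruning to several sparsity levels from one trained
# network. Illustrative sketch only; prune_to_sparsity and the toy model are assumed names.
import copy

import torch
import torch.nn as nn


def prune_to_sparsity(model: nn.Module, sparsity: float) -> nn.Module:
    """Return a copy of `model` with the smallest-magnitude weights zeroed in each Linear layer."""
    pruned = copy.deepcopy(model)
    with torch.no_grad():
        for module in pruned.modules():
            if isinstance(module, nn.Linear):
                w = module.weight
                k = int(sparsity * w.numel())
                if k < 1:
                    continue
                # Per-layer threshold: the k-th smallest weight magnitude.
                threshold = w.abs().flatten().kthvalue(k).values
                w.mul_((w.abs() > threshold).to(w.dtype))
    return pruned


# One trained network, several deployment sizes.
base = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 64))
for sparsity in (0.3, 0.5, 0.7):
    variant = prune_to_sparsity(base, sparsity)
    weights = [m.weight for m in variant.modules() if isinstance(m, nn.Linear)]
    zero_frac = sum((w == 0).sum().item() for w in weights) / sum(w.numel() for w in weights)
    print(f"requested sparsity {sparsity:.1f} -> actual zero fraction {zero_frac:.2f}")
```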