Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming
E2E ASR via Supernet
- URL: http://arxiv.org/abs/2110.08352v1
- Date: Fri, 15 Oct 2021 20:28:27 GMT
- Title: Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming
E2E ASR via Supernet
- Authors: Haichuan Yang, Yuan Shangguan, Dilin Wang, Meng Li, Pierce Chuang,
Xiaohui Zhang, Ganesh Venkatesh, Ozlem Kalinli, Vikas Chandra
- Abstract summary: We propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized models for a large range of model sizes.
Our results show great savings in training time and resources, with similar or better accuracy on LibriSpeech compared to individually pruned models.
- Score: 24.62661549442265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: From wearables to powerful smart devices, modern automatic speech recognition
(ASR) models run on a variety of edge devices with different computational
budgets. To navigate the Pareto front of model accuracy vs model size,
researchers face a dilemma between optimizing model accuracy, by training and
fine-tuning models for each individual edge device, and keeping the training
GPU-hours tractable. In this paper, we propose Omni-sparsity DNN, where a
single neural network can be pruned to generate optimized models for a large
range of model sizes. We develop training strategies for Omni-sparsity DNN
that allow it to find models along the Pareto front of word-error-rate (WER)
vs model size while keeping the training GPU-hours to no more than that of
training a single model. We demonstrate the Omni-sparsity DNN with streaming
E2E ASR models. Our results show great savings in training time and
resources, with similar or better accuracy on LibriSpeech compared to
individually pruned sparse models: 2%-6.6% better WER on Test-other.
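As a rough illustration of the idea, the sketch below trains one set of weights while sampling several sparsity levels per step and applying magnitude-pruning masks; the tiny model, sampling scheme, and loss are illustrative assumptions, not the paper's exact training recipe.

```python
# Minimal sketch of supernet-style sparsity training: each step samples a few
# target sparsities, applies magnitude-pruning masks, and sums the losses so a
# single set of weights serves every model size. All components are toy stand-ins.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyEncoder(nn.Module):
    """Stand-in for the ASR encoder; two linear layers are enough for the sketch."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(80, 256)
        self.fc2 = nn.Linear(256, 32)

    def forward(self, x, masks=None):
        w1 = self.fc1.weight if masks is None else self.fc1.weight * masks["fc1"]
        w2 = self.fc2.weight if masks is None else self.fc2.weight * masks["fc2"]
        h = F.relu(F.linear(x, w1, self.fc1.bias))
        return F.linear(h, w2, self.fc2.bias)

def magnitude_masks(model, sparsity):
    """Binary masks that keep the largest-magnitude weights of each layer."""
    masks = {}
    for name in ("fc1", "fc2"):
        w = getattr(model, name).weight.detach()
        k = int(w.numel() * sparsity)              # number of weights to prune
        if k == 0:
            masks[name] = torch.ones_like(w)
        else:
            threshold = w.abs().flatten().kthvalue(k).values
            masks[name] = (w.abs() > threshold).float()
    return masks

model = TinyEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(200):                                      # toy loop on random data
    x, y = torch.randn(16, 80), torch.randint(0, 32, (16,))
    optimizer.zero_grad()
    loss = 0.0
    for sparsity in random.sample([0.0, 0.5, 0.7, 0.9], 2):  # sampled target sizes
        loss = loss + F.cross_entropy(model(x, magnitude_masks(model, sparsity)), y)
    loss.backward()
    optimizer.step()
```

After training, any of the sampled sparsity levels can be materialized as its own deployable model by applying the corresponding mask once, without retraining.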
Related papers
- ReNO: Enhancing One-step Text-to-Image Models through Reward-based Noise Optimization [59.72782742378666]
We propose Reward-based Noise Optimization (ReNO) to enhance Text-to-Image models at inference.
Within a computational budget of 20-50 seconds, ReNO-enhanced one-step models consistently surpass the performance of all current open-source Text-to-Image models.
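A minimal sketch of the inference-time noise-optimization idea summarized above; the one-step generator and reward model are placeholder callables, not ReNO's actual components.

```python
# Treat the initial latent noise as the only free variable and update it by
# gradient ascent on a differentiable reward. Interfaces are illustrative.
import torch

def optimize_noise(one_step_generator, reward_model, prompt,
                   noise_shape=(1, 4, 64, 64), steps=30, lr=0.05):
    """Gradient-ascend on the initial noise to maximize the reward of the output image."""
    noise = torch.randn(noise_shape, requires_grad=True)
    optimizer = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):                          # a 20-50s budget maps to a few dozen steps
        optimizer.zero_grad()
        image = one_step_generator(noise, prompt)   # single forward pass, no sampling loop
        loss = -reward_model(image, prompt)         # maximize reward = minimize its negative
        loss.backward()
        optimizer.step()
    return noise.detach()
```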
arXiv Detail & Related papers (2024-06-06T17:56:40Z) - RepCNN: Micro-sized, Mighty Models for Wakeword Detection [3.4888176891918654]
Always-on machine learning models require a very low memory and compute footprint.
We show that a small convolutional model can be better trained by first refactoring its computation into a larger multi-branched architecture.
We show that our always-on wake-word detector model, RepCNN, provides a good trade-off between latency and accuracy during inference.
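A small sketch of the underlying idea, structural re-parameterization: parallel convolution branches used during training are collapsed into one equivalent convolution for inference. The block layout is illustrative, not RepCNN's actual architecture.

```python
# Two parallel convolutions are a linear map, so their kernels and biases can be
# summed into a single conv that computes the same function with fewer ops.
import torch
import torch.nn as nn

class TwoBranchBlock(nn.Module):
    """Training-time block: two parallel 3x3 convolutions whose outputs are summed."""
    def __init__(self, channels=8):
        super().__init__()
        self.branch_a = nn.Conv2d(channels, channels, 3, padding=1)
        self.branch_b = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        return self.branch_a(x) + self.branch_b(x)

    def collapse(self):
        """Merge both branches into one conv for a smaller inference-time model."""
        merged = nn.Conv2d(self.branch_a.in_channels, self.branch_a.out_channels, 3, padding=1)
        merged.weight.data = self.branch_a.weight.data + self.branch_b.weight.data
        merged.bias.data = self.branch_a.bias.data + self.branch_b.bias.data
        return merged

block = TwoBranchBlock()
single_conv = block.collapse()
x = torch.randn(1, 8, 16, 16)
print(torch.allclose(block(x), single_conv(x), atol=1e-5))  # True: same function, fewer ops
```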
arXiv Detail & Related papers (2024-06-04T16:14:19Z) - Anole: Adapting Diverse Compressed Models For Cross-Scene Prediction On Mobile Devices [17.542012577533015]
Anole is a light-weight scheme to cope with local DNN model inference on mobile devices.
We implement Anole on different types of mobile devices and conduct extensive trace-driven and real-world experiments based on unmanned aerial vehicles (UAVs).
arXiv Detail & Related papers (2024-05-09T12:06:18Z) - TODM: Train Once Deploy Many Efficient Supernet-Based RNN-T Compression
For On-device ASR Models [30.758876520227666]
TODM is a new approach to efficiently train many sizes of hardware-friendly on-device ASR models with GPU-hours comparable to those of a single training job.
We introduce a novel combination of three techniques to improve the outcomes of the TODM Supernet.
Results demonstrate that our TODM Supernet matches or surpasses the performance of manually tuned models, with up to 3% relative improvement in word error rate (WER).
arXiv Detail & Related papers (2023-09-05T04:47:55Z) - Ensembling Off-the-shelf Models for GAN Training [55.34705213104182]
We find that pretrained computer vision models can significantly improve performance when used in an ensemble of discriminators.
We propose an effective selection mechanism, by probing the linear separability between real and fake samples in pretrained model embeddings.
Our method can improve GAN training in both limited data and large-scale settings.
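A hedged sketch of the selection mechanism: score each candidate pretrained feature extractor by how linearly separable real and generated samples are in its embedding space, then keep the top-scoring models. The probe and split sizes are illustrative choices.

```python
# Rank pretrained feature extractors by the held-out accuracy of a linear probe
# that separates real from generated samples in their embedding spaces.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def separability_score(real_emb, fake_emb):
    """Held-out accuracy of a linear probe separating real from generated samples."""
    X = np.concatenate([real_emb, fake_emb])
    y = np.concatenate([np.ones(len(real_emb)), np.zeros(len(fake_emb))])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return probe.score(X_te, y_te)

def select_feature_models(candidates, top_k=3):
    """candidates: {model_name: (real_embeddings, fake_embeddings)} numpy arrays."""
    scores = {name: separability_score(r, f) for name, (r, f) in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```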
arXiv Detail & Related papers (2021-12-16T18:59:50Z) - Real-time Human Detection Model for Edge Devices [0.0]
Convolutional Neural Networks (CNNs) have replaced traditional feature extraction and machine learning models in detection and classification tasks.
Lightweight CNN models have been recently introduced for real-time tasks.
This paper suggests a CNN-based lightweight model that can fit on a limited edge device such as Raspberry Pi.
arXiv Detail & Related papers (2021-11-20T18:42:17Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
For evaluation, we compare the estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model.
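A simplified sketch of the stacked-estimation idea: fit a per-layer latency model on benchmark measurements, then sum per-layer predictions for a whole network. The features and regressor are placeholders for ANNETTE's more detailed mixed models.

```python
# Fit a per-layer latency estimator from benchmark data, then estimate a full
# network's execution time as the sum of per-layer predictions.
import numpy as np
from sklearn.linear_model import LinearRegression

def layer_features(layer):
    """Simple per-layer descriptors: MACs and parameter count."""
    return [layer["macs"], layer["params"]]

def fit_layer_model(benchmarks):
    """benchmarks: list of (layer_descriptor, measured_latency_ms) pairs."""
    X = np.array([layer_features(l) for l, _ in benchmarks])
    y = np.array([t for _, t in benchmarks])
    return LinearRegression().fit(X, y)

def estimate_network_latency(layer_model, network_layers):
    """Approximate end-to-end execution time by summing per-layer predictions."""
    X = np.array([layer_features(l) for l in network_layers])
    return float(layer_model.predict(X).sum())

# Toy usage with made-up benchmark numbers
bench = [({"macs": 1e6, "params": 1e4}, 0.8),
         ({"macs": 5e6, "params": 3e4}, 3.1),
         ({"macs": 2e7, "params": 1e5}, 11.5)]
model = fit_layer_model(bench)
net = [{"macs": 4e6, "params": 2e4}, {"macs": 1e7, "params": 8e4}]
print(estimate_network_latency(model, net))
```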
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Developing RNN-T Models Surpassing High-Performance Hybrid Models with
Customization Capability [46.73349163361723]
Recurrent neural network transducer (RNN-T) is a promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.
We describe our recent development of RNN-T models with reduced GPU memory consumption during training.
We study how to customize RNN-T models to a new domain, which is important for deploying E2E models to practical scenarios.
arXiv Detail & Related papers (2020-07-30T02:35:20Z) - A Streaming On-Device End-to-End Model Surpassing Server-Side
Conventional Model Quality and Latency [88.08721721440429]
We develop a first-pass Recurrent Neural Network Transducer (RNN-T) model and a second-pass Listen, Attend, Spell (LAS) rescorer.
We find that RNN-T+LAS offers a better WER and latency tradeoff compared to a conventional model.
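A minimal sketch of the two-pass decoding idea: the streaming first pass emits an n-best list, a second-pass rescorer re-scores each hypothesis, and the best interpolated score wins. The model interfaces and interpolation weight are assumptions, not the paper's exact setup.

```python
# Two-pass decoding: pick the hypothesis with the best interpolated
# first-pass / second-pass score.
from typing import Callable, List, Tuple

def two_pass_decode(first_pass: Callable[[object], List[Tuple[str, float]]],
                    rescorer: Callable[[object, str], float],
                    audio,
                    weight: float = 0.5) -> str:
    """first_pass returns an n-best list of (hypothesis, log_prob); rescorer scores one hypothesis."""
    nbest = first_pass(audio)
    best_hyp, best_score = None, float("-inf")
    for hyp, first_score in nbest:
        score = (1 - weight) * first_score + weight * rescorer(audio, hyp)
        if score > best_score:
            best_hyp, best_score = hyp, score
    return best_hyp
```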
arXiv Detail & Related papers (2020-03-28T05:00:33Z) - Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
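A simplified sketch of the alignment-before-averaging idea for a single layer: here the optimal-transport coupling is replaced by a hard permutation found with the Hungarian algorithm; the actual method computes a full OT plan and propagates the alignment through successive layers.

```python
# Align the neurons of model B to model A by weight similarity, then average.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_and_average(weights_a, weights_b):
    """Match rows (neurons) of B to A, then return the neuron-wise average."""
    # Cost: squared distance between every pair of neuron weight vectors.
    cost = ((weights_a[:, None, :] - weights_b[None, :, :]) ** 2).sum(-1)
    _, perm = linear_sum_assignment(cost)      # perm[i] = B neuron matched to A neuron i
    return 0.5 * (weights_a + weights_b[perm])

# Toy usage: two "layers" whose neurons are permuted versions of each other
a = np.random.randn(4, 8)
b = a[[2, 0, 3, 1]] + 0.01 * np.random.randn(4, 8)
fused = align_and_average(a, b)
print(np.allclose(fused, a, atol=0.1))         # close to A once B is re-aligned
```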
arXiv Detail & Related papers (2019-10-12T22:07:15Z)