Dynamic Sparsity Neural Networks for Automatic Speech Recognition
- URL: http://arxiv.org/abs/2005.10627v3
- Date: Mon, 8 Feb 2021 08:01:58 GMT
- Title: Dynamic Sparsity Neural Networks for Automatic Speech Recognition
- Authors: Zhaofeng Wu, Ding Zhao, Qiao Liang, Jiahui Yu, Anmol Gulati, Ruoming Pang
- Abstract summary: We present Dynamic Sparsity Neural Networks (DSNN) that, once trained, can instantly switch to any predefined sparsity configuration at run-time.
Our trained DSNN model, therefore, can greatly ease the training process and simplify deployment in diverse scenarios with resource constraints.
- Score: 44.352231175123215
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In automatic speech recognition (ASR), model pruning is a widely adopted
technique that reduces model size and latency to deploy neural network models
on edge devices with resource constraints. However, multiple models with
different sparsity levels usually need to be separately trained and deployed to
heterogeneous target hardware with different resource specifications and for
applications that have various latency requirements. In this paper, we present
Dynamic Sparsity Neural Networks (DSNN) that, once trained, can instantly
switch to any predefined sparsity configuration at run-time. We demonstrate the
effectiveness and flexibility of DSNN using experiments on internal production
datasets with Google Voice Search data, and show that the performance of a DSNN
model is on par with that of individually trained single sparsity networks. Our
trained DSNN model, therefore, can greatly ease the training process and
simplify deployment in diverse scenarios with resource constraints.
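As a rough illustration of the core idea, here is a minimal sketch in which one shared weight matrix carries a precomputed magnitude-pruning mask per supported sparsity level. The class name, API, and mask construction are assumptions for exposition, not the paper's training recipe:

```python
# Minimal sketch of run-time sparsity switching, assuming magnitude-based
# masks precomputed per predefined sparsity level. Illustrative only.
import numpy as np

class DynamicSparseLayer:
    """A dense layer whose shared weights can be masked to any of a set
    of predefined sparsity levels chosen at run-time."""

    def __init__(self, weight: np.ndarray, sparsity_levels=(0.0, 0.5, 0.75, 0.9)):
        self.weight = weight
        self.masks = {}
        flat = np.abs(weight).ravel()
        for s in sparsity_levels:
            k = int(round(s * flat.size))              # weights to zero out
            thresh = np.sort(flat)[k] if k > 0 else -np.inf
            # Keep the largest-magnitude weights; ties may shift the
            # exact ratio slightly.
            self.masks[s] = (np.abs(weight) >= thresh).astype(weight.dtype)
        self.active = 0.0                              # start fully dense

    def switch(self, sparsity: float):
        """Instantly switch to one of the predefined configurations."""
        self.active = sparsity

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return x @ (self.weight * self.masks[self.active])

rng = np.random.default_rng(0)
layer = DynamicSparseLayer(rng.standard_normal((64, 32)).astype(np.float32))
x = rng.standard_normal((1, 64)).astype(np.float32)
layer.switch(0.9)    # e.g. a tightly constrained edge device
print(layer(x).shape)
```

All sparsity levels share one set of weights, so switching is just a mask lookup; that is what makes run-time reconfiguration instant rather than requiring a separately trained and deployed model per target.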
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z)
- Quantization-aware Neural Architectural Search for Intrusion Detection [5.010685611319813]
We present a design methodology that automatically trains and evolves quantized neural network (NN) models that are a thousand times smaller than state-of-the-art NNs.
When deployed to an FPGA, the resulting network uses between 2.3x and 8.5x fewer LUTs, with performance comparable to prior work.
arXiv Detail & Related papers (2023-11-07T18:35:29Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload on both edge devices and data centers.
We propose Dysta, a novel scheduler that exploits both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling (a toy sketch follows this entry).
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
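A toy sketch of the sparsity-aware scheduling idea above, assuming a least-slack rule and a latency model in which static (weight) and dynamic (activation) sparsity shrink the remaining work multiplicatively; the fields and constants are illustrative, not Dysta's actual algorithm:

```python
# Toy sparsity-aware multi-DNN scheduler. Illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    remaining_ops: float      # dense op count left to execute
    static_sparsity: float    # known offline from the pruned weights
    dynamic_sparsity: float   # measured at run-time from activations
    deadline_ms: float

def predicted_latency_ms(job: Job, ops_per_ms: float = 1e6) -> float:
    # Assume savings from both sparsity sources compose multiplicatively.
    work = job.remaining_ops * (1 - job.static_sparsity) * (1 - job.dynamic_sparsity)
    return work / ops_per_ms

def pick_next(jobs, now_ms: float = 0.0) -> Job:
    # Least slack first: run the job closest to violating its deadline.
    return min(jobs, key=lambda j: j.deadline_ms - now_ms - predicted_latency_ms(j))

jobs = [
    Job("asr",    5e6, 0.80, 0.30, deadline_ms=8.0),
    Job("vision", 9e6, 0.50, 0.10, deadline_ms=6.0),
    Job("nlp",    2e6, 0.90, 0.50, deadline_ms=9.0),
]
print(pick_next(jobs).name)   # "vision": largest predicted latency vs. deadline
```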
- SortedNet: A Scalable and Generalized Framework for Training Modular Deep Neural Networks [30.069353400127046]
We propose SortedNet to harness the inherent modularity of deep neural networks (DNNs).
SortedNet enables sub-models to be trained simultaneously with the main model (a toy sketch follows this entry).
It is able to train 160 sub-models at once, achieving at least 96% of the original model's performance.
arXiv Detail & Related papers (2023-09-01T05:12:25Z)
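A toy sketch of the nested sub-model idea above: the first k hidden units of one shared network form the width-k sub-model, and each training step updates a randomly sampled width. The two-layer regression model and the sampling rule are assumptions for illustration:

```python
# Toy nested sub-model training on shared weights. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 8, 16
W1 = rng.standard_normal((d_in, d_hidden)) * 0.1
W2 = rng.standard_normal((d_hidden, 1)) * 0.1
widths = [4, 8, 12, 16]                     # predefined sub-model sizes

X = rng.standard_normal((256, d_in))
y = np.sin(X.sum(axis=1, keepdims=True))

lr = 1e-2
for step in range(2000):
    k = widths[rng.integers(len(widths))]   # sample one sub-model per step
    h = np.tanh(X @ W1[:, :k])              # use only the first k hidden units
    err = h @ W2[:k] - y
    # Backprop through the active slice only; the untouched weights keep
    # serving the larger sub-models.
    gW2 = h.T @ err / len(X)
    gh = err @ W2[:k].T * (1 - h**2)        # tanh derivative
    W2[:k] -= lr * gW2
    W1[:, :k] -= lr * (X.T @ gh / len(X))

# After training, any width in `widths` can be served from the same weights.
for k in widths:
    mse = float(np.mean((np.tanh(X @ W1[:, :k]) @ W2[:k] - y) ** 2))
    print(f"width {k:2d}: mse {mse:.4f}")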
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Model Blending for Text Classification [0.15229257192293197]
We reduce the complexity of state-of-the-art LSTM models for natural language tasks such as text classification by distilling their knowledge into CNN-based models, cutting inference time (or latency) at test time (a toy sketch of the distillation loss follows this entry).
arXiv Detail & Related papers (2022-08-05T05:07:45Z)
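A minimal sketch of the soft-label distillation objective commonly used in such teacher-student setups; the temperature, mixing weight, and function names are illustrative assumptions, not the paper's exact loss:

```python
# Toy soft-label knowledge distillation loss. Illustrative only.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """alpha * soft-target cross-entropy (teacher at temperature T)
    + (1 - alpha) * hard cross-entropy against the true labels."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student = np.log(softmax(student_logits, T) + 1e-12)
    soft = -np.mean((p_teacher * log_p_student).sum(axis=-1)) * T * T
    hard_probs = softmax(student_logits)[np.arange(len(labels)), labels]
    hard = -np.mean(np.log(hard_probs + 1e-12))
    return alpha * soft + (1 - alpha) * hard

rng = np.random.default_rng(0)
teacher = rng.standard_normal((4, 10))   # e.g. LSTM logits
student = rng.standard_normal((4, 10))   # e.g. CNN logits
labels = np.array([0, 3, 7, 2])
print(distillation_loss(student, teacher, labels))
```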
- YONO: Modeling Multiple Heterogeneous Neural Networks on Microcontrollers [10.420617367363047]
YONO is a product quantization (PQ) based approach that compresses multiple heterogeneous models and enables in-memory model execution and switching.
YONO shows remarkable performance: it compresses multiple heterogeneous models by up to 12.37x with negligible or no loss of accuracy (a toy PQ sketch follows this entry).
arXiv Detail & Related papers (2022-03-08T01:24:36Z)
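A toy product-quantization sketch of the compression step above: each weight sub-vector is replaced by the index of its nearest codebook entry, so storage drops from floats to small integer codes plus shared codebooks. The sub-vector size, codebook size, and plain k-means are assumptions; YONO's in-memory model-switching machinery is not modeled:

```python
# Toy product quantization (PQ) of a weight matrix. Illustrative only.
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            pts = x[assign == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, assign

def pq_compress(weights, sub_dim=4, k=16):
    """Split each row into sub-vectors and quantize each subspace."""
    n, d = weights.shape
    assert d % sub_dim == 0
    codebooks, codes = [], []
    for s in range(d // sub_dim):
        block = weights[:, s * sub_dim:(s + 1) * sub_dim]
        centers, assign = kmeans(block, k)
        codebooks.append(centers)
        codes.append(assign)
    return codebooks, np.stack(codes, axis=1)   # codes: (n, d // sub_dim)

def pq_decompress(codebooks, codes):
    return np.concatenate([cb[c] for cb, c in zip(codebooks, codes.T)], axis=1)

rng = np.random.default_rng(1)
W = rng.standard_normal((256, 32)).astype(np.float32)
cbs, codes = pq_compress(W)
W_hat = pq_decompress(cbs, codes)
# 32 floats per row become 8 codes (conceptually 4-bit, since k=16)
# plus small shared codebooks.
print("reconstruction mse:", float(np.mean((W - W_hat) ** 2)))
```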
- NL-CNN: A Resources-Constrained Deep Learning Model based on Nonlinear Convolution [0.0]
A novel convolutional neural network model, abbreviated NL-CNN, is proposed, in which nonlinear convolution is emulated by a cascade of convolution + nonlinearity layers (a toy sketch of the cascade follows this entry).
Performance evaluations on several widely known datasets are provided.
arXiv Detail & Related papers (2021-01-30T13:38:42Z)
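A minimal sketch of the cascade idea above: stacking linear convolutions with pointwise nonlinearities approximates a nonlinear convolution. The kernel sizes and the tanh nonlinearity are assumptions for illustration:

```python
# Toy conv + nonlinearity cascade in the spirit of NL-CNN. Illustrative only.
import numpy as np

def conv1d(x, kernel):
    return np.convolve(x, kernel, mode="same")

def nl_conv_block(x, kernels):
    """Cascade: conv -> tanh -> conv -> tanh -> ..."""
    for k in kernels:
        x = np.tanh(conv1d(x, k))
    return x

rng = np.random.default_rng(0)
signal = rng.standard_normal(64)
kernels = [rng.standard_normal(3) * 0.5 for _ in range(2)]

# One nonlinear "super-convolution" realized as a two-stage cascade.
out = nl_conv_block(signal, kernels)
print(out.shape, float(out.min()), float(out.max()))
```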
- Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) in low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition (a toy rate-coding sketch follows this entry).
arXiv Detail & Related papers (2020-07-02T15:38:44Z)
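A toy rate-coding illustration of the ANN-to-SNN correspondence that conversion methods exploit: over enough timesteps, an integrate-and-fire neuron's firing rate approximates a ReLU of its input drive (for drives below the threshold). The threshold and step count are illustrative assumptions:

```python
# Toy rate-coded approximation of a ReLU by an integrate-and-fire neuron.
# Illustrative only; real conversion pipelines also rescale thresholds.
def if_neuron_rate(drive: float, T: int = 100, threshold: float = 1.0) -> float:
    """Accumulate input each step; spike and reset by subtraction when
    the membrane potential crosses the threshold."""
    v, spikes = 0.0, 0
    for _ in range(T):
        v += drive
        if v >= threshold:
            spikes += 1
            v -= threshold
    return spikes / T        # firing rate approximates relu(drive)

for drive in [-0.3, 0.0, 0.25, 0.5, 0.9]:
    print(f"drive {drive:+.2f}  relu {max(drive, 0):.2f}  "
          f"snn rate {if_neuron_rate(drive):.2f}")
```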
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.