Accelerator-Aware Training for Transducer-Based Speech Recognition
- URL: http://arxiv.org/abs/2305.07778v1
- Date: Fri, 12 May 2023 21:49:51 GMT
- Title: Accelerator-Aware Training for Transducer-Based Speech Recognition
- Authors: Suhaila M. Shakiah, Rupak Vignesh Swaminathan, Hieu Duy Nguyen,
Raviteja Chinta, Tariq Afzal, Nathan Susanj, Athanasios Mouchtaris, Grant P.
Strimel, Ariya Rastrow
- Abstract summary: In this work, we replicate the NNA operators during the training phase, accounting for the degradation due to low-precision inference on the NNA in back-propagation.
Our proposed method efficiently emulates NNA operations, thus foregoing the need to transfer quantization error-prone data to the CPU.
We train and evaluate models on 270K hours of English data and show a 5-7% improvement in engine latency while avoiding up to 10% relative degradation in WER.
- Score: 16.959329474794092
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Machine learning model weights and activations are represented in
full precision during training. This leads to performance degradation at
runtime when models are deployed on neural network accelerator (NNA) chips, which
leverage highly parallelized fixed-point arithmetic to improve runtime memory and
latency.
latency. In this work, we replicate the NNA operators during the training
phase, accounting for the degradation due to low-precision inference on the NNA
in back-propagation. Our proposed method efficiently emulates NNA operations,
thus foregoing the need to transfer quantization error-prone data to the
Central Processing Unit (CPU), ultimately reducing the user perceived latency
(UPL). We apply our approach to Recurrent Neural Network-Transducer (RNN-T), an
attractive architecture for on-device streaming speech recognition tasks. We
train and evaluate models on 270K hours of English data and show a 5-7%
improvement in engine latency while avoiding up to 10% relative degradation in
WER.
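Conceptually, emulating the NNA operators during training is a form of quantization-aware training: the forward pass simulates the accelerator's fixed-point arithmetic and back-propagation proceeds through the simulated operators. Below is a minimal PyTorch sketch of that idea; the class names, the int8 symmetric per-tensor scheme, and the straight-through estimator are illustrative assumptions rather than the paper's exact NNA emulation.

```python
import torch
import torch.nn as nn

class FakeQuantize(torch.autograd.Function):
    """Simulate fixed-point (here int8, symmetric per-tensor) arithmetic in the
    forward pass; the backward pass uses a straight-through estimator so
    gradients flow through the non-differentiable rounding step."""

    @staticmethod
    def forward(ctx, x, num_bits=8):
        qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
        scale = x.abs().max().clamp(min=1e-8) / qmax  # per-tensor scale (an assumption)
        q = torch.clamp(torch.round(x / scale), qmin, qmax)
        return q * scale  # dequantize so the rest of the graph stays in float

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None  # straight-through estimator

class QuantizedLinear(nn.Linear):
    """A linear layer whose weights and activations are fake-quantized,
    standing in for a low-precision NNA matrix multiply during training."""

    def forward(self, x):
        w_q = FakeQuantize.apply(self.weight, 8)
        x_q = FakeQuantize.apply(x, 8)
        return nn.functional.linear(x_q, w_q, self.bias)
```

Wrapping, say, the RNN-T encoder, prediction network, and joint projections in layers like this lets the training loss observe quantization error directly, which is the spirit of accelerator-aware training; the paper's actual operator emulation is more faithful to the specific NNA.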
Related papers
- EPIM: Efficient Processing-In-Memory Accelerators based on Epitome [78.79382890789607]
We introduce the Epitome, a lightweight neural operator offering convolution-like functionality.
On the software side, we evaluate epitomes' latency and energy on PIM accelerators.
We introduce a PIM-aware layer-wise design method to enhance their hardware efficiency.
arXiv Detail & Related papers (2023-11-12T17:56:39Z)
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
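The small-window training described above works because a fully convolutional network has no fixed-size dense layers, so the same weights accept inputs of arbitrary length. A minimal PyTorch sketch; the layer sizes and the 1-D regression setup are illustrative assumptions, not taken from the paper:

```python
import torch
import torch.nn as nn

# Fully convolutional 1-D model: no flatten/linear layers, so input length is free.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv1d(16, 16, kernel_size=9, padding=4),
    nn.ReLU(),
    nn.Conv1d(16, 1, kernel_size=1),  # per-sample prediction head
)

train_windows = torch.randn(32, 1, 256)    # small windows used for training
long_signal = torch.randn(1, 1, 100_000)   # much larger signal at evaluation time

print(model(train_windows).shape)  # torch.Size([32, 1, 256])
print(model(long_signal).shape)    # torch.Size([1, 1, 100000]) with the same weights
```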
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Learning in Feedback-driven Recurrent Spiking Neural Networks using full-FORCE Training [4.124948554183487]
We propose a supervised training procedure for RSNNs, where a second network is introduced only during the training.
The proposed training procedure consists of generating targets for both recurrent and readout layers.
We demonstrate the improved performance and noise robustness of the proposed full-FORCE training procedure on modeling 8 dynamical systems.
arXiv Detail & Related papers (2022-05-26T19:01:19Z)
- Enabling Incremental Training with Forward Pass for Edge Devices [0.0]
We introduce a method using an evolutionary strategy (ES) that can partially retrain the network, enabling it to adapt to changes and recover after an error has occurred.
This technique enables training on an inference-only hardware without the need to use backpropagation and with minimal resource overhead.
arXiv Detail & Related papers (2021-03-25T17:43:04Z)
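The forward-pass-only retraining described above can be illustrated with a generic evolutionary-strategy (ES) update, which estimates a descent direction from loss evaluations alone and therefore never calls backpropagation. The population size, noise scale, and antithetic sampling below are illustrative assumptions, not the paper's recipe:

```python
import numpy as np

def es_update(params, loss_fn, sigma=0.02, lr=0.01, population=16, rng=None):
    """One evolutionary-strategy step using forward passes only.

    params : 1-D array of the weights being adapted (e.g., only a small
             subset of the network, to keep overhead low).
    loss_fn: maps a parameter vector to a scalar loss, computed purely
             by inference on the device.
    """
    rng = rng or np.random.default_rng(0)
    grad_est = np.zeros_like(params)
    for _ in range(population):
        eps = rng.standard_normal(params.shape)
        # Antithetic pair: evaluate the loss at +eps and -eps perturbations.
        grad_est += (loss_fn(params + sigma * eps) - loss_fn(params - sigma * eps)) * eps
    grad_est /= 2.0 * sigma * population
    return params - lr * grad_est  # descend along the estimated gradient
```

Each call costs 2 x population inference passes but no gradient computation, which is what makes this style of update feasible on inference-only hardware.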
- FracTrain: Fractionally Squeezing Bit Savings Both Temporally and Spatially for Efficient DNN Training [81.85361544720885]
We propose FracTrain that integrates progressive fractional quantization which gradually increases the precision of activations, weights, and gradients.
FracTrain reduces computational cost and hardware-quantified energy/latency of DNN training while achieving a comparable or better (-0.12% to +1.87%) accuracy.
arXiv Detail & Related papers (2020-12-24T05:24:10Z)
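FracTrain's progressive fractional quantization can be pictured as a bit-width schedule that starts low and rises as training proceeds, with weights, activations, and gradients quantized at the current width. A toy schedule in Python; the concrete bit widths and evenly spaced switch points are illustrative assumptions, not FracTrain's adaptive policy:

```python
def precision_schedule(progress: float, bit_widths=(4, 6, 8, 16)) -> int:
    """Return the bit width to use at a given point in training.

    progress: fraction of training completed, in [0, 1].
    The bit widths and evenly spaced switch points are assumptions;
    FracTrain itself adapts precision dynamically.
    """
    stage = min(int(progress * len(bit_widths)), len(bit_widths) - 1)
    return bit_widths[stage]

# Precision grows from 4 to 16 bits over the course of training.
assert precision_schedule(0.0) == 4
assert precision_schedule(0.5) == 8
assert precision_schedule(0.95) == 16
```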
- Dynamic Hard Pruning of Neural Networks at the Edge of the Internet [11.605253906375424]
The Dynamic Hard Pruning (DynHP) technique incrementally prunes the network during training.
DynHP enables a tunable size reduction of the final neural network and reduces the NN memory occupancy during training.
Freed memory is reused by a dynamic batch sizing approach to counterbalance the accuracy degradation caused by the hard pruning strategy.
arXiv Detail & Related papers (2020-11-17T10:23:28Z)
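Incremental hard pruning of the DynHP kind can be approximated by zeroing the smallest-magnitude weights against a sparsity target that grows during training. The sketch below shows only that pruning step; DynHP's actual schedule and the dynamic batch sizing that reuses the freed memory are not reproduced, and the final 80% sparsity is an arbitrary assumption:

```python
import torch
import torch.nn as nn

def hard_prune(model: nn.Module, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights so that roughly `sparsity`
    of each linear layer's weights are removed."""
    for module in model.modules():
        if isinstance(module, nn.Linear):
            w = module.weight.data
            k = int(sparsity * w.numel())
            if k == 0:
                continue
            threshold = w.abs().flatten().kthvalue(k).values
            w[w.abs() <= threshold] = 0.0

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
total_steps = 1000
for step in range(total_steps):
    # ... a normal optimizer step would go here ...
    # The sparsity target grows from 0% to 80% as training progresses.
    hard_prune(model, sparsity=0.8 * step / total_steps)
```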