RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech
Recognition
- URL: http://arxiv.org/abs/2002.11474v1
- Date: Wed, 19 Feb 2020 00:07:32 GMT
- Title: RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech
Recognition
- Authors: Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang
Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, and Dingwen Tao
- Abstract summary: RTMobile is the first work that can achieve real-time RNN inference on mobile platforms.
Compared with prior work on FPGA, RTMobile running GRU on an Adreno 640 embedded GPU improves energy efficiency by about 40$\times$ while maintaining the same inference time.
- Score: 51.1437598405873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural network (RNN)-based automatic speech recognition
has become prevalent on mobile devices such as smartphones. However,
previous RNN compression techniques either suffer from hardware performance
overhead due to irregularity or significant accuracy loss due to the preserved
regularity for hardware friendliness. In this work, we propose RTMobile that
leverages both a novel block-based pruning approach and compiler optimizations
to accelerate RNN inference on mobile devices. Our proposed RTMobile is the
first work that can achieve real-time RNN inference on mobile platforms.
Experimental results demonstrate that RTMobile can significantly outperform
existing RNN hardware acceleration methods in terms of inference accuracy and
time. Compared with prior work on FPGA, RTMobile running GRU on an Adreno 640
embedded GPU improves energy efficiency by about 40$\times$ while maintaining
the same inference time.
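As a rough sketch of the block-based pruning idea behind RTMobile: the weight matrix is tiled into fixed-size blocks, blocks are ranked by magnitude, and the weakest fraction is zeroed so the surviving structure stays regular enough for efficient mobile execution. The block shape, sparsity level, and ranking criterion below are invented for illustration, not the paper's actual scheme or hyperparameters.

```python
import numpy as np

def block_prune(weight, block_shape=(4, 4), sparsity=0.9):
    """Zero out the lowest-magnitude blocks of a weight matrix.

    Illustrative block-based pruning: tile the matrix into blocks,
    rank blocks by L2 norm, and remove the weakest `sparsity` fraction.
    """
    rows, cols = weight.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0
    # Reshape into a grid of (rows/br) x (cols/bc) blocks.
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    # One L2 norm per block as its importance score.
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    # Keep only blocks whose norm clears the sparsity quantile.
    threshold = np.quantile(norms, sparsity)
    mask = (norms >= threshold)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

W = np.random.randn(64, 64)           # stand-in for an RNN weight matrix
W_pruned = block_prune(W, block_shape=(4, 4), sparsity=0.9)
print(float(np.mean(W_pruned == 0)))  # fraction of zeroed entries
```

Because whole blocks are removed rather than scattered individual weights, a compiler can generate dense inner loops over the surviving blocks, which is the regularity that makes this family of methods hardware-friendly.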
Related papers
- Lightweight DNN for Full-Band Speech Denoising on Mobile Devices: Exploiting Long and Short Temporal Patterns [4.121578819979242]
We present a causal, low-latency, and lightweight deep neural network (DNN)-based method for full-band speech denoising. The method is based on a modified UNet architecture employing look-back frames, temporal spanning of convolutional kernels, and recurrent neural networks. The proposed method is evaluated using established speech denoising metrics and publicly available datasets.
arXiv Detail & Related papers (2025-09-05T13:18:25Z) - GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile
Devices based on Fine-Grained Structured Weight Sparsity [46.75304109970339]
This paper designs a novel mobile inference acceleration framework GRIM that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs)
We propose a new fine-grained structured sparsity scheme through the Block-based Column-Row (BCR) pruning.
Based on this new fine-grained structured sparsity, our GRIM framework consists of two parts, including (a) the compiler optimization and code generation for real-time mobile inference.
arXiv Detail & Related papers (2021-08-25T03:50:46Z) - On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time stamping method can get less than 50ms word timing difference on average.
arXiv Detail & Related papers (2021-04-27T23:31:43Z) - Split Computing and Early Exiting for Deep Learning Applications: Survey
and Research Challenges [18.103754866476088]
We provide a comprehensive survey of the state of the art in split computing (SC) and early exiting (EE) strategies.
Recent approaches have been proposed, where the deep neural network is split into a head and a tail model, executed respectively on the mobile device and on the edge device.
EE trains models with multiple "exits" placed earlier in the architecture, where deeper exits provide progressively higher accuracy.
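The early-exit mechanism described above can be sketched as follows: the network is a sequence of stages, each followed by a classifier head, and inference stops at the first exit whose confidence clears a threshold, so easy inputs skip the deeper (and costlier) stages. The staged network, threshold, and dimensions are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_infer(x, stages, heads, threshold=0.8):
    """Return (exit index, predicted label) for input vector x."""
    for i, (stage, head) in enumerate(zip(stages, heads)):
        x = np.tanh(stage @ x)         # run the next backbone stage
        probs = softmax(head @ x)      # query this stage's exit head
        if probs.max() >= threshold:   # confident enough: stop here
            return i, int(probs.argmax())
    # No exit was confident; fall through to the final classifier.
    return len(stages) - 1, int(probs.argmax())

# Toy 3-stage network with a 4-class head at each exit.
stages = [rng.standard_normal((16, 16)) for _ in range(3)]
heads = [rng.standard_normal((4, 16)) for _ in range(3)]
exit_idx, label = early_exit_infer(rng.standard_normal(16), stages, heads)
```

The average inference cost then depends on how many inputs exit early, which is what makes EE attractive on resource-constrained mobile and edge devices.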
arXiv Detail & Related papers (2021-03-08T01:47:20Z) - Alignment Restricted Streaming Recurrent Neural Network Transducer [29.218353627837214]
We propose a modification to the RNN-T loss function and develop Alignment Restricted RNN-T models.
The Ar-RNN-T loss provides a refined control to navigate the trade-offs between the token emission delays and the Word Error Rate (WER).
The Ar-RNN-T models also improve downstream applications such as the ASR End-pointing by guaranteeing token emissions within any given range of latency.
arXiv Detail & Related papers (2020-11-05T19:38:54Z) - Applying GPGPU to Recurrent Neural Network Language Model based Fast
Network Search in the Real-Time LVCSR [5.0555627833288]
Recurrent Neural Network Language Models (RNNLMs) have started to be used in various fields of speech recognition.
High computational complexity of RNNLMs has been a hurdle in applying the RNNLM to a real-time Large Vocabulary Continuous Speech Recognition.
arXiv Detail & Related papers (2020-07-23T05:15:14Z) - RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks
on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z) - Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning
and Compiler Optimization [56.3111706960878]
High-end mobile platforms serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications.
However, constrained computation and storage resources on these devices pose significant challenges for real-time inference execution.
We propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN executions on mobile devices.
arXiv Detail & Related papers (2020-04-22T03:18:23Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.