RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech
Recognition
- URL: http://arxiv.org/abs/2002.11474v1
- Date: Wed, 19 Feb 2020 00:07:32 GMT
- Title: RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech
Recognition
- Authors: Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang
Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, and Dingwen Tao
- Abstract summary: RTMobile is the first work that can achieve real-time RNN inference on mobile platforms.
Compared with prior work on FPGA, RTMobile running GRU on an Adreno 640 embedded GPU improves energy efficiency by about 40$\times$ while maintaining the same inference time.
- Score: 51.1437598405873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural network (RNN)-based automatic speech recognition
has become prevalent on mobile devices such as smartphones. However,
previous RNN compression techniques either suffer from hardware performance
overhead due to irregularity or significant accuracy loss due to the preserved
regularity for hardware friendliness. In this work, we propose RTMobile that
leverages both a novel block-based pruning approach and compiler optimizations
to accelerate RNN inference on mobile devices. Our proposed RTMobile is the
first work that can achieve real-time RNN inference on mobile platforms.
Experimental results demonstrate that RTMobile can significantly outperform
existing RNN hardware acceleration methods in terms of inference accuracy and
time. Compared with prior work on FPGA, RTMobile running GRU on an Adreno 640
embedded GPU improves energy efficiency by about 40$\times$ while maintaining
the same inference time.
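As a rough sketch of the block-based pruning idea behind RTMobile: the weight matrix is tiled into fixed-size blocks, blocks are ranked by magnitude, and the weakest fraction is zeroed so the surviving structure stays regular enough for efficient mobile execution. The block shape, sparsity level, and ranking criterion below are invented for illustration, not the paper's actual scheme or hyperparameters.

```python
import numpy as np

def block_prune(weight, block_shape=(4, 4), sparsity=0.9):
    """Zero out the lowest-magnitude blocks of a weight matrix.

    Illustrative block-based pruning: tile the matrix into blocks,
    rank blocks by L2 norm, and remove the weakest `sparsity` fraction.
    """
    rows, cols = weight.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0
    # Reshape into a grid of (rows/br) x (cols/bc) blocks.
    blocks = weight.reshape(rows // br, br, cols // bc, bc)
    # One L2 norm per block as its importance score.
    norms = np.sqrt((blocks ** 2).sum(axis=(1, 3)))
    # Keep only blocks whose norm clears the sparsity quantile.
    threshold = np.quantile(norms, sparsity)
    mask = (norms >= threshold)[:, None, :, None]
    return (blocks * mask).reshape(rows, cols)

W = np.random.randn(64, 64)           # stand-in for an RNN weight matrix
W_pruned = block_prune(W, block_shape=(4, 4), sparsity=0.9)
print(float(np.mean(W_pruned == 0)))  # fraction of zeroed entries
```

Because whole blocks are removed rather than scattered individual weights, a compiler can generate dense inner loops over the surviving blocks, which is the regularity that makes this family of methods hardware-friendly.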
Related papers
- Lightweight DNN for Full-Band Speech Denoising on Mobile Devices: Exploiting Long and Short Temporal Patterns [4.121578819979242]
We present a causal, low-latency, and lightweight deep neural network (DNN)-based method for full-band speech denoising. The method is based on a modified UNet architecture employing look-back frames, temporal spanning of convolutional kernels, and recurrent neural networks. The proposed method is evaluated using established speech denoising metrics and publicly available datasets.
arXiv Detail & Related papers (2025-09-05T13:18:25Z) - GRIM: A General, Real-Time Deep Learning Inference Framework for Mobile
Devices based on Fine-Grained Structured Weight Sparsity [46.75304109970339]
This paper designs a novel mobile inference acceleration framework GRIM that is General to both convolutional neural networks (CNNs) and recurrent neural networks (RNNs)
We propose a new fine-grained structured sparsity scheme through the Block-based Column-Row (BCR) pruning.
Based on this new fine-grained structured sparsity, our GRIM framework consists of two parts, including (a) the compiler optimization and code generation for real-time mobile inference.
arXiv Detail & Related papers (2021-08-25T03:50:46Z) - On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time stamping method can get less than 50ms word timing difference on average.
arXiv Detail & Related papers (2021-04-27T23:31:43Z) - Split Computing and Early Exiting for Deep Learning Applications: Survey
and Research Challenges [18.103754866476088]
We provide a comprehensive survey of the state of the art in split computing (SC) and early exiting (EE) strategies.
Recent approaches have been proposed, where the deep neural network is split into a head and a tail model, executed respectively on the mobile device and on the edge device.
EE trains models with multiple "exits" placed earlier in the architecture, where deeper exits provide progressively higher accuracy.
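The early-exit mechanism described above can be sketched as follows: the network is a sequence of stages, each followed by a classifier head, and inference stops at the first exit whose confidence clears a threshold, so easy inputs skip the deeper (and costlier) stages. The staged network, threshold, and dimensions are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_infer(x, stages, heads, threshold=0.8):
    """Return (exit index, predicted label) for input vector x."""
    for i, (stage, head) in enumerate(zip(stages, heads)):
        x = np.tanh(stage @ x)         # run the next backbone stage
        probs = softmax(head @ x)      # query this stage's exit head
        if probs.max() >= threshold:   # confident enough: stop here
            return i, int(probs.argmax())
    # No exit was confident; fall through to the final classifier.
    return len(stages) - 1, int(probs.argmax())

# Toy 3-stage network with a 4-class head at each exit.
stages = [rng.standard_normal((16, 16)) for _ in range(3)]
heads = [rng.standard_normal((4, 16)) for _ in range(3)]
exit_idx, label = early_exit_infer(rng.standard_normal(16), stages, heads)
```

The average inference cost then depends on how many inputs exit early, which is what makes EE attractive on resource-constrained mobile and edge devices.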
arXiv Detail & Related papers (2021-03-08T01:47:20Z) - Alignment Restricted Streaming Recurrent Neural Network Transducer [29.218353627837214]
We propose a modification to the RNN-T loss function and develop Alignment Restricted RNN-T models.
The Ar-RNN-T loss provides a refined control to navigate the trade-offs between the token emission delays and the Word Error Rate (WER).
The Ar-RNN-T models also improve downstream applications such as the ASR End-pointing by guaranteeing token emissions within any given range of latency.
arXiv Detail & Related papers (2020-11-05T19:38:54Z) - Applying GPGPU to Recurrent Neural Network Language Model based Fast
Network Search in the Real-Time LVCSR [5.0555627833288]
Recurrent Neural Network Language Models (RNNLMs) have started to be used in various fields of speech recognition.
High computational complexity of RNNLMs has been a hurdle in applying the RNNLM to a real-time Large Vocabulary Continuous Speech Recognition.
arXiv Detail & Related papers (2020-07-23T05:15:14Z) - RT3D: Achieving Real-Time Execution of 3D Convolutional Neural Networks
on Mobile Devices [57.877112704841366]
This paper proposes RT3D, a model compression and mobile acceleration framework for 3D CNNs.
For the first time, real-time execution of 3D CNNs is achieved on off-the-shelf mobiles.
arXiv Detail & Related papers (2020-07-20T02:05:32Z) - Towards Real-Time DNN Inference on Mobile Platforms with Model Pruning
and Compiler Optimization [56.3111706960878]
High-end mobile platforms serve as primary computing devices for a wide range of Deep Neural Network (DNN) applications.
However, constrained computation and storage resources on these devices pose significant challenges for real-time inference execution.
We propose a set of hardware-friendly structured model pruning and compiler optimization techniques to accelerate DNN executions on mobile devices.
arXiv Detail & Related papers (2020-04-22T03:18:23Z) - PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with
Pattern-based Weight Pruning [57.20262984116752]
We introduce a new dimension, fine-grained pruning patterns inside the coarse-grained structures, revealing a previously unknown point in design space.
With the higher accuracy enabled by fine-grained pruning patterns, the unique insight is to use the compiler to re-gain and guarantee high hardware efficiency.
arXiv Detail & Related papers (2020-01-01T04:52:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.