Broadcasted Residual Learning for Efficient Keyword Spotting
- URL: http://arxiv.org/abs/2106.04140v4
- Date: Wed, 5 Jul 2023 15:18:54 GMT
- Title: Broadcasted Residual Learning for Efficient Keyword Spotting
- Authors: Byeonggeun Kim, Simyung Chang, Jinkyu Lee, Dooyong Sung
- Abstract summary: We present a broadcasted residual learning method to achieve high accuracy with small model size and computational load.
We also propose a novel network architecture, Broadcasting-residual network (BC-ResNet), based on broadcasted residual learning.
BC-ResNets achieve state-of-the-art 98.0% and 98.7% top-1 accuracy on Google speech command datasets v1 and v2, respectively.
- Score: 7.335747584353902
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Keyword spotting is an important research field because it plays a key role
in device wake-up and user interaction on smart devices. However, it is
challenging to minimize errors while operating efficiently in devices with
limited resources such as mobile phones. We present a broadcasted residual
learning method to achieve high accuracy with small model size and
computational load. Our method configures most of the residual functions as 1D
temporal convolution while still allows 2D convolution together using a
broadcasted-residual connection that expands temporal output to
frequency-temporal dimension. This residual mapping enables the network to
effectively represent useful audio features with much less computation than
conventional convolutional neural networks. We also propose a novel network
architecture, Broadcasting-residual network (BC-ResNet), based on broadcasted
residual learning and describe how to scale up the model according to the
target device's resources. BC-ResNets achieve state-of-the-art 98.0% and 98.7%
top-1 accuracy on Google speech command datasets v1 and v2, respectively, and
consistently outperform previous approaches, using fewer computations and
parameters. Code is available at
https://github.com/Qualcomm-AI-research/bcresnet.
Related papers
- Cascaded Temporal Updating Network for Efficient Video Super-Resolution [47.63267159007611]
Key components in recurrent-based VSR networks significantly impact model efficiency.
We propose a cascaded temporal updating network (CTUN) for efficient VSR.
CTUN achieves a favorable trade-off between efficiency and performance compared to existing methods.
arXiv Detail & Related papers (2024-08-26T12:59:32Z) - Efficient Ensemble for Multimodal Punctuation Restoration using
Time-Delay Neural Network [1.006218778776515]
Punctuation restoration plays an essential role in the post-processing procedure of automatic speech recognition.
We present EfficientPunct, an ensemble method with a multimodal time-delay neural network.
It outperforms the current best model by 1.0 F1 points, using less than a tenth of its inference network parameters.
arXiv Detail & Related papers (2023-02-26T18:28:20Z) - Efficient Parametric Approximations of Neural Network Function Space
Distance [6.117371161379209]
It is often useful to compactly summarize important properties of model parameters and training data so that they can be used later without storing and/or iterating over the entire dataset.
We consider estimating the Function Space Distance (FSD) over a training set, i.e. the average discrepancy between the outputs of two neural networks.
We propose a Linearized Activation TRick (LAFTR) and derive an efficient approximation to FSD for ReLU neural networks.
arXiv Detail & Related papers (2023-02-07T15:09:23Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - A Temporal-oriented Broadcast ResNet for COVID-19 Detection [11.306011762214272]
We present a temporal-oriented broadcasting residual learning method that achieves efficient computation and high accuracy with a small model size.
Based on the EfficientNet architecture, our novel network, named Temporal-oriented ResNet(TorNet), constitutes of a broadcasting learning block.
With the AB Block, the network obtains useful audio-temporal features and higher level embeddings effectively with much less computation than Recurrent Neural Networks(RNNs)
arXiv Detail & Related papers (2022-03-31T13:11:57Z) - a novel attention-based network for fast salient object detection [14.246237737452105]
In the current salient object detection network, the most popular method is using U-shape structure.
We propose a new deep convolution network architecture with three contributions.
Results demonstrate that the proposed method can compress the model to 1/3 of the original size nearly without losing the accuracy.
arXiv Detail & Related papers (2021-12-20T12:30:20Z) - SignalNet: A Low Resolution Sinusoid Decomposition and Estimation
Network [79.04274563889548]
We propose SignalNet, a neural network architecture that detects the number of sinusoids and estimates their parameters from quantized in-phase and quadrature samples.
We introduce a worst-case learning threshold for comparing the results of our network relative to the underlying data distributions.
In simulation, we find that our algorithm is always able to surpass the threshold for three-bit data but often cannot exceed the threshold for one-bit data.
arXiv Detail & Related papers (2021-06-10T04:21:20Z) - Enabling certification of verification-agnostic networks via
memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z) - Fast accuracy estimation of deep learning based multi-class musical
source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace conventional ReLU with Bounded ReLU and find that the decline is due to activation quantization.
Our integer networks achieve equivalent performance as the corresponding FPN networks, but have only 1/4 memory cost and run 2x faster on modern GPU.
arXiv Detail & Related papers (2020-06-21T08:23:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.