Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering
- URL: http://arxiv.org/abs/2008.02323v1
- Date: Wed, 5 Aug 2020 19:16:33 GMT
- Title: Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering
- Authors: Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir
- Abstract summary: We consider the design of two-pass voice trigger detection systems.
We focus on the networks in the second pass that are used to re-score candidate segments.
- Score: 8.103294902922036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the design of two-pass voice trigger detection systems. We focus
on the networks in the second pass that are used to re-score candidate segments
obtained from the first pass. Our baseline is an acoustic model (AM) with
BiLSTM layers, trained by minimizing the CTC loss. We replace the BiLSTM layers
with self-attention layers. Results on internal evaluation sets show that
self-attention networks yield better accuracy while requiring fewer parameters.
We add an auto-regressive decoder network on top of the self-attention layers
and jointly minimize the CTC loss on the encoder and the cross-entropy loss on
the decoder. This design yields further improvements over the baseline. We
retrain all the models above in a multi-task learning (MTL) setting, where one
branch of a shared network is trained as an AM, while the second branch
classifies the whole sequence to be true-trigger or not. Results demonstrate
that networks with self-attention layers yield a $\sim$60% relative reduction in
false-reject rate for a given false-alarm rate, while requiring 10% fewer
parameters. When trained in the MTL setup, self-attention networks yield
further accuracy improvements. On-device measurements show a 70% relative
reduction in inference time. Additionally, the proposed network architectures
are $\sim$5X faster to train.
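As a concrete illustration, here is a minimal PyTorch-style sketch of the architecture and joint objective the abstract describes: a self-attention encoder scored with CTC, an auto-regressive decoder scored with cross-entropy, and a sequence-level true-trigger/non-trigger branch for the MTL setting. This is not the authors' code; the layer sizes, label vocabulary, and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridTransformerCTC(nn.Module):
    """Sketch of a second-pass rescoring network: shared self-attention encoder,
    CTC head (acoustic model), auto-regressive decoder head, and a sequence-level
    discriminative head for the MTL setup. All dimensions are placeholders."""

    def __init__(self, n_feats=40, d_model=256, n_heads=4, n_layers=6, vocab=64):
        super().__init__()
        self.frontend = nn.Linear(n_feats, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.ctc_head = nn.Linear(d_model, vocab)      # encoder branch, trained with CTC
        self.dec_embed = nn.Embedding(vocab, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), num_layers=2)
        self.dec_head = nn.Linear(d_model, vocab)      # decoder branch, cross-entropy
        self.disc_head = nn.Linear(d_model, 2)         # MTL branch: true trigger or not

    def forward(self, feats, prev_tokens):
        # feats: (B, T, n_feats) acoustic features; prev_tokens: (B, L) shifted labels
        enc = self.encoder(self.frontend(feats))                       # (B, T, d_model)
        ctc_logp = self.ctc_head(enc).log_softmax(-1)                  # (B, T, vocab)
        causal = nn.Transformer.generate_square_subsequent_mask(prev_tokens.size(1))
        dec = self.decoder(self.dec_embed(prev_tokens), enc, tgt_mask=causal)
        return ctc_logp, self.dec_head(dec), self.disc_head(enc.mean(dim=1))

def joint_loss(ctc_logp, dec_logits, disc_logits, tokens, feat_lens, tok_lens, is_trigger):
    """Jointly minimize CTC (encoder), cross-entropy (decoder) and the sequence-level
    discriminative loss; the 0.5/0.3/0.2 weights are placeholders, not the paper's."""
    ctc = nn.CTCLoss(blank=0)(ctc_logp.transpose(0, 1), tokens, feat_lens, tok_lens)
    ce = nn.CrossEntropyLoss()(dec_logits.flatten(0, 1), tokens.flatten())
    disc = nn.CrossEntropyLoss()(disc_logits, is_trigger)
    return 0.5 * ctc + 0.3 * ce + 0.2 * disc
```

The encoder is shared by all three branches, mirroring the shared-network MTL description in the abstract; the baseline variant would simply swap the Transformer encoder for a BiLSTM stack and keep only the CTC head.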
Related papers
- Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC).
We show they are exact for deep linear networks with weight-sharing in their respective setting.
We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
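For context on the entry above, the display below recalls the standard per-layer K-FAC approximation that the two weight-sharing settings generalize; the symbols ($W$ the layer weights, $a$ its input activations, $g$ the back-propagated pre-activation gradients) and the vectorization convention are ours, not the paper's notation.

```latex
% Fisher block of one linear layer, approximated by a Kronecker product of
% the input-activation and pre-activation-gradient second moments.
F_W \;=\; \mathbb{E}\!\left[\operatorname{vec}(g a^{\top})\,\operatorname{vec}(g a^{\top})^{\top}\right]
     \;\approx\; \mathbb{E}\!\left[a a^{\top}\right] \otimes \mathbb{E}\!\left[g g^{\top}\right]
```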
- Sharpness-Aware Minimization Leads to Low-Rank Features [49.64754316927016]
Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network.
We show that SAM reduces the feature rank, an effect that occurs at different layers of a neural network.
We confirm this effect theoretically and check that it can also occur in deep networks.
arXiv Detail & Related papers (2023-05-25T17:46:53Z)
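To make the entry above easier to follow, here is a hedged sketch of a single SAM update (generic SAM, not that paper's code): take the gradient, climb to the nearby worst-case weights $w + \rho\,\nabla L / \lVert\nabla L\rVert$, take the gradient there, then restore the weights and step. The $\rho$ value is illustrative.

```python
import torch

def sam_step(model, compute_loss, optimizer, rho=0.05):
    # compute_loss() runs a forward pass on the current batch and returns the loss.
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) gradient at the current weights
    optimizer.zero_grad()
    compute_loss().backward()
    grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
             for p in params]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))

    # 2) perturb the weights toward the locally sharpest direction
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)

    # 3) gradient at the perturbed weights, then undo the perturbation and update
    optimizer.zero_grad()
    compute_loss().backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```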
- Low PAPR MIMO-OFDM Design Based on Convolutional Autoencoder [20.544993155126967]
A new framework for peak-to-average power ratio ($\mathsf{PAPR}$) reduction and waveform design is presented.
A convolutional-autoencoder ($\mathsf{CAE}$) architecture is presented.
We show that a single trained model covers the tasks of $\mathsf{PAPR}$ reduction, spectrum design, and $\mathsf{MIMO}$ detection together over a wide range of SNR levels.
arXiv Detail & Related papers (2023-01-11T11:35:10Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is well suited to adaptively pruning efficient networks for various classification subtasks, which eases the deployment and use of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- Tied & Reduced RNN-T Decoder [0.0]
We study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance.
Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer.
This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T decoder from 23M parameters to just 2M, without affecting word-error rate (WER).
arXiv Detail & Related papers (2021-09-15T18:19:16Z)
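A hedged sketch of the tying scheme described in the entry above: the prediction network is a learned weighted average of the most recent label embeddings, and the embedding matrix doubles (transposed) as the joint network's output layer. The vocabulary size, model width, and two-label context are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedReducedDecoder(nn.Module):
    def __init__(self, vocab=4096, d_model=320, context=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)      # single embedding matrix, shared
        self.mix = nn.Parameter(torch.zeros(context))  # weights for the embedding average
        self.joint = nn.Linear(d_model, d_model)       # small joint network

    def prediction(self, prev_labels):
        # prev_labels: (B, context) most recent non-blank labels
        emb = self.embed(prev_labels)                  # (B, context, d_model)
        w = F.softmax(self.mix, dim=0)                 # simple weighted averaging
        return torch.einsum("c,bcd->bd", w, emb)       # (B, d_model)

    def forward(self, enc_frame, prev_labels):
        # enc_frame: (B, d_model) one acoustic-encoder frame
        h = torch.tanh(self.joint(enc_frame + self.prediction(prev_labels)))
        return h @ self.embed.weight.t()               # output layer tied to the embeddings
```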
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.