Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering
- URL: http://arxiv.org/abs/2008.02323v1
- Date: Wed, 5 Aug 2020 19:16:33 GMT
- Title: Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering
- Authors: Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir
- Abstract summary: We consider the design of two-pass voice trigger detection systems.
We focus on the networks in the second pass that are used to re-score candidate segments.
- Score: 8.103294902922036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the design of two-pass voice trigger detection systems. We focus
on the networks in the second pass that are used to re-score candidate segments
obtained from the first pass. Our baseline is an acoustic model (AM) with
BiLSTM layers, trained by minimizing the CTC loss. We replace the BiLSTM layers
with self-attention layers. Results on internal evaluation sets show that
self-attention networks yield better accuracy while requiring fewer parameters.
We add an auto-regressive decoder network on top of the self-attention layers
and jointly minimize the CTC loss on the encoder and the cross-entropy loss on
the decoder. This design yields further improvements over the baseline. We
retrain all the models above in a multi-task learning (MTL) setting, where one
branch of a shared network is trained as an AM, while the second branch
classifies the whole sequence to be true-trigger or not. Results demonstrate
that networks with self-attention layers yield a $\sim$60% relative reduction in
false-reject rate for a given false-alarm rate, while requiring 10% fewer
parameters. When trained in the MTL setup, self-attention networks yield
further accuracy improvements. On-device measurements show a 70% relative
reduction in inference time. Additionally, the proposed network architectures
are $\sim$5X faster to train.
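As a concrete illustration, here is a minimal PyTorch-style sketch of the architecture and joint objective the abstract describes: a self-attention encoder scored with CTC, an auto-regressive decoder scored with cross-entropy, and a sequence-level true-trigger/non-trigger branch for the MTL setting. This is not the authors' code; the layer sizes, label vocabulary, and loss weights are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HybridTransformerCTC(nn.Module):
    """Sketch of a second-pass rescoring network: shared self-attention encoder,
    CTC head (acoustic model), auto-regressive decoder head, and a sequence-level
    discriminative head for the MTL setup. All dimensions are placeholders."""

    def __init__(self, n_feats=40, d_model=256, n_heads=4, n_layers=6, vocab=64):
        super().__init__()
        self.frontend = nn.Linear(n_feats, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.ctc_head = nn.Linear(d_model, vocab)      # encoder branch, trained with CTC
        self.dec_embed = nn.Embedding(vocab, d_model)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True), num_layers=2)
        self.dec_head = nn.Linear(d_model, vocab)      # decoder branch, cross-entropy
        self.disc_head = nn.Linear(d_model, 2)         # MTL branch: true trigger or not

    def forward(self, feats, prev_tokens):
        # feats: (B, T, n_feats) acoustic features; prev_tokens: (B, L) shifted labels
        enc = self.encoder(self.frontend(feats))                       # (B, T, d_model)
        ctc_logp = self.ctc_head(enc).log_softmax(-1)                  # (B, T, vocab)
        causal = nn.Transformer.generate_square_subsequent_mask(prev_tokens.size(1))
        dec = self.decoder(self.dec_embed(prev_tokens), enc, tgt_mask=causal)
        return ctc_logp, self.dec_head(dec), self.disc_head(enc.mean(dim=1))

def joint_loss(ctc_logp, dec_logits, disc_logits, tokens, feat_lens, tok_lens, is_trigger):
    """Jointly minimize CTC (encoder), cross-entropy (decoder) and the sequence-level
    discriminative loss; the 0.5/0.3/0.2 weights are placeholders, not the paper's."""
    ctc = nn.CTCLoss(blank=0)(ctc_logp.transpose(0, 1), tokens, feat_lens, tok_lens)
    ce = nn.CrossEntropyLoss()(dec_logits.flatten(0, 1), tokens.flatten())
    disc = nn.CrossEntropyLoss()(disc_logits, is_trigger)
    return 0.5 * ctc + 0.3 * ce + 0.2 * disc
```

The encoder is shared by all three branches, mirroring the shared-network MTL description in the abstract; the baseline variant would simply swap the Transformer encoder for a BiLSTM stack and keep only the CTC head.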
Related papers
- Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures [85.76673783330334]
Two different settings of linear weight-sharing layers motivate two flavours of Kronecker-Factored Approximate Curvature (K-FAC).
We show they are exact for deep linear networks with weight-sharing in their respective setting.
We observe little difference between these two K-FAC variations when using them to train both a graph neural network and a vision transformer.
arXiv Detail & Related papers (2023-11-01T16:37:00Z)
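For context on the entry above, the display below recalls the standard per-layer K-FAC approximation that the two weight-sharing settings generalize; the symbols ($W$ the layer weights, $a$ its input activations, $g$ the back-propagated pre-activation gradients) and the vectorization convention are ours, not the paper's notation.

```latex
% Fisher block of one linear layer, approximated by a Kronecker product of
% the input-activation and pre-activation-gradient second moments.
F_W \;=\; \mathbb{E}\!\left[\operatorname{vec}(g a^{\top})\,\operatorname{vec}(g a^{\top})^{\top}\right]
     \;\approx\; \mathbb{E}\!\left[a a^{\top}\right] \otimes \mathbb{E}\!\left[g g^{\top}\right]
```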
- Sharpness-Aware Minimization Leads to Low-Rank Features [49.64754316927016]
Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network.
We show that SAM reduces the feature rank, an effect that occurs at different layers of a neural network.
We confirm this effect theoretically and check that it can also occur in deep networks.
arXiv Detail & Related papers (2023-05-25T17:46:53Z)
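To make the entry above easier to follow, here is a hedged sketch of a single SAM update (generic SAM, not that paper's code): take the gradient, climb to the nearby worst-case weights $w + \rho\,\nabla L / \lVert\nabla L\rVert$, take the gradient there, then restore the weights and step. The $\rho$ value is illustrative.

```python
import torch

def sam_step(model, compute_loss, optimizer, rho=0.05):
    # compute_loss() runs a forward pass on the current batch and returns the loss.
    params = [p for p in model.parameters() if p.requires_grad]

    # 1) gradient at the current weights
    optimizer.zero_grad()
    compute_loss().backward()
    grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
             for p in params]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))

    # 2) perturb the weights toward the locally sharpest direction
    eps = [rho * g / (grad_norm + 1e-12) for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)

    # 3) gradient at the perturbed weights, then undo the perturbation and update
    optimizer.zero_grad()
    compute_loss().backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
```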
- Low PAPR MIMO-OFDM Design Based on Convolutional Autoencoder [20.544993155126967]
A new framework for peak-to-average power ratio ($\mathsf{PAPR}$) reduction and waveform design is presented.
A convolutional-autoencoder ($\mathsf{CAE}$) architecture is presented.
We show that a single trained model covers the tasks of $\mathsf{PAPR}$ reduction, spectrum design, and $\mathsf{MIMO}$ detection together over a wide range of SNR levels.
arXiv Detail & Related papers (2023-01-11T11:35:10Z)
- CATRO: Channel Pruning via Class-Aware Trace Ratio Optimization [61.71504948770445]
We propose a novel channel pruning method via Class-Aware Trace Ratio Optimization (CATRO) to reduce the computational burden and accelerate the model inference.
We show that CATRO achieves higher accuracy with similar cost or lower cost with similar accuracy than other state-of-the-art channel pruning algorithms.
Because of its class-aware property, CATRO is well suited to adaptively pruning efficient networks for various classification subtasks, which eases the deployment and use of deep networks in real-world applications.
arXiv Detail & Related papers (2021-10-21T06:26:31Z)
- Tied & Reduced RNN-T Decoder [0.0]
We study ways to make the RNN-T decoder (prediction network + joint network) smaller and faster without degradation in recognition performance.
Our prediction network performs a simple weighted averaging of the input embeddings, and shares its embedding matrix weights with the joint network's output layer.
This simple design, when used in conjunction with additional Edit-based Minimum Bayes Risk (EMBR) training, reduces the RNN-T decoder from 23M parameters to just 2M, without affecting word-error rate (WER).
arXiv Detail & Related papers (2021-09-15T18:19:16Z)
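A hedged sketch of the tying scheme described in the entry above: the prediction network is a learned weighted average of the most recent label embeddings, and the embedding matrix doubles (transposed) as the joint network's output layer. The vocabulary size, model width, and two-label context are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiedReducedDecoder(nn.Module):
    def __init__(self, vocab=4096, d_model=320, context=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)      # single embedding matrix, shared
        self.mix = nn.Parameter(torch.zeros(context))  # weights for the embedding average
        self.joint = nn.Linear(d_model, d_model)       # small joint network

    def prediction(self, prev_labels):
        # prev_labels: (B, context) most recent non-blank labels
        emb = self.embed(prev_labels)                  # (B, context, d_model)
        w = F.softmax(self.mix, dim=0)                 # simple weighted averaging
        return torch.einsum("c,bcd->bd", w, emb)       # (B, d_model)

    def forward(self, enc_frame, prev_labels):
        # enc_frame: (B, d_model) one acoustic-encoder frame
        h = torch.tanh(self.joint(enc_frame + self.prediction(prev_labels)))
        return h @ self.embed.weight.t()               # output layer tied to the embeddings
```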
- Train your classifier first: Cascade Neural Networks Training from upper layers to lower layers [54.47911829539919]
We develop a novel top-down training method which can be viewed as an algorithm for searching for high-quality classifiers.
We tested this method on automatic speech recognition (ASR) tasks and language modelling tasks.
The proposed method consistently improves recurrent neural network ASR models on Wall Street Journal, self-attention ASR models on Switchboard, and AWD-LSTM language models on WikiText-2.
arXiv Detail & Related papers (2021-02-09T08:19:49Z)
- Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming [97.40955121478716]
We propose a first-order dual SDP algorithm that requires memory only linear in the total number of network activations.
We significantly improve L-inf verified robust accuracy from 1% to 88% and 6% to 40% respectively.
We also demonstrate tight verification of a quadratic stability specification for the decoder of a variational autoencoder.
arXiv Detail & Related papers (2020-10-22T12:32:29Z)
- Network Adjustment: Channel Search Guided by FLOPs Utilization Ratio [101.84651388520584]
This paper presents a new framework named network adjustment, which considers network accuracy as a function of FLOPs.
Experiments on standard image classification datasets and a wide range of base networks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2020-04-06T15:51:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.