RepCNN: Micro-sized, Mighty Models for Wakeword Detection
- URL: http://arxiv.org/abs/2406.02652v2
- Date: Thu, 1 Aug 2024 22:39:20 GMT
- Title: RepCNN: Micro-sized, Mighty Models for Wakeword Detection
- Authors: Arnav Kundu, Prateeth Nayak, Priyanka Padmanabhan, Devang Naik
- Abstract summary: Always-on machine learning models require a very low memory and compute footprint.
We show that a small convolutional model can be better trained by first refactoring its computation into a larger, redundant multi-branched architecture.
We show that our always-on wake-word detector model, RepCNN, provides a good trade-off between latency and accuracy during inference.
- Score: 3.4888176891918654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Always-on machine learning models require a very low memory and compute footprint. Their restricted parameter count limits the model's capacity to learn, and the effectiveness of the usual training algorithms to find the best parameters. Here we show that a small convolutional model can be better trained by first refactoring its computation into a larger redundant multi-branched architecture. Then, for inference, we algebraically re-parameterize the trained model into the single-branched form with fewer parameters for a lower memory footprint and compute cost. Using this technique, we show that our always-on wake-word detector model, RepCNN, provides a good trade-off between latency and accuracy during inference. RepCNN re-parameterized models are 43% more accurate than a uni-branch convolutional model while having the same runtime. RepCNN also meets the accuracy of complex architectures like BC-ResNet, while having 2x lesser peak memory usage and 10x faster runtime.
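The algebraic re-parameterization step described in the abstract can be illustrated with a minimal NumPy sketch. This is a generic example of structural re-parameterization, not the authors' RepCNN block: the branch layout (a 3-tap convolution, a 1-tap convolution, and an identity skip), the helper names `conv1d_same` and `merge_branches`, and the tensor sizes are all assumptions chosen for illustration, and batch-norm folding is omitted. Because the parallel branches are linear, their kernels can be summed into a single kernel that gives the same output.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1D convolution with zero 'same' padding.

    x: (C_in, T), w: (C_out, C_in, K) with odd K, b: (C_out,).
    """
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    y = np.zeros((c_out, t))
    for i in range(k):
        # Each kernel tap is a channel-mixing matrix applied to a shifted input.
        y += w[:, :, i] @ xp[:, i:i + t]
    return y + b[:, None]

def merge_branches(w3, b3, w1, b1, add_identity=True):
    """Fold a K=3 branch, a K=1 branch and an optional identity into one K=3 kernel."""
    w = w3.copy()
    w[:, :, 1] += w1[:, :, 0]        # the 1-tap kernel aligns with the centre tap
    if add_identity:
        c_out, c_in, _ = w3.shape
        assert c_out == c_in, "identity branch needs matching channel counts"
        w[:, :, 1] += np.eye(c_out)  # identity is a centre-tap unit matrix
    return w, b3 + b1

rng = np.random.default_rng(0)
c, t = 4, 16                          # hypothetical channel count and frame count
x = rng.standard_normal((c, t))
w3, b3 = rng.standard_normal((c, c, 3)), rng.standard_normal(c)
w1, b1 = rng.standard_normal((c, c, 1)), rng.standard_normal(c)

# Training-time form: three parallel branches summed.
multi_branch = conv1d_same(x, w3, b3) + conv1d_same(x, w1, b1) + x

# Inference-time form: a single convolution with the merged kernel.
w, b = merge_branches(w3, b3, w1, b1)
single_branch = conv1d_same(x, w, b)

max_abs_diff = float(np.max(np.abs(multi_branch - single_branch)))
```

In this sketch the merged model stores one kernel instead of three branches, which is the source of the lower inference-time memory and compute; the two forms agree up to floating-point rounding.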
Related papers
- MPruner: Optimizing Neural Network Size with CKA-Based Mutual Information Pruning [7.262751938473306]
Pruning is a well-established technique that reduces the size of neural networks while mathematically guaranteeing accuracy preservation.
We develop a new pruning algorithm, MPruner, that leverages mutual information through vector similarity.
MPruner achieved up to a 50% reduction in parameters and memory usage for CNN and transformer-based models, with minimal to no loss in accuracy.
arXiv Detail & Related papers (2024-08-24T05:54:47Z) - Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler [34.416299887009195]
We study the correlation between optimal learning rate, batch size, and number of training tokens for the recently proposed WSD scheduler.
We propose a new learning rate scheduler, Power scheduler, that is agnostic about the number of training tokens and batch size.
Our 3B dense and MoE models trained with the Power scheduler achieve comparable performance as state-of-the-art small language models.
arXiv Detail & Related papers (2024-08-23T20:22:20Z) - Reinforcement Learning with Fast and Forgetful Memory [10.087126455388276]
We introduce Fast and Forgetful Memory, an algorithm-agnostic memory model designed specifically for Reinforcement Learning (RL).
Our approach constrains the model search space via strong structural priors inspired by computational psychology.
Fast and Forgetful Memory exhibits training speeds two orders of magnitude faster than recurrent neural networks (RNNs).
arXiv Detail & Related papers (2023-10-06T09:56:26Z) - Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model [89.8764435351222]
We propose a new family of unbiased estimators called WTA-CRS, for matrix production with reduced variance.
Our work provides both theoretical and experimental evidence that, in the context of tuning transformers, our proposed estimators exhibit lower variance compared to existing ones.
arXiv Detail & Related papers (2023-05-24T15:52:08Z) - Re-parameterizing Your Optimizers rather than Architectures [119.08740698936633]
We propose a novel paradigm of incorporating model-specific prior knowledge into optimizers and using them to train generic (simple) models.
As an implementation, we propose a novel methodology to add prior knowledge by modifying the gradients according to a set of model-specific hyper-parameters.
We focus on a VGG-style plain model and showcase that such a simple model trained with a RepOptimizer, which is referred to as RepOpt-VGG, performs on par with the recent well-designed models.
arXiv Detail & Related papers (2022-05-30T16:55:59Z) - Omni-sparsity DNN: Fast Sparsity Optimization for On-Device Streaming
E2E ASR via Supernet [24.62661549442265]
We propose Omni-sparsity DNN, where a single neural network can be pruned to generate optimized model for a large range of model sizes.
Our results show great saving on training time and resources with similar or better accuracy on LibriSpeech compared to individually pruned models.
arXiv Detail & Related papers (2021-10-15T20:28:27Z) - MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore accelerating large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Training Deep Neural Networks with Constrained Learning Parameters [4.917317902787792]
A significant portion of deep learning tasks would run on edge computing systems.
We propose the Combinatorial Neural Network Training Algorithm (CoNNTrA).
CoNNTrA trains deep learning models with ternary learning parameters on the MNIST, Iris and ImageNet data sets.
Our results indicate that CoNNTrA models use 32x less memory and have errors at par with the Backpropagation models.
arXiv Detail & Related papers (2020-09-01T16:20:11Z) - The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit".
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z) - Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.