Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech
Recognition Models
- URL: http://arxiv.org/abs/2303.08343v1
- Date: Wed, 15 Mar 2023 03:21:38 GMT
- Title: Sharing Low Rank Conformer Weights for Tiny Always-On Ambient Speech
Recognition Models
- Authors: Steven M. Hernandez, Ding Zhao, Shaojin Ding, Antoine Bruguier, Rohit
Prabhavalkar, Tara N. Sainath, Yanzhang He, Ian McGraw
- Abstract summary: We consider methods to reduce the model size of Conformer-based speech recognition models.
Such a model allows us to achieve always-on ambient speech recognition on edge devices with low-memory neural processors.
- Score: 47.99478573698432
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Continued improvements in machine learning techniques offer exciting new
opportunities through the use of larger models and larger training datasets.
However, there is a growing need to offer these new capabilities on-board
low-powered devices such as smartphones, wearables and other embedded
environments where only low memory is available. Towards this, we consider
methods to reduce the size of Conformer-based speech recognition models, which
typically require more than 100M parameters, down to just 5M parameters while
minimizing the impact on model quality. Such a model allows
us to achieve always-on ambient speech recognition on edge devices with
low-memory neural processors. We propose model weight reuse at different levels
within our model architecture: (i) repeating full conformer block layers, (ii)
sharing specific conformer modules across layers, (iii) sharing sub-components
per conformer module, and (iv) sharing decomposed sub-component weights after
low-rank decomposition. By sharing weights at different levels of our model, we
can retain the full model in-memory while increasing the number of virtual
transformations applied to the input. Through a series of ablation studies and
evaluations, we find that with weight sharing and a low-rank architecture, we
can achieve WERs of 2.84 and 2.94 on LibriSpeech dev-clean and test-clean,
respectively, with a 5M-parameter model.
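
To make the two ideas concrete, here is a minimal sketch (not the authors' released code; module names, dimensions, and the rank value are illustrative assumptions) of how a single low-rank factorized feed-forward module can be reused across several "virtual" layers, so the stored parameter count stays fixed while the effective depth grows:

# Hypothetical sketch: a low-rank factorized module whose single set of weights
# is reapplied across N "virtual" layers. Sizes and names are illustrative only.
import torch
import torch.nn as nn


class SharedLowRankFFN(nn.Module):
    """Feed-forward sub-module with a low-rank factorized weight."""

    def __init__(self, d_model: int = 256, rank: int = 64):
        super().__init__()
        # A (d_model x d_model) weight is replaced by two factors of rank `rank`,
        # shrinking the parameter count from d_model^2 to 2 * d_model * rank.
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(torch.relu(self.down(x)))


class TiedStack(nn.Module):
    """Applies the same physical module several times as 'virtual' layers."""

    def __init__(self, num_virtual_layers: int = 12, d_model: int = 256, rank: int = 64):
        super().__init__()
        self.shared_ffn = SharedLowRankFFN(d_model, rank)  # one set of weights...
        self.num_virtual_layers = num_virtual_layers       # ...applied N times

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.num_virtual_layers):
            x = x + self.shared_ffn(x)  # residual reuse of the shared weights
        return x


if __name__ == "__main__":
    model = TiedStack()
    params = sum(p.numel() for p in model.parameters())
    out = model(torch.randn(2, 100, 256))  # (batch, frames, features)
    print(out.shape, f"{params} parameters")  # parameter count is independent of depth

The paper shares weights at several granularities (whole Conformer blocks, individual modules, sub-components, and decomposed low-rank factors); this sketch ties only a single module to keep the example short.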
Related papers
- Neural Metamorphosis [72.88137795439407]
This paper introduces a new learning paradigm termed Neural Metamorphosis (NeuMeta), which aims to build self-morphable neural networks.
NeuMeta directly learns the continuous weight manifold of neural networks.
It sustains full-size performance even at a 75% compression rate.
arXiv Detail & Related papers (2024-10-10T14:49:58Z)
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data or additional training, while still delivering impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- MatFormer: Nested Transformer for Elastic Inference [94.1789252941718]
MatFormer is a nested Transformer architecture designed to offer elasticity across a variety of deployment constraints.
We show that a 2.6B decoder-only MatFormer language model (MatLM) allows us to extract smaller models spanning from 1.5B to 2.6B.
We also observe that smaller encoders extracted from a universal MatFormer-based ViT (MatViT) encoder preserve the metric-space structure for adaptive large-scale retrieval.
arXiv Detail & Related papers (2023-10-11T17:57:14Z)
- Meta-Ensemble Parameter Learning [35.6391802164328]
In this paper, we study whether a meta-learning strategy can directly predict the parameters of a single model with performance comparable to that of an ensemble.
We introduce WeightFormer, a Transformer-based model that can predict student network weights layer by layer in a forward pass.
arXiv Detail & Related papers (2022-10-05T00:47:24Z)
- Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition [7.450574974954803]
Limited memory bandwidth in smart devices prompts the development of smaller Automatic Speech Recognition (ASR) models.
Knowledge distillation (KD) is a popular model compression approach that has been shown to reduce model size.
We propose a multi-stage progressive approach to compress the conformer transducer model using KD.
arXiv Detail & Related papers (2022-10-01T02:23:00Z)
- Switchable Representation Learning Framework with Self-compatibility [50.48336074436792]
We propose a Switchable representation learning Framework with Self-Compatibility (SFSC).
SFSC generates a series of compatible sub-models with different capacities through one training process.
SFSC achieves state-of-the-art performance on the evaluated datasets.
arXiv Detail & Related papers (2022-06-16T16:46:32Z)
- Online Model Compression for Federated Learning with Large Models [8.48327410170884]
Online Model Compression (OMC) is a framework that stores model parameters in a compressed format and decompresses them only when needed.
OMC can reduce memory usage and communication cost of model parameters by up to 59% while attaining comparable accuracy and training speed when compared with full-precision training.
arXiv Detail & Related papers (2022-05-06T22:43:03Z)
- Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks [53.09649785009528]
In this paper, we explore a paradigm that does not require training to obtain new models.
Just as CNNs were inspired by receptive fields in the biological visual system, we propose Model Disassembling and Assembling.
For model assembling, we present the alignment padding strategy and parameter scaling strategy to construct a new model tailored for a specific task.
arXiv Detail & Related papers (2022-03-25T05:27:28Z)
- Tiny Neural Models for Seq2Seq [0.0]
We propose a projection-based encoder-decoder model referred to as pQRNN-MAtt.
The resulting quantized models are less than 3.5MB in size and are well suited for on-device latency critical applications.
We show that on MTOP, a challenging multilingual semantic parsing dataset, the average model performance surpasses that of an LSTM-based seq2seq model using pre-trained embeddings, despite being 85x smaller.
arXiv Detail & Related papers (2021-08-07T00:39:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.