Neural Language Model Pruning for Automatic Speech Recognition
- URL: http://arxiv.org/abs/2310.03424v1
- Date: Thu, 5 Oct 2023 10:01:32 GMT
- Title: Neural Language Model Pruning for Automatic Speech Recognition
- Authors: Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil
- Abstract summary: We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition.
We explore three aspects of the pruning framework, namely criterion, method, and scheduler, analyzing their contribution in terms of accuracy and inference speed.
- Score: 4.10609794373612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study model pruning methods applied to Transformer-based neural network
language models for automatic speech recognition. We explore three aspects of
the pruning framework, namely criterion, method, and scheduler, analyzing their
contribution in terms of accuracy and inference speed. To the best of our
knowledge, such in-depth analyses of large-scale recognition systems have not
been reported in the literature. In addition, we propose a variant of low-rank
approximation suitable for incrementally compressing models and delivering
multiple models with varied target sizes. Among other results, we show that a)
data-driven pruning outperforms magnitude-driven pruning in several scenarios; b)
incremental pruning achieves higher accuracy than one-shot pruning,
especially when targeting smaller sizes; and c) low-rank approximation presents
the best trade-off between size reduction and inference speed-up for moderate
compression.
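To make the abstract's terms concrete, here is a minimal sketch of one pruning criterion and one scheduler it compares: magnitude-driven pruning (zero out the smallest-magnitude weights) applied either one-shot or incrementally. This is an illustrative toy on a flat weight list, not the paper's actual implementation; the function names and the absence of fine-tuning between incremental steps are simplifying assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Magnitude criterion: zero out the smallest-magnitude fraction of weights.

    weights:  flat list of floats (a stand-in for a layer's weight matrix)
    sparsity: fraction of weights to set to zero, in [0, 1]
    """
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold is the magnitude of the n_prune-th smallest |w|.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def incremental_prune(weights, target_sparsity, steps):
    """Scheduler: steps=1 is one-shot pruning; steps>1 raises sparsity gradually.

    In a real training pipeline the model would be fine-tuned between steps,
    which is what lets incremental pruning recover accuracy at high sparsity.
    """
    pruned = list(weights)
    for step in range(1, steps + 1):
        pruned = magnitude_prune(pruned, target_sparsity * step / steps)
    return pruned


w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
one_shot = magnitude_prune(w, 0.5)        # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
gradual = incremental_prune(w, 0.5, 2)    # same final sparsity, two steps
```

A data-driven criterion would replace the `abs(w)` ranking with a sensitivity score estimated on training data (e.g., a loss-gradient statistic), which is the distinction behind finding a) in the abstract.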
Related papers
- Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning [73.73967342609603]
We introduce a predictor-corrector learning framework to minimize truncation errors.
We also propose an exponential moving average-based coefficient learning method to strengthen our higher-order predictor.
Our model surpasses a robust 3.8B DeepNet by an average of 2.9 SacreBLEU, using only 1/3 of the parameters.
arXiv Detail & Related papers (2024-11-05T12:26:25Z)
- Optimization of DNN-based speaker verification model through efficient quantization technique [15.250677730668466]
Quantization of deep models offers a means to reduce both computational and memory expenses.
Our research proposes an optimization framework for the quantization of the speaker verification model.
arXiv Detail & Related papers (2024-07-12T05:03:10Z) - Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation [7.056222499095849]
beam search seeks the transcript with the greatest likelihood computed using the predicted distribution.
We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions.
We propose a decoding procedure that improves the performance of fine-tuned ASR models.
arXiv Detail & Related papers (2022-12-27T06:42:26Z) - Automatic Mixed-Precision Quantization Search of BERT [62.65905462141319]
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks.
These models usually contain millions of parameters, which prevents them from practical deployment on resource-constrained devices.
We propose an automatic mixed-precision quantization framework designed for BERT that can simultaneously conduct quantization and pruning in a subgroup-wise level.
arXiv Detail & Related papers (2021-12-30T06:32:47Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, and so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Distributionally Robust Recurrent Decoders with Random Network
Distillation [93.10261573696788]
We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to disregard OOD context during inference.
We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
arXiv Detail & Related papers (2021-10-25T19:26:29Z) - An Efficient Deep Learning Model for Automatic Modulation Recognition
Based on Parameter Estimation and Transformation [3.3941243094128035]
This letter proposes an efficient DL-AMR model based on phase parameter estimation and transformation.
Our model is more competitive in training time and test time than the benchmark models with similar recognition accuracy.
arXiv Detail & Related papers (2021-10-11T03:28:28Z) - Firearm Detection via Convolutional Neural Networks: Comparing a
Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model against a previously proposed model based on an ensemble of simpler neural networks that detects firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z) - Discriminative Nearest Neighbor Few-Shot Intent Detection by
Transferring Natural Language Inference [150.07326223077405]
Few-shot learning is attracting much attention to mitigate data scarcity.
We present a discriminative nearest neighbor classification with deep self-attention.
We propose to boost the discriminative ability by transferring a natural language inference (NLI) model.
arXiv Detail & Related papers (2020-10-25T00:39:32Z) - Attention based on-device streaming speech recognition with large speech
corpus [16.702653972113023]
We present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with large (> 10K hours) corpus.
We attained a word recognition rate of around 90% for the general domain, mainly by using joint training with connectionist temporal classification (CTC) and cross-entropy (CE) losses.
For on-demand adaptation, we fused the MoChA models with statistical n-gram models and achieved a 36% relative improvement on average in word error rate (WER) for target domains, including the general domain.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.