Neural Language Model Pruning for Automatic Speech Recognition
- URL: http://arxiv.org/abs/2310.03424v1
- Date: Thu, 5 Oct 2023 10:01:32 GMT
- Title: Neural Language Model Pruning for Automatic Speech Recognition
- Authors: Leonardo Emili, Thiago Fraga-Silva, Ernest Pusateri, Markus Nußbaum-Thom, Youssef Oualil
- Abstract summary: We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition.
We explore three aspects of the pruning framework, namely criterion, method, and scheduler, analyzing their contribution in terms of accuracy and inference speed.
- Score: 4.10609794373612
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study model pruning methods applied to Transformer-based neural network
language models for automatic speech recognition. We explore three aspects of
the pruning framework, namely criterion, method, and scheduler, analyzing their
contribution in terms of accuracy and inference speed. To the best of our
knowledge, such in-depth analyses of large-scale recognition systems have not
been reported in the literature. In addition, we propose a variant of low-rank
approximation suitable for incrementally compressing models and delivering
multiple models with varied target sizes. Among other results, we show that a)
data-driven pruning outperforms magnitude-driven pruning in several scenarios; b)
incremental pruning achieves higher accuracy than one-shot pruning,
especially when targeting smaller sizes; and c) low-rank approximation presents
the best trade-off between size reduction and inference speed-up for moderate
compression.
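To make the abstract's terms concrete, here is a minimal sketch of one pruning criterion and one scheduler it compares: magnitude-driven pruning (zero out the smallest-magnitude weights) applied either one-shot or incrementally. This is an illustrative toy on a flat weight list, not the paper's actual implementation; the function names and the absence of fine-tuning between incremental steps are simplifying assumptions.

```python
def magnitude_prune(weights, sparsity):
    """Magnitude criterion: zero out the smallest-magnitude fraction of weights.

    weights:  flat list of floats (a stand-in for a layer's weight matrix)
    sparsity: fraction of weights to set to zero, in [0, 1]
    """
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold is the magnitude of the n_prune-th smallest |w|.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]


def incremental_prune(weights, target_sparsity, steps):
    """Scheduler: steps=1 is one-shot pruning; steps>1 raises sparsity gradually.

    In a real training pipeline the model would be fine-tuned between steps,
    which is what lets incremental pruning recover accuracy at high sparsity.
    """
    pruned = list(weights)
    for step in range(1, steps + 1):
        pruned = magnitude_prune(pruned, target_sparsity * step / steps)
    return pruned


w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2]
one_shot = magnitude_prune(w, 0.5)        # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
gradual = incremental_prune(w, 0.5, 2)    # same final sparsity, two steps
```

A data-driven criterion would replace the `abs(w)` ranking with a sensitivity score estimated on training data (e.g., a loss-gradient statistic), which is the distinction behind finding a) in the abstract.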
Related papers
- Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning [73.73967342609603]
We introduce a predictor-corrector learning framework to minimize truncation errors.
We also propose an exponential moving average-based coefficient learning method to strengthen our higher-order predictor.
Our model surpasses a robust 3.8B DeepNet by an average of 2.9 SacreBLEU, using only 1/3 of the parameters.
arXiv Detail & Related papers (2024-11-05T12:26:25Z)
- Optimization of DNN-based speaker verification model through efficient quantization technique [15.250677730668466]
Quantization of deep models offers a means to reduce both computational and memory expenses.
Our research proposes an optimization framework for the quantization of the speaker verification model.
arXiv Detail & Related papers (2024-07-12T05:03:10Z) - Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation [7.056222499095849]
beam search seeks the transcript with the greatest likelihood computed using the predicted distribution.
We show that recently proposed Self-Supervised Learning (SSL)-based ASR models tend to yield exceptionally confident predictions.
We propose a decoding procedure that improves the performance of fine-tuned ASR models.
arXiv Detail & Related papers (2022-12-27T06:42:26Z) - Automatic Mixed-Precision Quantization Search of BERT [62.65905462141319]
Pre-trained language models such as BERT have shown remarkable effectiveness in various natural language processing tasks.
These models usually contain millions of parameters, which prevents them from practical deployment on resource-constrained devices.
We propose an automatic mixed-precision quantization framework designed for BERT that can simultaneously conduct quantization and pruning in a subgroup-wise level.
arXiv Detail & Related papers (2021-12-30T06:32:47Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation, and so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Distributionally Robust Recurrent Decoders with Random Network
Distillation [93.10261573696788]
We propose a method based on OOD detection with Random Network Distillation to allow an autoregressive language model to disregard OOD context during inference.
We apply our method to a GRU architecture, demonstrating improvements on multiple language modeling (LM) datasets.
arXiv Detail & Related papers (2021-10-25T19:26:29Z) - An Efficient Deep Learning Model for Automatic Modulation Recognition
Based on Parameter Estimation and Transformation [3.3941243094128035]
This letter proposes an efficient DL-AMR model based on phase parameter estimation and transformation.
Our model is more competitive in training time and test time than the benchmark models with similar recognition accuracy.
arXiv Detail & Related papers (2021-10-11T03:28:28Z) - Firearm Detection via Convolutional Neural Networks: Comparing a
Semantic Segmentation Model Against End-to-End Solutions [68.8204255655161]
Threat detection of weapons and aggressive behavior from live video can be used for rapid detection and prevention of potentially deadly incidents.
One way for achieving this is through the use of artificial intelligence and, in particular, machine learning for image analysis.
We compare a traditional monolithic end-to-end deep learning model against a previously proposed model based on an ensemble of simpler neural networks that detects firearms via semantic segmentation.
arXiv Detail & Related papers (2020-12-17T15:19:29Z) - Discriminative Nearest Neighbor Few-Shot Intent Detection by
Transferring Natural Language Inference [150.07326223077405]
Few-shot learning is attracting much attention to mitigate data scarcity.
We present a discriminative nearest neighbor classification with deep self-attention.
We propose to boost the discriminative ability by transferring a natural language inference (NLI) model.
arXiv Detail & Related papers (2020-10-25T00:39:32Z) - Attention based on-device streaming speech recognition with large speech
corpus [16.702653972113023]
We present a new on-device automatic speech recognition (ASR) system based on monotonic chunk-wise attention (MoChA) models trained with large (> 10K hours) corpus.
We attained a word recognition rate of around 90% for the general domain, mainly by using joint training with connectionist temporal classification (CTC) and cross-entropy (CE) losses.
For on-demand adaptation, we fused the MoChA models with statistical n-gram models and achieved a 36% relative improvement on average in word error rate (WER) for target domains, including the general domain.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.