Convexity-based Pruning of Speech Representation Models
- URL: http://arxiv.org/abs/2408.11858v1
- Date: Fri, 16 Aug 2024 09:04:54 GMT
- Title: Convexity-based Pruning of Speech Representation Models
- Authors: Teresa Dorszewski, Lenka Tětková, Lars Kai Hansen
- Abstract summary: Recent work has shown that there is significant redundancy in transformer models for NLP.
In this paper, we investigate layer pruning in audio models.
We find a massive reduction in computational effort with no loss of performance, and even improvements in certain cases.
- Score: 1.3873323883842132
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Speech representation models based on the transformer architecture and trained by self-supervised learning have shown great promise for solving tasks such as speech and speaker recognition, keyword spotting, emotion detection, and more. Typically, it is found that larger models lead to better performance. However, the significant computational effort involved in such large transformer systems is a challenge for embedded and real-world applications. Recent work has shown that there is significant redundancy in transformer models for NLP and that massive layer pruning is feasible (Sajjad et al., 2023). Here, we investigate layer pruning in audio models. We base the pruning decision on a convexity criterion. Convexity of classification regions has recently been proposed as an indicator of subsequent fine-tuning performance in a range of application domains, including NLP and audio. In empirical investigations, we find a massive reduction in computational effort with no loss of performance, and even improvements in certain cases.
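As a rough illustration of what such layer pruning looks like in practice, below is a minimal sketch that truncates the transformer stack of a wav2vec 2.0-style speech encoder to its first k layers, assuming the HuggingFace transformers library and PyTorch. The model name, the helper function, and the cut-off of six layers are illustrative placeholders; in the paper, the cut-off point is chosen from a convexity analysis of the layer-wise representations, which is not reproduced here.

```python
# Minimal sketch (not the authors' code): keep only the first k transformer
# blocks of a wav2vec 2.0-style speech encoder. The model name, the helper
# function, and the cut-off k=6 are illustrative assumptions.
import torch
from transformers import Wav2Vec2Model


def prune_encoder_layers(model: Wav2Vec2Model, keep_layers: int) -> Wav2Vec2Model:
    """Truncate the encoder to its first `keep_layers` transformer blocks."""
    model.encoder.layers = torch.nn.ModuleList(model.encoder.layers[:keep_layers])
    model.config.num_hidden_layers = keep_layers
    return model


model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
model = prune_encoder_layers(model, keep_layers=6)  # hypothetical cut-off

# The pruned encoder still produces frame-level features that a small
# downstream head (e.g., a linear classifier) can be trained on.
with torch.no_grad():
    dummy_audio = torch.randn(1, 16000)  # one second of 16 kHz audio
    features = model(dummy_audio).last_hidden_state
print(features.shape)  # (batch, frames, hidden_size)
```

Features from the truncated encoder can then be fed to a lightweight downstream head (e.g., for keyword spotting or emotion detection), which is where the reported compute savings come from.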
Related papers
- How Redundant Is the Transformer Stack in Speech Representation Models? [1.3873323883842132]
Self-supervised speech representation models have demonstrated remarkable performance across various tasks such as speech recognition, speaker identification, and emotion detection.
Recent studies on transformer models revealed a high redundancy between layers and the potential for significant pruning.
We demonstrate the effectiveness of pruning transformer-based speech representation models without the need for post-training.
arXiv Detail & Related papers (2024-09-10T11:00:24Z)
- Pivotal Auto-Encoder via Self-Normalizing ReLU [20.76999663290342]
We formalize single hidden layer sparse auto-encoders as a transform learning problem.
We propose an optimization problem that leads to a predictive model invariant to the noise level at test time.
Our experimental results demonstrate that the trained models yield a significant improvement in stability against varying types of noise.
arXiv Detail & Related papers (2024-06-23T09:06:52Z)
- X-Pruner: eXplainable Pruning for Vision Transformers [12.296223124178102]
Vision transformer models usually suffer from intensive computational costs and heavy memory requirements.
Recent studies have proposed pruning transformers in an unexplainable manner, which overlooks the relationship between internal units of the model and the target class.
We propose a novel explainable pruning framework dubbed X-Pruner, which is designed by considering the explainability of the pruning criterion.
arXiv Detail & Related papers (2023-03-08T23:10:18Z)
- Robust representations of oil wells' intervals via sparse attention mechanism [2.604557228169423]
We introduce a class of efficient Transformers named Regularized Transformers (Reguformers).
The focus in our experiments is on oil & gas data, namely, well logs.
To evaluate our models for such problems, we work with an industry-scale open dataset consisting of well logs of more than 20 wells.
arXiv Detail & Related papers (2022-12-29T09:56:33Z)
- Confident Adaptive Language Modeling [95.45272377648773]
CALM is a framework for dynamically allocating different amounts of compute per input and generation timestep.
We demonstrate the efficacy of our framework in reducing compute, with a potential speedup of up to $\times 3$, while provably maintaining high performance.
arXiv Detail & Related papers (2022-07-14T17:00:19Z)
- PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance [114.1541203743303]
We propose PLATON, which captures the uncertainty of importance scores by upper confidence bound (UCB) of importance estimation.
We conduct extensive experiments with several Transformer-based models on natural language understanding, question answering and image classification.
arXiv Detail & Related papers (2022-06-25T05:38:39Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to their large parameter capacity, but this capacity also leads to a huge computation cost.
We explore accelerating large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpora but also enhances robustness, that is, it produces high-quality results in noisy environments.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- Boosting Objective Scores of a Speech Enhancement Model by MetricGAN Post-processing [18.19158404358494]
The Transformer architecture has demonstrated a superior ability compared to recurrent neural networks in many different natural language processing applications.
Our study applies a modified Transformer in a speech enhancement task.
arXiv Detail & Related papers (2020-06-18T06:22:09Z)
- Simplified Self-Attention for Transformer-based End-to-End Speech Recognition [56.818507476125895]
We propose a simplified self-attention (SSAN) layer which employs an FSMN memory block instead of projection layers to form query and key vectors.
We evaluate the SSAN-based and the conventional SAN-based transformers on the public AISHELL-1, internal 1000-hour and 20,000-hour large-scale Mandarin tasks.
arXiv Detail & Related papers (2020-05-21T04:55:59Z)
- Deep Speaker Embeddings for Far-Field Speaker Recognition on Short Utterances [53.063441357826484]
Speaker recognition systems based on deep speaker embeddings have achieved strong performance in controlled conditions.
Speaker verification on short utterances in uncontrolled, noisy environments is one of the most challenging and highly demanded tasks.
This paper presents approaches aimed at two goals: a) improving the quality of far-field speaker verification systems in the presence of environmental noise and reverberation, and b) reducing the system quality degradation for short utterances.
arXiv Detail & Related papers (2020-02-14T13:34:33Z)