Multi-View Attention Transfer for Efficient Speech Enhancement
- URL: http://arxiv.org/abs/2208.10367v1
- Date: Mon, 22 Aug 2022 14:47:47 GMT
- Title: Multi-View Attention Transfer for Efficient Speech Enhancement
- Authors: Wooseok Shin, Hyun Joon Park, Jin Sob Kim, Byung Hoon Lee, Sung Won Han
- Abstract summary: We propose multi-view attention transfer (MV-AT), a feature-based distillation, to obtain efficient speech enhancement models in the time domain.
Based on the multi-view feature extraction model, MV-AT transfers multi-view knowledge of the teacher network to the student network without additional parameters.
- Score: 1.6932706284468382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent deep learning models have achieved high performance in speech
enhancement; however, it is still challenging to obtain a fast and
low-complexity model without significant performance degradation. Previous
knowledge distillation studies on speech enhancement could not solve this
problem because their output distillation methods do not fit the speech
enhancement task in some aspects. In this study, we propose multi-view
attention transfer (MV-AT), a feature-based distillation, to obtain efficient
speech enhancement models in the time domain. Based on the multi-view feature
extraction model, MV-AT transfers multi-view knowledge of the teacher network
to the student network without additional parameters. The experimental results
show that the proposed method consistently improved the performance of student
models of various sizes on the Valentini and deep noise suppression (DNS)
datasets. MANNER-S-8.1GF with our proposed method, a lightweight model for
efficient deployment, required 15.4x fewer parameters and 4.71x fewer
floating-point operations (FLOPs) than the baseline model while achieving
similar performance.
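The listing does not include code, but the core idea of feature-based attention transfer can be illustrated with a short PyTorch sketch. The following is a minimal, generic attention-transfer loss for time-domain features, not the authors' exact multi-view (MV-AT) formulation; the tensor shapes, layer pairing, and loss weight are assumptions.

```python
# Minimal sketch of feature-based attention transfer for a time-domain model.
# This illustrates the general attention-transfer idea, not the exact MV-AT
# multi-view formulation; tensor shapes and the loss weight are assumptions.
import torch
import torch.nn.functional as F

def attention_map(feat: torch.Tensor) -> torch.Tensor:
    """Collapse channels of a (batch, channels, time) feature into a
    normalized per-frame attention map of shape (batch, time)."""
    attn = feat.pow(2).mean(dim=1)            # energy over channels
    return F.normalize(attn, p=2, dim=1)      # unit-norm over time

def attention_transfer_loss(student_feats, teacher_feats):
    """Sum of L2 distances between normalized attention maps of
    corresponding student/teacher layers (no extra parameters)."""
    loss = 0.0
    for s, t in zip(student_feats, teacher_feats):
        loss = loss + (attention_map(s) - attention_map(t.detach())).pow(2).mean()
    return loss

# Usage sketch: combine with the ordinary enhancement loss.
# enhancement_loss = F.l1_loss(student_wave, clean_wave)
# total_loss = enhancement_loss + 10.0 * attention_transfer_loss(s_feats, t_feats)
```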
Related papers
- Scalable Model Merging with Progressive Layer-wise Distillation [17.521794641817642]
We introduce a novel few-shot merging algorithm, ProDistill (Progressive Layer-wise Distillation)
We show that ProDistill achieves state-of-the-art performance, with up to 6.14% and 6.61% improvements in vision and NLU tasks.
arXiv Detail & Related papers (2025-02-18T10:15:18Z)
- Feature Alignment-Based Knowledge Distillation for Efficient Compression of Large Language Models [4.737806982257592]
This study proposes a knowledge distillation algorithm based on large language models and feature alignment.
The proposed model performs very close to the state-of-the-art GPT-4 model in terms of evaluation indicators such as perplexity, BLEU, ROUGE, and CER.
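The abstract names feature alignment without further detail; a generic sketch of feature-alignment distillation, assuming a linear projection from student to teacher hidden size plus a standard temperature-scaled logit loss, might look as follows.

```python
# Hedged sketch of feature-alignment distillation: the abstract only names the
# idea, so the projection layer and loss mix below are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAlignKD(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        # Linear map so student hidden states can be compared with the teacher's.
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_hidden, teacher_hidden, student_logits, teacher_logits, T=2.0):
        # Align intermediate features after projecting to the teacher's width.
        feat_loss = F.mse_loss(self.proj(student_hidden), teacher_hidden.detach())
        # Conventional temperature-scaled distillation on the output logits.
        kd_loss = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits.detach() / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)
        return feat_loss + kd_loss
```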
arXiv Detail & Related papers (2024-12-27T04:37:06Z)
- Numerical Pruning for Efficient Autoregressive Models [87.56342118369123]
This paper focuses on compressing decoder-only transformer-based autoregressive models through structural weight pruning.
Specifically, we propose a training-free pruning method that calculates a numerical score with Newton's method for the Attention and MLP modules, respectively.
To verify the effectiveness of our method, we provide both theoretical support and extensive experiments.
arXiv Detail & Related papers (2024-12-17T01:09:23Z)
- EchoAtt: Attend, Copy, then Adjust for More Efficient Large Language Models [29.57891007810509]
Large Language Models (LLMs) have demonstrated outstanding performance across a variety of natural language processing tasks.
We introduce EchoAtt, a novel framework aimed at optimizing transformer-based models by analyzing and leveraging the similarity of attention patterns across layers.
Our best results with TinyLLaMA-1.1B demonstrate that EchoAtt improves inference speed by 15%, training speed by 25%, and reduces the number of parameters by approximately 4%, all while improving zero-shot performance.
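As a rough illustration of the kind of cross-layer attention-similarity analysis described above (not EchoAtt's actual implementation), one could compare attention maps of adjacent layers and mark highly similar layers for sharing; the 0.9 threshold below is an arbitrary assumption.

```python
# Sketch of layer-similarity analysis: measure how similar attention maps are
# across adjacent layers so that highly similar layers can share weights.
import torch
import torch.nn.functional as F

def layer_attention_similarity(attn_maps):
    """attn_maps: list of (batch, heads, seq, seq) attention tensors, one per layer.
    Returns cosine similarity between each pair of adjacent layers."""
    sims = []
    for a, b in zip(attn_maps[:-1], attn_maps[1:]):
        sims.append(F.cosine_similarity(a.flatten(1), b.flatten(1), dim=1).mean().item())
    return sims

def layers_to_share(attn_maps, threshold: float = 0.9):
    """Indices i where layer i's attention closely echoes the previous layer's."""
    return [i + 1 for i, s in enumerate(layer_attention_similarity(attn_maps)) if s > threshold]
```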
arXiv Detail & Related papers (2024-09-22T21:08:37Z)
- DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder.
DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
- MoE-LLaVA: Mixture of Experts for Large Vision-Language Models [49.32669226551026]
We propose a simple yet effective training strategy MoE-Tuning for LVLMs.
MoE-LLaVA, a MoE-based sparse LVLM architecture, uniquely activates only the top-k experts through routers.
Experiments show the significant performance of MoE-LLaVA in a variety of visual understanding and object hallucination benchmarks.
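A minimal sketch of the top-k expert routing idea mentioned above (layer sizes and k are illustrative, not MoE-LLaVA's actual configuration):

```python
# Top-k mixture-of-experts routing sketch: only the k highest-scoring experts
# process each token. Sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, dim=512, num_experts=4, k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        self.k = k

    def forward(self, x):                            # x: (tokens, dim)
        gates = F.softmax(self.router(x), dim=-1)
        weights, idx = gates.topk(self.k, dim=-1)    # keep only top-k experts per token
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```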
arXiv Detail & Related papers (2024-01-29T08:13:40Z)
- E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning [55.50908600818483]
Fine-tuning large-scale pretrained vision models for new tasks has become increasingly parameter-intensive.
We propose an Effective and Efficient Visual Prompt Tuning (E2VPT) approach for large-scale transformer-based model adaptation.
Our approach outperforms several state-of-the-art baselines on two benchmarks.
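Visual prompt tuning in its generic form prepends a small set of learnable prompt tokens to a frozen transformer's input; the sketch below shows that baseline mechanism only, not E^2VPT's specific key-value prompts or pruning additions, and all sizes are assumptions.

```python
# Generic visual prompt tuning sketch: learnable prompt tokens are prepended to
# the patch embeddings of a frozen vision transformer, and only the prompts and
# the classifier head are trained.
import torch
import torch.nn as nn

class PromptedViT(nn.Module):
    def __init__(self, backbone: nn.Module, embed_dim=768, num_prompts=10, num_classes=100):
        super().__init__()
        self.backbone = backbone                      # assumed frozen ViT encoder over token embeddings
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(1, num_prompts, embed_dim) * 0.02)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_embeddings):              # (batch, seq, embed_dim)
        b = patch_embeddings.size(0)
        tokens = torch.cat([self.prompts.expand(b, -1, -1), patch_embeddings], dim=1)
        features = self.backbone(tokens)              # assumed to return (batch, seq', embed_dim)
        return self.head(features[:, 0])              # classify from the first token
```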
arXiv Detail & Related papers (2023-07-25T19:03:21Z)
- Ensemble knowledge distillation of self-supervised speech models [84.69577440755457]
Distilled self-supervised models have shown competitive performance and efficiency in recent years.
We performed Ensemble Knowledge Distillation (EKD) on various self-supervised speech models such as HuBERT, RobustHuBERT, and WavLM.
Our method improves the performance of the distilled models on four downstream speech processing tasks.
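One common way to realize ensemble distillation is to let the student predict each teacher's hidden representation through a separate head; the sketch below assumes this setup and an L1 objective, which may differ from the paper's exact losses.

```python
# Hedged sketch of ensemble knowledge distillation from several teachers:
# per-teacher prediction heads and the L1 loss are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EnsembleDistillLoss(nn.Module):
    def __init__(self, student_dim: int, teacher_dims: list):
        super().__init__()
        self.heads = nn.ModuleList([nn.Linear(student_dim, d) for d in teacher_dims])

    def forward(self, student_hidden, teacher_hiddens):
        # student_hidden: (batch, time, student_dim)
        # teacher_hiddens: list of (batch, time, teacher_dim_i), one per teacher
        losses = [F.l1_loss(head(student_hidden), t.detach())
                  for head, t in zip(self.heads, teacher_hiddens)]
        return sum(losses) / len(losses)
```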
arXiv Detail & Related papers (2023-02-24T17:15:39Z)
- Adapted Multimodal BERT with Layer-wise Fusion for Sentiment Analysis [84.12658971655253]
We propose Adapted Multimodal BERT, a BERT-based architecture for multimodal tasks.
The adapter adjusts the pretrained language model for the task at hand, while the fusion layers perform task-specific, layer-wise fusion of audio-visual information with textual BERT representations.
In our ablations, we see that this approach leads to efficient models that can outperform their fine-tuned counterparts and are robust to input noise.
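The two ingredients named above, bottleneck adapters and layer-wise gated fusion, can be sketched generically as follows; dimensions and the gating form are assumptions, not the paper's exact architecture.

```python
# Illustrative sketch: a residual bottleneck adapter inserted into a frozen
# language-model layer, and a gated layer-wise fusion of audio-visual features
# with the textual hidden states. Sizes are assumptions.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down, self.up = nn.Linear(dim, bottleneck), nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, h):
        return h + self.up(self.act(self.down(h)))    # residual bottleneck

class GatedFusion(nn.Module):
    def __init__(self, text_dim=768, av_dim=128):
        super().__init__()
        self.proj = nn.Linear(av_dim, text_dim)
        self.gate = nn.Linear(text_dim * 2, text_dim)

    def forward(self, text_h, av_feat):
        av = self.proj(av_feat)
        g = torch.sigmoid(self.gate(torch.cat([text_h, av], dim=-1)))
        return text_h + g * av                         # per-layer, gated fusion
```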
arXiv Detail & Related papers (2022-12-01T17:31:42Z)
- Online Knowledge Distillation via Multi-branch Diversity Enhancement [15.523646047674717]
We propose a new distillation method to enhance the diversity among multiple student models.
We use a Feature Fusion Module (FFM), which improves the performance of the attention mechanism in the network.
We also use a Classifier Diversification (CD) loss function to strengthen the differences between the student models.
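A diversity objective between student branches can be illustrated with a simple pairwise-similarity penalty on branch predictions; the paper's CD loss may be defined differently, so the sketch below is only indicative.

```python
# Hedged sketch of a diversity objective between student branches: penalize
# pairwise similarity of branch predictions so the branches specialize.
import torch
import torch.nn.functional as F

def branch_diversity_loss(branch_logits):
    """branch_logits: list of (batch, classes) logits, one per student branch.
    Returns mean pairwise cosine similarity of softmax outputs (to be minimized)."""
    probs = [F.softmax(l, dim=-1) for l in branch_logits]
    loss, pairs = 0.0, 0
    for i in range(len(probs)):
        for j in range(i + 1, len(probs)):
            loss = loss + F.cosine_similarity(probs[i], probs[j], dim=-1).mean()
            pairs += 1
    return loss / max(pairs, 1)
```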
arXiv Detail & Related papers (2020-10-02T05:52:12Z)