Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection
- URL: http://arxiv.org/abs/2306.05617v1
- Date: Fri, 9 Jun 2023 01:43:41 GMT
- Title: Low-rank Adaptation Method for Wav2vec2-based Fake Audio Detection
- Authors: Chenglong Wang, Jiangyan Yi, Xiaohui Zhang, Jianhua Tao, Le Xu and
Ruibo Fu
- Abstract summary: Self-supervised speech models are a rapidly developing research topic in fake audio detection.
We apply low-rank adaptation (LoRA) to the wav2vec2 model, freezing the pre-trained model weights and injecting a trainable rank-decomposition matrix into each layer of the transformer architecture.
Compared with fine-tuning with Adam on the wav2vec2 model containing 317M training parameters, LoRA achieved similar performance while reducing the number of trainable parameters by a factor of 198.
- Score: 57.537583869961885
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised speech models are a rapidly developing research topic in fake
audio detection. Many pre-trained models can serve as feature extractors,
learning richer and higher-level speech features. However, fine-tuning
pre-trained models often suffers from excessively long training times and high
memory consumption, and complete fine-tuning is also very expensive. To
alleviate this problem, we apply low-rank adaptation (LoRA) to the
wav2vec2 model, freezing the pre-trained model weights and injecting a
trainable rank-decomposition matrix into each layer of the transformer
architecture, greatly reducing the number of trainable parameters for
downstream tasks. Compared with fine-tuning with Adam on the wav2vec2 model
containing 317M training parameters, LoRA achieved similar performance while
reducing the number of trainable parameters by a factor of 198.
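As a rough illustration of the mechanism the abstract describes (freeze the pre-trained weights and inject a trainable rank-decomposition update into selected Transformer projections), a minimal PyTorch sketch of a LoRA-wrapped linear layer might look like the following. This is not the authors' implementation; the class name, rank r = 8, and scaling factor alpha = 16 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable rank-r update (hypothetical sketch)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        # Freeze the pre-trained weights, as described in the abstract.
        for p in self.base.parameters():
            p.requires_grad = False
        # Trainable rank-decomposition matrices A (d_in x r) and B (r x d_out).
        self.lora_a = nn.Parameter(torch.randn(base.in_features, r) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(r, base.out_features))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen pre-trained projection + scaled low-rank correction.
        return self.base(x) + (x @ self.lora_a @ self.lora_b) * self.scaling
```

Each wrapped layer adds only r * (d_in + d_out) trainable parameters, far fewer than the d_in * d_out weights it leaves frozen; under this kind of scheme a 317M-parameter wav2vec2 model can be adapted with roughly 317M / 198, about 1.6M, trainable parameters.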
Related papers
- LoRTA: Low Rank Tensor Adaptation of Large Language Models [70.32218116940393]
Low Rank Adaptation (LoRA) is a popular Parameter-Efficient Fine-Tuning (PEFT) method that effectively adapts large pre-trained models for downstream tasks.
We propose a novel approach that employs a low rank tensor parametrization for model updates.
Our method is both efficient and effective for fine-tuning large language models, achieving a substantial reduction in the number of parameters while maintaining comparable performance.
arXiv Detail & Related papers (2024-10-05T06:59:50Z)
- Low-rank Attention Side-Tuning for Parameter-Efficient Fine-Tuning [19.17362588650503]
Low-rank Attention Side-Tuning (LAST) trains a side-network composed of only low-rank self-attention modules.
We show LAST can be highly parallel across multiple optimization objectives, making it very efficient in downstream task adaptation.
arXiv Detail & Related papers (2024-02-06T14:03:15Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
arXiv Detail & Related papers (2023-11-09T05:24:20Z)
- PELA: Learning Parameter-Efficient Models with Low-Rank Approximation [16.9278983497498]
We propose a novel method for increasing the parameter efficiency of pre-trained models by introducing an intermediate pre-training stage.
This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks.
arXiv Detail & Related papers (2023-10-16T07:17:33Z)
- ReLoRA: High-Rank Training Through Low-Rank Updates [14.606961537327345]
We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks.
ReLoRA saves up to 5.5 GB of RAM per GPU and improves training speed by 9-40% depending on the model size and hardware setup.
arXiv Detail & Related papers (2023-07-11T18:02:09Z)
- On-demand compute reduction with stochastic wav2vec 2.0 [63.22845151306881]
We propose compression for on-demand compute reduction for wav2vec 2.0 (W2V2) models.
Our results for models pre-trained on the 960h LibriSpeech dataset and fine-tuned on 10h of transcribed data show that, using the same model, we get a smooth trade-off between word error rate (WER) and inference time.
arXiv Detail & Related papers (2022-04-25T19:25:46Z)
- Efficient Adapter Transfer of Self-Supervised Speech Models for Automatic Speech Recognition [0.1909808926064466]
Transformer-based models such as wav2vec 2.0 and HuBERT are leading the field in the speech domain.
We propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks.
arXiv Detail & Related papers (2022-02-07T14:20:54Z)
- LoRA: Low-Rank Adaptation of Large Language Models [71.75808607987281]
Low-Rank Adaptation, or LoRA, freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture.
For GPT-3, LoRA can reduce the number of trainable parameters by 10,000 times and the computation hardware requirement by 3 times compared to full fine-tuning.
arXiv Detail & Related papers (2021-06-17T17:37:18Z)
- Dynamic Model Pruning with Feedback [64.019079257231]
We propose a novel model compression method that generates a sparse trained model without additional overhead.
We evaluate our method on CIFAR-10 and ImageNet, and show that the obtained sparse models can reach the state-of-the-art performance of dense models.
arXiv Detail & Related papers (2020-06-12T15:07:08Z)