Efficient Adapter Tuning of Pre-trained Speech Models for Automatic
Speaker Verification
- URL: http://arxiv.org/abs/2403.00293v1
- Date: Fri, 1 Mar 2024 05:32:14 GMT
- Title: Efficient Adapter Tuning of Pre-trained Speech Models for Automatic
Speaker Verification
- Authors: Mufan Sang, John H.L. Hansen
- Abstract summary: Self-supervised speech models have shown impressive performance on various downstream speech tasks.
However, fine-tuning becomes practically infeasible due to heavy computation and storage overhead.
We propose an effective adapter framework designed for adapting self-supervised speech models to the speaker verification task.
- Score: 38.20393847192532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With excellent generalization ability, self-supervised speech models have
shown impressive performance on various downstream speech tasks in the
pre-training and fine-tuning paradigm. However, as pre-trained models grow in
size, fine-tuning becomes practically infeasible due to heavy computation and
storage overhead, as well as the risk of overfitting. Adapters
are lightweight modules inserted into pre-trained models to facilitate
parameter-efficient adaptation. In this paper, we propose an effective adapter
framework designed for adapting self-supervised speech models to the speaker
verification task. With a parallel adapter design, our proposed framework
inserts two types of adapters into the pre-trained model, allowing the
adaptation of latent features within intermediate Transformer layers and output
embeddings from all Transformer layers. We conduct comprehensive experiments to
validate the efficiency and effectiveness of the proposed framework.
Experimental results on the VoxCeleb1 dataset demonstrate that the proposed
adapters surpass fine-tuning and other parameter-efficient transfer learning
methods, achieving superior performance while updating only 5% of the
parameters.
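To make the parallel-adapter idea concrete, the following is a minimal PyTorch sketch rather than the authors' implementation: it assumes a generic bottleneck adapter running in parallel with each frozen Transformer layer (adapting intermediate latent features) and a learnable weighted combination of all layers' outputs (adapting the layer-wise embeddings). Class names, dimensions, and insertion points are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project adapter
    (a generic design; the paper's exact adapter may differ)."""
    def __init__(self, dim: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()
        nn.init.zeros_(self.up.weight)   # start as a near-identity branch
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return self.up(self.act(self.down(x)))

class ParallelAdapterLayer(nn.Module):
    """Wraps a frozen Transformer layer; the adapter runs in parallel with it
    and its output is added to the layer output (one possible 'parallel' wiring)."""
    def __init__(self, frozen_layer: nn.Module, dim: int):
        super().__init__()
        self.layer = frozen_layer
        for p in self.layer.parameters():
            p.requires_grad = False      # only adapter parameters are updated
        self.adapter = BottleneckAdapter(dim)

    def forward(self, x):                # x: [B, T, D]
        return self.layer(x) + self.adapter(x)

class LayerwiseAggregator(nn.Module):
    """Learnable weighted sum over the hidden states of all Transformer layers,
    a common way to exploit layer-wise speaker information."""
    def __init__(self, num_layers: int):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, hidden_states):    # list of [B, T, D] tensors
        w = torch.softmax(self.weights, dim=0)
        return sum(wi * h for wi, h in zip(w, hidden_states))
```

In such a setup only the adapter and aggregation parameters (plus the speaker-verification head) are trained, which is how updating on the order of a few percent of the backbone parameters becomes feasible.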
Related papers
- Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning [43.43337861152684]
Voicebox Adapter is a novel approach that integrates fine-grained conditions into a pre-trained Voicebox speech generation model.
Our experiments show that the LoRA-with-bias-tuning configuration yields the best performance (a rough sketch of such a configuration appears after this list).
arXiv Detail & Related papers (2024-06-10T13:31:18Z)
- Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation [67.13876021157887]
Dynamic Tuning (DyT) is a novel approach to improve both parameter and inference efficiency for ViT adaptation.
DyT achieves superior performance compared to existing PEFT methods while using only 71% of their FLOPs on the VTAB-1K benchmark.
arXiv Detail & Related papers (2024-03-18T14:05:52Z)
- Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis [51.14136878142034]
Point cloud analysis has achieved outstanding performance by transferring pre-trained point cloud models.
Existing methods for model adaptation usually update all model parameters, which is inefficient as it incurs high computational costs.
In this paper, we aim to study parameter-efficient transfer learning for point cloud analysis with an ideal trade-off between task performance and parameter efficiency.
arXiv Detail & Related papers (2024-03-03T08:25:04Z)
- Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing [8.88477151877883]
High-capacity pre-trained models have revolutionized problem-solving in computer vision.
We propose a novel Adapter Re-Composing (ARC) strategy that addresses efficient pre-trained model adaptation.
Our approach considers the reusability of adaptation parameters and introduces a parameter-sharing scheme.
arXiv Detail & Related papers (2023-10-10T01:04:15Z)
- Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding [40.27182770995891]
Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models.
We introduce the Speech UndeRstanding Evaluation (SURE) benchmark for parameter-efficient learning for various speech-processing tasks.
arXiv Detail & Related papers (2023-03-02T08:57:33Z)
- CHAPTER: Exploiting Convolutional Neural Network Adapters for Self-supervised Speech Models [62.60723685118747]
Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data.
We propose an efficient tuning method specifically designed for SSL speech models, applying CNN adapters at the feature extractor (see the sketch after this list).
We empirically find that adding CNN adapters to the feature extractor helps adaptation on emotion and speaker tasks.
arXiv Detail & Related papers (2022-12-01T08:50:12Z)
- AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models [119.7093605087114]
Fine-tuning large-scale pre-trained language models on downstream tasks requires updating hundreds of millions of parameters.
This not only increases the serving cost to store a large copy of the model weights for every task, but also exhibits instability during few-shot task adaptation.
We introduce a new mechanism to improve adapter capacity without increasing parameters or computational cost by two key techniques.
arXiv Detail & Related papers (2022-05-24T23:41:22Z)
- AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed (a minimal sketch of the idea appears after this list).
arXiv Detail & Related papers (2022-04-30T16:49:41Z)
- Parameter-efficient Model Adaptation for Vision Transformers [45.3460867776953]
We study parameter-efficient model adaptation strategies for vision transformers on the image classification task.
We propose a parameter-efficient model adaptation framework, which first selects submodules by measuring local intrinsic dimensions.
Our method performs the best in terms of the tradeoff between accuracy and parameter efficiency across 20 image classification datasets.
arXiv Detail & Related papers (2022-03-29T05:30:09Z)
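The Voicebox Adapter entry above reports that a LoRA-with-bias-tuning configuration worked best. As a rough illustration of what such a configuration means in practice, here is a generic sketch, not the Voicebox Adapter implementation: a trainable low-rank update added to a frozen linear projection, with the bias term left trainable.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer with a trainable low-rank update (LoRA) and,
    optionally, a trainable bias ('bias-tuning')."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0,
                 tune_bias: bool = True):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad = False           # freeze W
        if self.base.bias is not None:
            self.base.bias.requires_grad = tune_bias     # bias-tuning
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = x W^T + b + scale * x A^T B^T
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)
```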
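The CHAPTER entry applies CNN adapters at the feature extractor of an SSL speech model. One plausible realization, an assumption rather than the paper's exact module, is a small residual 1-D convolution inserted between the frozen convolutional blocks of the feature extractor:

```python
import torch
import torch.nn as nn

class CNNAdapter(nn.Module):
    """Small residual depthwise 1-D convolutional adapter, e.g. inserted after
    a frozen CNN block of the SSL feature extractor (dims are illustrative)."""
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels)
        nn.init.zeros_(self.conv.weight)   # start as an identity mapping
        nn.init.zeros_(self.conv.bias)

    def forward(self, x):                  # x: [B, C, T]
        return x + self.conv(x)
```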
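Similarly, the AdapterBias entry describes a token-dependent representation shift. A minimal sketch of that idea, based on the general description and using hypothetical names, adds a shared shift vector scaled by a per-token weight:

```python
import torch
import torch.nn as nn

class TokenDependentBias(nn.Module):
    """Adds a shared shift vector to each token representation, scaled by a
    token-dependent weight (the core idea behind AdapterBias, as described)."""
    def __init__(self, dim: int):
        super().__init__()
        self.shift = nn.Parameter(torch.zeros(dim))   # shared shift vector
        self.alpha = nn.Linear(dim, 1)                # token-dependent weight

    def forward(self, x):                             # x: [B, T, D]
        return x + self.alpha(x) * self.shift
```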