CHAPTER: Exploiting Convolutional Neural Network Adapters for
Self-supervised Speech Models
- URL: http://arxiv.org/abs/2212.01282v1
- Date: Thu, 1 Dec 2022 08:50:12 GMT
- Title: CHAPTER: Exploiting Convolutional Neural Network Adapters for
Self-supervised Speech Models
- Authors: Zih-Ching Chen, Yu-Shun Sung, Hung-yi Lee
- Abstract summary: Self-supervised learning (SSL) is a powerful technique for learning representations from unlabeled data.
We propose an efficient tuning method specifically designed for SSL speech models, applying CNN adapters at the feature extractor.
We empirically found that adding CNN adapters to the feature extractor can help adaptation on emotion and speaker tasks.
- Score: 62.60723685118747
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised learning (SSL) is a powerful technique for learning
representations from unlabeled data. Transformer-based models such as HuBERT,
which consist of a feature extractor and transformer layers, are leading the
field in the speech domain. SSL models are fine-tuned on a wide range of
downstream tasks, which involves re-training the majority of the model for each
task. Previous studies have introduced adapters, small lightweight modules
commonly used in Natural Language Processing (NLP), to adapt pre-trained models
to new tasks. However, such efficient tuning techniques only provide adaptation
at the transformer layers and fail to adapt the feature extractor. In this
paper, we propose CHAPTER, an efficient tuning method specifically designed for
SSL speech models that applies CNN adapters at the feature extractor. With this
method, we fine-tune fewer than 5% of the parameters per task compared to full
fine-tuning, while achieving better and more stable performance. We empirically
find that adding CNN adapters to the feature extractor helps adaptation on
emotion and speaker tasks. For instance, the accuracy of speaker identification
(SID) improves from 87.71 to 91.56, and the accuracy of emotion recognition
(ER) improves by 5%.
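As an illustration of the core idea, here is a minimal PyTorch sketch of CNN adapters inserted after each block of a frozen convolutional feature extractor. The module names, bottleneck width, kernel size, and zero-initialization are illustrative assumptions, not the exact configuration described in the paper.

```python
import torch
import torch.nn as nn


class CNNAdapter(nn.Module):
    """Small residual 1-D convolutional bottleneck added after a feature-extractor block."""

    def __init__(self, channels: int, bottleneck: int = 32, kernel_size: int = 3):
        super().__init__()
        self.down = nn.Conv1d(channels, bottleneck, kernel_size, padding=kernel_size // 2)
        self.act = nn.GELU()
        self.up = nn.Conv1d(bottleneck, channels, kernel_size, padding=kernel_size // 2)
        nn.init.zeros_(self.up.weight)  # zero init: the adapter starts as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, time); the residual path keeps the frozen features intact
        return x + self.up(self.act(self.down(x)))


class AdaptedFeatureExtractor(nn.Module):
    """Wraps a pre-trained CNN feature extractor and interleaves trainable adapters."""

    def __init__(self, conv_blocks: nn.ModuleList, channels_per_block: list):
        super().__init__()
        self.conv_blocks = conv_blocks
        for p in self.conv_blocks.parameters():
            p.requires_grad = False  # the backbone stays frozen; only the adapters are tuned
        self.adapters = nn.ModuleList([CNNAdapter(c) for c in channels_per_block])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block, adapter in zip(self.conv_blocks, self.adapters):
            x = adapter(block(x))
        return x
```

For reference, the HuBERT Base feature extractor consists of seven temporal convolution blocks with 512 output channels each, so channels_per_block would be [512] * 7, and only the adapter (plus downstream head) parameters receive gradients.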
Related papers
- ELP-Adapters: Parameter Efficient Adapter Tuning for Various Speech Processing Tasks [10.852047082856487]
We introduce ELP-adapter tuning, a novel method for parameter-efficient fine-tuning using three types of adapters.
E-adapters are integrated into transformer-based encoder layers and help to learn fine-grained speech representations that are effective for speech recognition.
L-adapters create paths from each encoder layer to the downstream head and help to extract non-linguistic features from lower encoder layers.
The P-adapter appends pseudo features to CNN features to further improve effectiveness and efficiency.
arXiv Detail & Related papers (2024-07-28T05:26:03Z) - eP-ALM: Efficient Perceptual Augmentation of Language Models [70.47962271121389]
We propose to direct effort toward efficient adaptation of existing models and to augment Language Models with perception.
Existing approaches for adapting pretrained models for vision-language tasks still rely on several key components that hinder their efficiency.
We show that by freezing more than 99% of total parameters, training only one linear projection layer, and prepending only one trainable token, our approach (dubbed eP-ALM) significantly outperforms other baselines on VQA and Captioning.
arXiv Detail & Related papers (2023-03-20T19:20:34Z) - Exploring Efficient-tuning Methods in Self-supervised Speech Models [53.633222197712875]
Self-supervised learning can learn powerful representations for different speech tasks.
In downstream tasks, the parameters of SSL models are frozen, and only the adapters are trained.
We show that the performance parity can be achieved with over 90% parameter reduction.
arXiv Detail & Related papers (2022-10-10T11:08:12Z) - Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than
In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm in which a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z) - AdapterBias: Parameter-efficient Token-dependent Representation Shift
for Adapters in NLP Tasks [55.705355299065474]
Transformer-based pre-trained models with millions of parameters require large storage.
Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters.
In this study, AdapterBias, a surprisingly simple yet effective adapter architecture, is proposed.
arXiv Detail & Related papers (2022-04-30T16:49:41Z) - Efficient Adapter Transfer of Self-Supervised Speech Models for
Automatic Speech Recognition [0.1909808926064466]
Transformer-based models such as wav2vec 2.0 and HuBERT are leading the field in the speech domain.
We propose applying adapters to wav2vec 2.0 to reduce the number of parameters required for downstream ASR tasks.
arXiv Detail & Related papers (2022-02-07T14:20:54Z) - Legal Transformer Models May Not Always Help [3.6061626009104057]
This work investigates the value of domain adaptive pre-training and language adapters in legal NLP tasks.
We show that domain adaptive pre-training is only helpful with low-resource downstream tasks.
As an additional result, we release LegalRoBERTa, a RoBERTa model further pre-trained on legal corpora.
arXiv Detail & Related papers (2021-09-14T17:53:55Z) - On the Effectiveness of Adapter-based Tuning for Pretrained Language
Model Adaptation [36.37565646597464]
Adapter-based tuning works by adding lightweight adapter modules to a pretrained language model (PrLM).
It adds only a few trainable parameters per new task, allowing a high degree of parameter sharing.
We demonstrate that adapter-based tuning outperforms fine-tuning on low-resource and cross-lingual tasks.
arXiv Detail & Related papers (2021-06-06T16:10:12Z) - Parameter-Efficient Transfer from Sequential Behaviors for User Modeling
and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.