Resource-Efficient Transfer Learning From Speech Foundation Model Using
Hierarchical Feature Fusion
- URL: http://arxiv.org/abs/2211.02712v1
- Date: Fri, 4 Nov 2022 19:03:45 GMT
- Title: Resource-Efficient Transfer Learning From Speech Foundation Model Using
Hierarchical Feature Fusion
- Authors: Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath,
Trevor Strohman
- Abstract summary: We propose a novel hierarchical feature fusion method for resource-efficient transfer learning from speech foundation models.
Experimental results show that the proposed method achieves better performance on the speech recognition task than existing algorithms.
- Score: 44.056153052137674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised pre-training of a speech foundation model, followed by
supervised fine-tuning, has shown impressive quality improvements on automatic
speech recognition (ASR) tasks. Fine-tuning separate foundation models for many
downstream tasks is expensive, since the foundation model is usually very big.
Parameter-efficient fine-tuning methods (e.g. adapter, sparse update methods)
offer an alternative paradigm where a small set of parameters are updated to
adapt the foundation model to new tasks. However, these methods still suffer
from a high computational memory cost and slow training speed because they
require backpropagation through the entire neural network at each step. In this
paper, we analyze the performance of features at different layers of a
foundation model on the speech recognition task and propose a novel
hierarchical feature fusion method for resource-efficient transfer learning
from speech foundation models. Experimental results show that the proposed
method achieves better performance on the speech recognition task than existing
algorithms, with fewer trainable parameters, lower computational memory
cost and faster training speed. After combining with Adapters at all layers,
the proposed method can achieve the same performance as fine-tuning the whole
model with $97\%$ fewer trainable encoder parameters and $53\%$ faster training
speed.
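As a rough illustration of the resource argument above, the following is a minimal PyTorch sketch, not the paper's implementation: intermediate features from a frozen foundation encoder are fused with a small set of trainable weights and fed to a lightweight head, so gradients never flow through the encoder itself. The fusion operator (a learned weighted sum here), the choice of top layers, and the head are assumptions rather than the paper's exact hierarchical design.

    # Hypothetical sketch: fuse features from a frozen speech encoder with a
    # few trainable weights and a small head; gradients never reach the encoder.
    import torch
    import torch.nn as nn

    class HierarchicalFusionHead(nn.Module):
        def __init__(self, num_fused_layers: int, dim: int, vocab_size: int):
            super().__init__()
            # One learnable scalar weight per fused layer (softmax-normalized).
            self.layer_weights = nn.Parameter(torch.zeros(num_fused_layers))
            # Small trainable head; the foundation encoder stays frozen.
            self.head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(),
                                      nn.Linear(dim, vocab_size))

        def forward(self, layer_features):  # list of [B, T, D] tensors
            w = torch.softmax(self.layer_weights, dim=0)
            fused = sum(w_i * f for w_i, f in zip(w, layer_features))
            return self.head(fused)

    def training_step(frozen_encoder, fusion_head, batch, loss_fn, optimizer):
        # The encoder runs without gradient tracking, so training memory and
        # time are dominated by the small fusion head, not the full model.
        with torch.no_grad():
            layer_feats = frozen_encoder(batch["audio"])  # assumed to return per-layer features
        logits = fusion_head(layer_feats[-4:])            # fuse the top few layers (assumption)
        loss = loss_fn(logits, batch["targets"])
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.detach()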
Related papers
- Time-, Memory- and Parameter-Efficient Visual Adaptation [75.28557015773217]
We propose an adaptation method which does not backpropagate gradients through the backbone.
We achieve this by designing a lightweight network in parallel that operates on features from the frozen, pretrained backbone.
arXiv Detail & Related papers (2024-02-05T10:55:47Z)
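As a rough sketch of the idea summarized above (not the paper's code), the backbone below is run without gradient tracking, and only a small parallel network trained on its pooled features receives gradients; the pooling, projections, and classifier are illustrative assumptions.

    # Illustrative sketch: a small side network consumes pooled features from a
    # frozen backbone, so backpropagation never touches the backbone's weights.
    import torch
    import torch.nn as nn

    class ParallelSideNet(nn.Module):
        def __init__(self, feature_dims, hidden_dim, num_classes):
            super().__init__()
            # One projection per tapped backbone stage, plus a tiny classifier.
            self.proj = nn.ModuleList([nn.Linear(d, hidden_dim) for d in feature_dims])
            self.classifier = nn.Linear(hidden_dim, num_classes)

        def forward(self, pooled_features):  # list of [B, D_i] tensors
            fused = sum(p(f) for p, f in zip(self.proj, pooled_features))
            return self.classifier(torch.relu(fused))

    def forward_with_frozen_backbone(backbone, side_net, images):
        # The backbone runs under no_grad; only side_net receives gradients.
        with torch.no_grad():
            feature_maps = backbone(images)  # assumed to return a list of [B, C, H, W] maps
        pooled = [f.mean(dim=(2, 3)) for f in feature_maps]
        return side_net(pooled)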
- Parameter-efficient Tuning of Large-scale Multimodal Foundation Model [68.24510810095802]
We propose a graceful prompt framework for cross-modal transfer (Aurora) to overcome the challenges of parameter-efficient multimodal tuning.
Considering the redundancy in existing architectures, we use mode approximation to generate only 0.1M trainable parameters for multimodal prompt tuning.
A thorough evaluation on six cross-modal benchmarks shows that it not only outperforms the state-of-the-art but even outperforms the full fine-tuning approach.
arXiv Detail & Related papers (2023-05-15T06:40:56Z)
- $\Delta$-Patching: A Framework for Rapid Adaptation of Pre-trained Convolutional Networks without Base Performance Loss [71.46601663956521]
Models pre-trained on large-scale datasets are often fine-tuned to support newer tasks and datasets that arrive over time.
We propose $\Delta$-Patching for fine-tuning neural network models in an efficient manner, without the need to store model copies.
Our experiments show that $\Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained.
arXiv Detail & Related papers (2023-03-26T16:39:44Z)
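A heavily hedged sketch of the general "store a patch instead of a model copy" idea behind $\Delta$-Patching; the helper names and the change threshold are illustrative, and the paper's actual mechanism may differ.

    # Hedged sketch of storing only parameter deltas rather than model copies.
    import torch

    def make_patch(base_model, tuned_model, eps=1e-8):
        # Keep only the parameter differences introduced by fine-tuning.
        patch = {}
        base_params = dict(base_model.named_parameters())
        for name, p_tuned in tuned_model.named_parameters():
            delta = p_tuned.detach() - base_params[name].detach()
            if delta.abs().max() > eps:      # unchanged tensors are not stored
                patch[name] = delta
        return patch

    def apply_patch(base_model, patch):
        # Reconstruct the task-specific model from the shared base plus patch.
        state = base_model.state_dict()
        for name, delta in patch.items():
            state[name] = state[name] + delta
        base_model.load_state_dict(state)
        return base_model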
- Evaluating Parameter-Efficient Transfer Learning Approaches on SURE Benchmark for Speech Understanding [40.27182770995891]
Fine-tuning is widely used as the default algorithm for transfer learning from pre-trained models.
We introduce the Speech UndeRstanding Evaluation (SURE) benchmark for parameter-efficient learning across various speech-processing tasks.
arXiv Detail & Related papers (2023-03-02T08:57:33Z)
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning [81.3514358542452]
Few-shot in-context learning (ICL) incurs substantial computational, memory, and storage costs because it involves processing all of the training examples every time a prediction is made.
Parameter-efficient fine-tuning offers an alternative paradigm where a small set of parameters is trained to enable a model to perform the new task.
In this paper, we rigorously compare few-shot ICL and parameter-efficient fine-tuning and demonstrate that the latter offers better accuracy as well as dramatically lower computational costs.
arXiv Detail & Related papers (2022-05-11T17:10:41Z)
- Training Neural Networks with Fixed Sparse Masks [19.58969772430058]
Recent work has shown that it is possible to update only a small subset of the model's parameters during training.
We show that it is possible to induce a fixed sparse mask on the model's parameters that selects a subset to update over many iterations.
arXiv Detail & Related papers (2021-11-18T18:06:01Z)
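For illustration, a minimal sketch of training with a fixed sparse update mask: parameters are scored once on a few calibration batches (a squared-gradient score stands in for the importance measure used in the paper), and only the selected subset is ever updated.

    # Minimal sketch: score parameters once, build a fixed sparse mask, and
    # zero out gradients outside the mask at every training step.
    import torch

    def compute_masks(model, loss_fn, calib_batches, keep_fraction=0.005):
        scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        for batch in calib_batches:                    # batch: {"inputs", "targets"} (assumption)
            model.zero_grad()
            loss_fn(model(batch["inputs"]), batch["targets"]).backward()
            for n, p in model.named_parameters():
                if p.grad is not None:
                    scores[n] += p.grad.detach() ** 2
        flat = torch.cat([s.flatten() for s in scores.values()])
        k = max(1, int(keep_fraction * flat.numel()))
        threshold = torch.topk(flat, k).values.min()   # global top-k cutoff
        return {n: s >= threshold for n, s in scores.items()}

    def masked_step(model, masks, optimizer):
        # Only the fixed subset of parameters ever receives updates.
        for n, p in model.named_parameters():
            if p.grad is not None:
                p.grad.mul_(masks[n].to(p.grad.dtype))
        optimizer.step()
        optimizer.zero_grad()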
- The Right Tool for the Job: Matching Model and Instance Complexities [62.95183777679024]
As NLP models become larger, executing a trained model requires significant computational resources, incurring monetary and environmental costs.
We propose a modification to contextual representation fine-tuning which, during inference, allows for an early (and fast) "exit" from neural network calculations for simple instances.
We test our proposed modification on five different datasets in two tasks: three text classification datasets and two natural language inference benchmarks.
arXiv Detail & Related papers (2020-04-16T04:28:08Z)
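A minimal sketch of one common way to realize the early "exit" described above: a confidence-thresholded classifier after each layer at inference time. The per-layer linear exits, the pooling, and the threshold value are assumptions, not the paper's exact design.

    # Hedged sketch of confidence-based early exit during inference.
    import torch
    import torch.nn as nn

    class EarlyExitEncoder(nn.Module):
        def __init__(self, layers, dim, num_classes, threshold=0.9):
            super().__init__()
            self.layers = nn.ModuleList(layers)
            # One lightweight exit classifier per encoder layer.
            self.exits = nn.ModuleList([nn.Linear(dim, num_classes) for _ in layers])
            self.threshold = threshold

        @torch.no_grad()
        def predict(self, x):               # x: [1, T, dim]; batch size 1 for clarity
            probs = None
            for layer, exit_head in zip(self.layers, self.exits):
                x = layer(x)
                probs = torch.softmax(exit_head(x.mean(dim=1)), dim=-1)
                if probs.max() >= self.threshold:   # confident enough: stop early
                    break
            return probs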