How to Estimate Model Transferability of Pre-Trained Speech Models?
- URL: http://arxiv.org/abs/2306.01015v3
- Date: Tue, 6 Feb 2024 03:52:48 GMT
- Title: How to Estimate Model Transferability of Pre-Trained Speech Models?
- Authors: Zih-Ching Chen, Chao-Han Huck Yang, Bo Li, Yu Zhang, Nanxin Chen,
Shuo-Yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath
- Abstract summary: "Score-based assessment" framework for estimating transferability of pre-trained speech models.
We leverage two representation theories, Bayesian likelihood estimation and optimal transport, to generate rank scores for the PSM candidates.
Our framework efficiently computes transferability scores without actual fine-tuning of candidate models or layers.
- Score: 84.11085139766108
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work, we introduce a "score-based assessment" framework for
estimating the transferability of pre-trained speech models (PSMs) for
fine-tuning target tasks. We leverage two representation theories,
Bayesian likelihood estimation and optimal transport, to generate rank scores
for the PSM candidates using the extracted representations. Our framework
efficiently computes transferability scores without actual fine-tuning of
candidate models or layers by making a temporal-independence assumption. We
evaluate some popular supervised speech models (e.g., Conformer RNN-Transducer)
and self-supervised speech models (e.g., HuBERT) in cross-layer and cross-model
settings using public data. Experimental results show a high Spearman's rank
correlation and low $p$-value between our estimation framework and fine-tuning
ground truth. Our proposed transferability framework requires less
computational time and resources, making it a resource-saving and
time-efficient approach for tuning speech foundation models.
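The abstract does not come with code, so the following is a minimal, illustrative sketch of how such a score-based assessment could be wired up, assuming frame-level features have already been extracted from each candidate PSM layer. The function names, the mean-pooling of frames under the temporal-independence assumption, the LogME-style evidence used as the Bayesian likelihood score, and the per-dimension 1-D Wasserstein proxy for optimal transport are illustrative assumptions rather than the authors' exact formulation.

```python
"""Hedged sketch of score-based transferability estimation for PSM candidates."""
import numpy as np
from scipy.stats import spearmanr, wasserstein_distance


def pool_frames(frame_feats):
    """Temporal-independence assumption: treat frames as exchangeable and
    summarise each utterance by its mean frame embedding."""
    return np.asarray(frame_feats).mean(axis=0)


def bayesian_likelihood_score(X, y, n_classes):
    """LogME-style proxy: evidence of one-hot labels under a ridge-regularised
    linear-Gaussian model on the pooled features (an assumption, not the
    paper's exact estimator). Higher means the features explain the labels better."""
    n, d = X.shape
    Y = np.eye(n_classes)[y]                       # (n, K) one-hot targets
    W = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ Y)
    sigma2 = ((Y - X @ W) ** 2).mean()
    return -0.5 * n * np.log(sigma2 + 1e-12)


def optimal_transport_score(X, y):
    """OT proxy: mean per-dimension 1-D Wasserstein distance between
    class-conditional feature distributions (a sliced approximation).
    Higher means the classes are better separated in feature space."""
    classes = np.unique(y)
    dists = []
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            Xa, Xb = X[y == a], X[y == b]
            dists.append(np.mean([wasserstein_distance(Xa[:, j], Xb[:, j])
                                  for j in range(X.shape[1])]))
    return float(np.mean(dists))


def transferability_score(utterance_feats, labels, n_classes):
    """Rank score for one PSM candidate/layer; no fine-tuning involved."""
    X = np.stack([pool_frames(f) for f in utterance_feats])
    y = np.asarray(labels)
    return bayesian_likelihood_score(X, y, n_classes) + optimal_transport_score(X, y)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 4, size=32)
    scores, ground_truth = [], []
    # Toy candidates: larger `sep` mimics a layer whose features separate the
    # target classes better, standing in for higher fine-tuned accuracy.
    for sep in (0.2, 0.6, 1.0, 1.4):
        feats = [rng.normal(loc=sep * y, size=(50, 16)) for y in labels]
        scores.append(transferability_score(feats, labels, n_classes=4))
        ground_truth.append(sep)
    rho, p = spearmanr(scores, ground_truth)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```

The final Spearman check mirrors, on toy data, the paper's evaluation protocol of correlating estimated rank scores with fine-tuning ground truth.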
Related papers
- A Large-Scale Evaluation of Speech Foundation Models [110.95827399522204]
We establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the foundation model paradigm for speech.
We propose a unified multi-tasking framework to address speech processing tasks in SUPERB using a frozen foundation model followed by task-specialized, lightweight prediction heads; a minimal sketch of this frozen-backbone recipe appears after this list.
arXiv Detail & Related papers (2024-04-15T00:03:16Z)
- Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance [27.91782770050068]
Large-scale contrastive vision-language pre-trained models provide zero-shot models that achieve competitive performance across a range of image classification tasks without requiring training on downstream data.
Recent works have confirmed that additional fine-tuning of the zero-shot model on the reference data results in enhanced downstream performance, but compromises the model's robustness against distribution shifts.
We propose a novel robust fine-tuning algorithm, Lipsum-FT, that effectively utilizes the language modeling aspect of the vision-language pre-trained models.
arXiv Detail & Related papers (2024-04-01T02:01:33Z)
- Building a Winning Team: Selecting Source Model Ensembles using a Submodular Transferability Estimation Approach [20.86345962679122]
Estimating the transferability of publicly available pretrained models to a target task has become an important problem in transfer learning.
We propose a novel Optimal tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the transferability of an ensemble of models to a downstream task.
arXiv Detail & Related papers (2023-09-05T17:57:31Z)
- SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data [78.21197488065177]
The recent success of fine-tuning large models, pretrained on broad data at scale, on downstream tasks has led to a significant paradigm shift in deep learning.
This paper proposes a new task-agnostic framework, SynBench, to measure the quality of pretrained representations using synthetic data.
arXiv Detail & Related papers (2022-10-06T15:25:00Z)
- LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech [67.88748572167309]
We present LDNet, a unified framework for mean opinion score (MOS) prediction.
We propose two inference methods that provide more stable results and efficient computation.
arXiv Detail & Related papers (2021-10-18T08:52:31Z)
- Predictive and Prescriptive Performance of Bike-Sharing Demand Forecasts for Inventory Management [8.441020454345932]
We introduce a variational Poisson recurrent neural network model (VP-RNN) to forecast future pickup and return rates.
We empirically evaluate our approach against both traditional and learning-based forecasting methods on real trip travel data from the city of New York, USA.
arXiv Detail & Related papers (2021-07-28T14:11:34Z)
- Model-Based Counterfactual Synthesizer for Interpretation [40.01787107375103]
We propose a Model-based Counterfactual Synthesizer (MCS) framework for interpreting machine learning models.
We first analyze the model-based counterfactual process and construct a base synthesizer using a conditional generative adversarial net (CGAN).
To better approximate the counterfactual universe for rare queries, we employ the umbrella sampling technique to train the MCS framework.
arXiv Detail & Related papers (2021-06-16T17:09:57Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)
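As referenced in the SUPERB entry above, here is a minimal sketch of the frozen-foundation-model-plus-lightweight-head probing recipe it describes, written in PyTorch as an assumption. The toy GRU backbone, the mean-pooling classification head, and the optimiser settings are illustrative stand-ins for a real pre-trained speech model such as HuBERT and the task-specific heads used in SUPERB.

```python
"""Hedged sketch: probing a frozen speech foundation model with a lightweight head."""
import torch
import torch.nn as nn


class LightweightHead(nn.Module):
    """Small trainable prediction head on top of frozen frame-level features."""

    def __init__(self, feat_dim: int, n_classes: int):
        super().__init__()
        self.proj = nn.Linear(feat_dim, n_classes)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # Mean-pool frames, then classify the utterance.
        return self.proj(frame_feats.mean(dim=1))


def build_probe(backbone: nn.Module, feat_dim: int, n_classes: int):
    """Freeze the foundation model; only the head's parameters are optimised."""
    for p in backbone.parameters():
        p.requires_grad = False
    head = LightweightHead(feat_dim, n_classes)
    optimiser = torch.optim.Adam(head.parameters(), lr=1e-3)
    return head, optimiser


if __name__ == "__main__":
    # Toy stand-in backbone; in practice this would be a pre-trained PSM such as HuBERT.
    backbone = nn.GRU(input_size=40, hidden_size=64, batch_first=True)
    head, opt = build_probe(backbone, feat_dim=64, n_classes=10)

    wav_feats = torch.randn(8, 200, 40)            # (batch, frames, mel bins)
    with torch.no_grad():
        frame_feats, _ = backbone(wav_feats)       # frozen feature extraction
    loss = nn.functional.cross_entropy(head(frame_feats),
                                       torch.randint(0, 10, (8,)))
    loss.backward()                                # gradients reach the head only
    opt.step()
    print(f"probe loss: {loss.item():.3f}")
```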