Efficient Domain Adaptation for Speech Foundation Models
- URL: http://arxiv.org/abs/2302.01496v1
- Date: Fri, 3 Feb 2023 02:10:35 GMT
- Title: Efficient Domain Adaptation for Speech Foundation Models
- Authors: Bo Li, Dongseong Hwang, Zhouyuan Huo, Junwen Bai, Guru Prakash, Tara
N. Sainath, Khe Chai Sim, Yu Zhang, Wei Han, Trevor Strohman, Francoise
Beaufays
- Abstract summary: We present a pioneering study towards building an efficient solution for FM-based speech recognition systems.
We adopt the recently developed self-supervised BEST-RQ for pretraining and propose joint finetuning with both source and unsupervised target domain data.
On a large-scale YouTube and Voice Search task, our method is shown to be both data and model parameter efficient.
- Score: 42.81357437023811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Foundation models (FMs), which are trained on broad data at scale and are
adaptable to a wide range of downstream tasks, have attracted considerable interest in
the research community. Benefiting from diverse data sources such as different
modalities, languages and application domains, foundation models have demonstrated
strong generalization and knowledge transfer capabilities. In this paper, we present a
pioneering study towards building an efficient solution for FM-based speech recognition
systems. We adopt the recently developed self-supervised BEST-RQ for pretraining and
propose joint finetuning with both source and unsupervised target domain data using
JUST Hydra. The FM encoder adapter and decoder are then finetuned to the target domain
with a small amount of supervised in-domain data. On a large-scale YouTube and Voice
Search task, our method is shown to be both data and model parameter efficient. It
achieves the same quality with only 21.6M supervised in-domain data and 130.8M finetuned
parameters, compared to a 731.1M-parameter model trained from scratch on an additional
300M supervised in-domain data.
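The parameter-efficient recipe described in the abstract (keep the pretrained FM encoder frozen, train only lightweight adapters plus the decoder on a small amount of supervised in-domain data) can be illustrated with a minimal sketch. The code below is an illustrative assumption, not the paper's implementation: the adapter layout, module names, dimensions, and the placeholder loss are hypothetical, and BEST-RQ pretraining and JUST Hydra joint finetuning are not reproduced.

```python
# Minimal PyTorch sketch of adapter-based finetuning of a frozen speech FM encoder.
# All module names and sizes are illustrative assumptions.
import torch
import torch.nn as nn


class ResidualAdapter(nn.Module):
    """Small bottleneck adapter added after a frozen encoder block."""

    def __init__(self, d_model: int, bottleneck: int = 256):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the frozen encoder's representation intact.
        return x + self.up(torch.relu(self.down(self.norm(x))))


class AdaptedEncoder(nn.Module):
    """Wraps pretrained encoder blocks; only the per-block adapters are trainable."""

    def __init__(self, encoder_blocks: nn.ModuleList, d_model: int):
        super().__init__()
        self.blocks = encoder_blocks
        self.adapters = nn.ModuleList([ResidualAdapter(d_model) for _ in encoder_blocks])
        for p in self.blocks.parameters():  # freeze the foundation-model encoder
            p.requires_grad = False

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for block, adapter in zip(self.blocks, self.adapters):
            x = adapter(block(x))
        return x


# Hypothetical usage: finetune only adapters + decoder on supervised in-domain data.
d_model = 512
pretrained_blocks = nn.ModuleList(
    [nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True) for _ in range(4)]
)
encoder = AdaptedEncoder(pretrained_blocks, d_model)
decoder = nn.Linear(d_model, 1024)  # stand-in for an ASR decoder / output head

trainable = [p for p in list(encoder.parameters()) + list(decoder.parameters())
             if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

features = torch.randn(2, 100, d_model)  # dummy acoustic frames (batch, time, dim)
logits = decoder(encoder(features))
loss = logits.pow(2).mean()              # placeholder loss; a real system would use RNN-T/CTC
loss.backward()
optimizer.step()
```

Only the adapter and decoder parameters receive gradient updates here, which is the sense in which such a setup is model-parameter efficient; the reported 130.8M finetuned parameters in the abstract refer to the paper's own configuration, not to this sketch.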
Related papers
- Specialized Foundation Models Struggle to Beat Supervised Baselines [60.23386520331143]
We look at three modalities -- genomics, satellite imaging, and time series -- with multiple recent FMs and compare them to a standard supervised learning workflow.
We find that it is consistently possible to train simple supervised models that match or even outperform the latest foundation models.
arXiv Detail & Related papers (2024-11-05T04:10:59Z)
- Style Adaptation for Domain-adaptive Semantic Segmentation [2.1365683052370046]
Domain discrepancy causes a significant drop in the performance of general network models that are trained on source domain data and then applied to the target domain.
We introduce a straightforward approach to mitigate the domain discrepancy, which necessitates no additional parameter calculations and seamlessly integrates with self-training-based UDA methods.
Our proposed method attains a UDA performance of 76.93 mIoU on the GTA->Cityscapes benchmark, an improvement of +1.03 percentage points over the previous state-of-the-art result.
arXiv Detail & Related papers (2024-04-25T02:51:55Z)
- Pretraining Billion-scale Geospatial Foundational Models on Frontier [0.16492989697868893]
Foundation Models (FMs) are trained with internet-scale unlabeled data via self-supervised learning.
We investigate billion scale FMs and HPC training profiles for geospatial applications by pretraining on publicly available data.
Our larger 3B-parameter model achieves up to a 30% improvement in top-1 scene classification accuracy.
arXiv Detail & Related papers (2024-04-17T19:16:32Z)
- Agile Multi-Source-Free Domain Adaptation [25.06352660046911]
The Bi-level ATtention ENsemble (Bi-ATEN) module learns both intra-domain weights and inter-domain ensemble weights to strike a fine balance between instance specificity and domain consistency.
We achieve comparable or even superior performance on the challenging DomainNet benchmark with less than 3% of parameters trained and 8 times the throughput of the SOTA method.
arXiv Detail & Related papers (2024-03-08T05:17:10Z)
- Unsupervised Domain Adaption for Neural Information Retrieval [18.97486314518283]
We compare synthetic annotation by query generation using Large Language Models or rule-based string manipulation.
We find that Large Language Models outperform rule-based methods in all scenarios by a large margin.
In addition, we explore several sizes of open Large Language Models for generating synthetic data and find that a medium-sized model suffices.
arXiv Detail & Related papers (2023-10-13T18:27:33Z)
- Learning to Augment via Implicit Differentiation for Domain Generalization [107.9666735637355]
Domain generalization (DG) aims to overcome the domain shift problem by leveraging multiple source domains to learn a domain-generalizable model.
In this paper, we propose a novel augmentation-based DG approach, dubbed AugLearn.
AugLearn shows effectiveness on three standard DG benchmarks, PACS, Office-Home and Digits-DG.
arXiv Detail & Related papers (2022-10-25T18:51:51Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves strong performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- Unsupervised Multi-source Domain Adaptation Without Access to Source Data [58.551861130011886]
Unsupervised Domain Adaptation (UDA) aims to learn a predictor model for an unlabeled domain by transferring knowledge from a separate labeled source domain.
We propose a novel and efficient algorithm that automatically combines the source models with suitable weights so that it performs at least as well as the best source model.
arXiv Detail & Related papers (2021-04-05T10:45:12Z)
- Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation [102.67010690592011]
Unsupervised domain adaptation (UDA) aims to leverage the knowledge learned from a labeled source dataset to solve similar tasks in a new unlabeled domain.
Prior UDA methods typically require access to the source data when learning to adapt the model.
This work tackles a practical setting where only a trained source model is available and studies how such a model can be effectively utilized without source data to solve UDA problems.
arXiv Detail & Related papers (2020-02-20T03:13:58Z)