PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech
Representations
- URL: http://arxiv.org/abs/2203.16965v4
- Date: Sat, 13 May 2023 21:18:45 GMT
- Title: PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech
Representations
- Authors: Lodagala V S V Durga Prasad and Sreyan Ghosh and S. Umesh
- Abstract summary: We propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data.
The redundant weights can be identified through various pruning strategies which have been discussed in detail as a part of this work.
Specifically, we investigate the effect of the recently discovered Task-Agnostic and Task-Aware pruning on PADA and propose a new pruning paradigm based on the latter.
Our proposed CD-TAW methodology achieves up to 20.6% relative WER improvement over our baseline when fine-tuned on a 2-hour subset of Switchboard data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While self-supervised speech representation learning (SSL) models serve a
variety of downstream tasks, these models have been observed to overfit to the
domain from which the unlabelled data originates. To alleviate this issue, we
propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant
weights from models pre-trained on large amounts of out-of-domain (OOD) data.
Intuitively, this helps to make space for the target-domain ASR finetuning. The
redundant weights can be identified through various pruning strategies which
have been discussed in detail as a part of this work. Specifically, we
investigate the effect of the recently discovered Task-Agnostic and Task-Aware
pruning on PADA and propose a new pruning paradigm based on the latter, which
we call Cross-Domain Task-Aware Pruning (CD-TAW). CD-TAW obtains the initial
pruning mask from a well fine-tuned OOD model, which makes it starkly different
from the rest of the pruning strategies discussed in the paper. Our proposed
CD-TAW methodology achieves up to 20.6% relative WER improvement over our
baseline when fine-tuned on a 2-hour subset of Switchboard data without
language model (LM) decoding. Furthermore, we conduct a detailed analysis to
highlight the key design choices of our proposed method.
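The abstract's recipe can be condensed as: derive a pruning mask from a checkpoint fine-tuned on OOD data, zero the corresponding weights in the pre-trained model, and then fine-tune on target-domain data. Below is a minimal PyTorch sketch of this idea; the function names, the unstructured magnitude criterion, and the 30% sparsity level are illustrative assumptions rather than the paper's exact implementation.

```python
# Minimal sketch of CD-TAW-style pruning before target-domain fine-tuning.
# Assumptions (not from the paper): unstructured magnitude pruning, a 30%
# sparsity level, and that the pre-trained and OOD fine-tuned models share
# the same architecture.
import torch


def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Binary mask that keeps the largest-magnitude entries of `weight`."""
    k = int(weight.numel() * sparsity)  # number of weights to zero out
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight.abs() > threshold  # True = keep, False = prune


@torch.no_grad()
def apply_cd_taw(pretrained: torch.nn.Module,
                 ood_finetuned: torch.nn.Module,
                 sparsity: float = 0.3) -> torch.nn.Module:
    """Zero out weights of `pretrained` using a mask derived from the
    fine-tuned OOD checkpoint, then return the model for target-domain
    ASR fine-tuning."""
    ood_params = dict(ood_finetuned.named_parameters())
    for name, param in pretrained.named_parameters():
        if name in ood_params and param.dim() >= 2:  # prune weight matrices only
            mask = magnitude_mask(ood_params[name], sparsity)
            param.mul_(mask)  # "make space" for the target domain
    return pretrained
```

The detail this sketch tries to capture is that the mask comes from the well fine-tuned OOD model rather than from the pre-trained weights themselves, which, per the abstract, is what sets CD-TAW apart from the other pruning strategies discussed in the paper.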
Related papers
- A Bayesian Approach to Data Point Selection [24.98069363998565]
Data point selection (DPS) is becoming a critical topic in deep learning.
Existing approaches to DPS are predominantly based on a bi-level optimisation (BLO) formulation.
We propose a novel Bayesian approach to DPS.
arXiv Detail & Related papers (2024-11-06T09:04:13Z)
- MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining [73.81862342673894]
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks.
However, transferring the pretrained models to downstream tasks may encounter task discrepancy, since pretraining is formulated as image classification or object discrimination tasks.
We conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection.
Our models are finetuned on various RS downstream tasks, such as scene classification, horizontal and rotated object detection, semantic segmentation, and change detection.
arXiv Detail & Related papers (2024-03-20T09:17:22Z)
- Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models [46.92994945808424]
Catastrophic forgetting emerges as a critical challenge when fine-tuning multi-modal large language models (MLLMs).
This paper presents a comprehensive analysis of catastrophic forgetting in MLLMs and introduces a post-training adjustment method called Model Tailor.
arXiv Detail & Related papers (2024-02-19T11:02:05Z)
- A New Learning Paradigm for Foundation Model-based Remote Sensing Change Detection [54.01158175996638]
Change detection (CD) is a critical task for observing and analyzing dynamic processes of land cover.
We propose a Bi-Temporal Adapter Network (BAN), which is a universal foundation model-based CD adaptation framework.
arXiv Detail & Related papers (2023-12-02T15:57:17Z)
- Voting from Nearest Tasks: Meta-Vote Pruning of Pre-trained Models for Downstream Tasks [55.431048995662714]
We create a small model for a new task from the pruned models of similar tasks.
We show that a few fine-tuning steps on this model suffice to produce a promising pruned model for the new task.
We develop a simple but effective "Meta-Vote Pruning (MVP)" method that significantly reduces the pruning iterations for a new task; a minimal sketch of the mask-voting idea follows this entry.
arXiv Detail & Related papers (2023-01-27T06:49:47Z)
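As a rough illustration of the voting idea summarized above, the sketch below majority-votes binary pruning masks taken from models pruned on similar tasks to initialize a mask for the new task. The helper name, the mask list, and the 0.5 voting threshold are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch of majority-voting pruning masks from nearest tasks,
# in the spirit of Meta-Vote Pruning (MVP). Names and the 0.5 threshold
# are illustrative assumptions.
import torch


def meta_vote_mask(task_masks: list, threshold: float = 0.5) -> torch.Tensor:
    """Keep a weight if at least `threshold` of the similar tasks kept it."""
    votes = torch.stack([m.float() for m in task_masks]).mean(dim=0)
    return votes >= threshold


# Example: three 4x4 boolean masks from pruned models of similar tasks.
masks = [torch.randint(0, 2, (4, 4)).bool() for _ in range(3)]
init_mask = meta_vote_mask(masks)  # starting mask for the new task,
                                   # to be refined by a few fine-tuning steps
```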
- Domain Adaptive Person Search [20.442648584402917]
We present Domain Adaptive Person Search (DAPS), which aims to generalize the model from a labeled source domain to the unlabeled target domain.
We show that our framework achieves 34.7% mAP and 80.6% top-1 accuracy on the PRW dataset.
arXiv Detail & Related papers (2022-07-25T04:02:39Z)
- Plug-and-Play Few-shot Object Detection with Meta Strategy and Explicit Localization Inference [78.41932738265345]
This paper proposes a plug detector that can accurately detect objects of novel categories without a fine-tuning process.
We introduce two explicit inferences into the localization process to reduce its dependence on annotated data.
It shows a significant lead in efficiency, precision, and recall under varied evaluation protocols.
arXiv Detail & Related papers (2021-10-26T03:09:57Z)
- Enhancing the Generalization for Intent Classification and Out-of-Domain Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU).
Recent works have shown that using extra data and labels can improve the OOD detection performance.
This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z)
- DGSAC: Density Guided Sampling and Consensus [4.808421423598809]
Kernel Residual Density is a key differentiator between inliers and outliers.
We propose two model selection algorithms: one based on an optimal quadratic program and one greedy.
We evaluate our method on a wide variety of tasks like planar segmentation, motion segmentation, vanishing point estimation, plane fitting to 3D point cloud, line, and circle fitting.
arXiv Detail & Related papers (2020-06-03T17:42:53Z)
- Recent Developments Combining Ensemble Smoother and Deep Generative Networks for Facies History Matching [58.720142291102135]
This research project focuses on the use of autoencoder networks to construct a continuous parameterization for facies models.
We benchmark seven different formulations, including VAE, generative adversarial network (GAN), Wasserstein GAN, variational auto-encoding GAN, principal component analysis (PCA) with cycle GAN, PCA with transfer style network and VAE with style loss.
arXiv Detail & Related papers (2020-05-08T21:32:42Z)