Automatic Data Augmentation for Domain Adapted Fine-Tuning of
Self-Supervised Speech Representations
- URL: http://arxiv.org/abs/2306.00481v1
- Date: Thu, 1 Jun 2023 09:30:49 GMT
- Authors: Salah Zaiem, Titouan Parcollet, Slim Essid
- Abstract summary: Self-Supervised Learning (SSL) has allowed leveraging large amounts of unlabeled speech data to improve the performance of speech recognition models.
Despite this, speech SSL representations may fail when facing an acoustic mismatch between the pretraining and target datasets.
We propose a novel supervised domain adaptation method, designed for cases exhibiting such a mismatch in acoustic domains.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-Supervised Learning (SSL) has allowed leveraging large amounts of
unlabeled speech data to improve the performance of speech recognition models
even with small annotated datasets. Despite this, speech SSL representations
may fail when facing an acoustic mismatch between the pretraining and target
datasets. To address this issue, we propose a novel supervised domain
adaptation method, designed for cases exhibiting such a mismatch in acoustic
domains. It consists of applying properly calibrated data augmentations to a
large clean dataset, bringing it closer to the target domain, and using it as
part of an initial fine-tuning stage. Augmentations are automatically selected
through the minimization of a conditional-dependence estimator, based on the
target dataset. The approach is validated during an oracle experiment with
controlled distortions and on two amateur-collected low-resource domains,
outperforming the baselines in both cases.
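The selection step described in the abstract can be sketched in code. The paper's exact conditional-dependence estimator is not detailed in this summary, so this is a minimal sketch that substitutes a standard HSIC (Hilbert-Schmidt Independence Criterion) estimator between pooled features and a source/target domain label: the augmentation whose output is least dependent on the domain label brings the clean data closest to the target. The `featurize` hook, the candidate-augmentation dictionary, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF Gram matrix for row-wise samples X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def hsic(X, Y, gamma=1.0):
    """Biased HSIC estimator between paired samples X and Y (rows aligned).
    Near zero when X and Y are independent."""
    n = X.shape[0]
    K = rbf_kernel(X, gamma)
    L = rbf_kernel(Y, gamma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def select_augmentation(clean, target, augmentations, featurize):
    """Pick the candidate augmentation whose output features are least
    dependent on the (source=0 vs. target=1) domain label."""
    n = min(len(clean), len(target))
    best_name, best_score = None, np.inf
    for name, aug in augmentations.items():
        feats = np.vstack([featurize(aug(clean[:n])), featurize(target[:n])])
        labels = np.concatenate([np.zeros(n), np.ones(n)])[:, None]
        score = hsic(feats, labels)
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

For example, if the target domain is simply a shifted version of the clean data, a shift augmentation should score lower (less domain-dependent) than the identity augmentation.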
Related papers
- Progressive Multi-Level Alignments for Semi-Supervised Domain Adaptation SAR Target Recognition Using Simulated Data [3.1951121258423334]
We develop an instance-prototype alignment (AIPA) strategy to push the source domain instances close to the corresponding target prototypes.
arXiv Detail & Related papers (2024-11-07T13:53:13Z) - Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models [84.8919069953397]
Self-TAught Recognizer (STAR) is an unsupervised adaptation framework for speech recognition systems.
We show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains.
STAR exhibits high data efficiency, requiring less than one hour of unlabeled data.
arXiv Detail & Related papers (2024-05-23T04:27:11Z) - Self-Supervised Dataset Distillation for Transfer Learning [77.4714995131992]
We propose a novel problem of distilling an unlabeled dataset into a set of small synthetic samples for efficient self-supervised learning (SSL).
We first prove that a gradient of synthetic samples with respect to an SSL objective in naive bilevel optimization is biased due to randomness originating from data augmentations or masking.
We empirically validate the effectiveness of our method on various applications involving transfer learning.
arXiv Detail & Related papers (2023-10-10T10:48:52Z) - Unsupervised Noise adaptation using Data Simulation [21.866522173387715]
We propose a generative adversarial network based method to efficiently learn a converse clean-to-noisy transformation.
Experimental results show that our method effectively mitigates the domain mismatch between training and test sets.
arXiv Detail & Related papers (2023-02-23T12:57:20Z) - MAPS: A Noise-Robust Progressive Learning Approach for Source-Free
Domain Adaptive Keypoint Detection [76.97324120775475]
Cross-domain keypoint detection methods always require accessing the source data during adaptation.
This paper considers source-free domain adaptive keypoint detection, where only the well-trained source model is provided to the target domain.
arXiv Detail & Related papers (2023-02-09T12:06:08Z) - Cross-domain Voice Activity Detection with Self-Supervised
Representations [9.02236667251654]
Voice Activity Detection (VAD) aims at detecting speech segments in an audio signal.
Current state-of-the-art methods focus on training a neural network exploiting features directly contained in the acoustics.
We show that representations based on Self-Supervised Learning (SSL) can adapt well to different domains.
arXiv Detail & Related papers (2022-09-22T14:53:44Z) - Boosting Cross-Domain Speech Recognition with Self-Supervision [35.01508881708751]
Cross-domain performance of automatic speech recognition (ASR) can be severely hampered by the mismatch between training and testing distributions.
Previous work has shown that self-supervised learning (SSL) or pseudo-labeling (PL) is effective in unsupervised domain adaptation (UDA) by exploiting the self-supervision of unlabeled data.
This work presents a systematic UDA framework to fully utilize the unlabeled data with self-supervision in the pre-training and fine-tuning paradigm.
arXiv Detail & Related papers (2022-06-20T14:02:53Z) - Listen, Adapt, Better WER: Source-free Single-utterance Test-time
Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
Single-Utterance Test-time Adaptation (SUTA) is, to the best of our knowledge, the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z) - Instance Level Affinity-Based Transfer for Unsupervised Domain
Adaptation [74.71931918541748]
We propose an instance affinity based criterion for source to target transfer during adaptation, called ILA-DA.
We first propose a reliable and efficient method to extract similar and dissimilar samples across source and target, and utilize a multi-sample contrastive loss to drive the domain alignment process.
We verify the effectiveness of ILA-DA by observing consistent improvements in accuracy over popular domain adaptation approaches on a variety of benchmark datasets.
arXiv Detail & Related papers (2021-04-03T01:33:14Z) - Unsupervised Domain Adaptation for Acoustic Scene Classification Using
Band-Wise Statistics Matching [69.24460241328521]
Machine learning algorithms can be negatively affected by mismatches between training (source) and test (target) data distributions.
We propose an unsupervised domain adaptation method that consists of aligning the first- and second-order sample statistics of each frequency band of target-domain acoustic scenes to the ones of the source-domain training dataset.
We show that the proposed method outperforms the state-of-the-art unsupervised methods found in the literature in terms of both source- and target-domain classification accuracy.
arXiv Detail & Related papers (2020-04-30T23:56:05Z)
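The band-wise statistics matching summarized above reduces to a per-band standardize-and-rescale operation. The following is a minimal sketch of that reading, assuming features of shape (frames, bands) such as log-mel spectrogram frames and precomputed per-band source-domain statistics; the function and argument names are illustrative, not from the paper's code.

```python
import numpy as np

def bandwise_stats_match(target_feats, source_mean, source_std, eps=1e-8):
    """Align each frequency band of target-domain features to the
    source-domain first- and second-order statistics.

    target_feats: array of shape (frames, bands), e.g. log-mel frames.
    source_mean, source_std: per-band statistics (shape (bands,))
        computed once from the source-domain training data.
    """
    # Per-band statistics of the target-domain recording.
    t_mean = target_feats.mean(axis=0)
    t_std = target_feats.std(axis=0)
    # Standardize each band, then rescale to the source statistics.
    normalized = (target_feats - t_mean) / (t_std + eps)
    return normalized * source_std + source_mean
```

After the transform, each band of the target features has (approximately) the source-domain mean and standard deviation, so a classifier trained on source data sees statistically familiar inputs.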
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.