Corpus Synthesis for Zero-shot ASR domain Adaptation using Large
Language Models
- URL: http://arxiv.org/abs/2309.10707v1
- Date: Mon, 18 Sep 2023 15:43:08 GMT
- Authors: Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli,
Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel
- Abstract summary: We propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains.
Experiments on the SLURP dataset show that the proposed method achieves an average relative word error rate improvement of 28% on unseen target domains.
- Score: 19.726699481313194
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While Automatic Speech Recognition (ASR) systems are widely used in many
real-world applications, they often do not generalize well to new domains and
need to be finetuned on data from these domains. However, target-domain data
usually are not readily available in many scenarios. In this paper, we propose
a new strategy for adapting ASR models to new target domains without any text
or speech from those domains. To accomplish this, we propose a novel data
synthesis pipeline that uses a Large Language Model (LLM) to generate a target
domain text corpus, and a state-of-the-art controllable speech synthesis model
to generate the corresponding speech. We propose a simple yet effective
in-context instruction finetuning strategy to increase the effectiveness of LLM
in generating text corpora for new domains. Experiments on the SLURP dataset
show that the proposed method achieves an average relative word error rate
improvement of $28\%$ on unseen target domains without any performance drop in
source domains.
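To make the reported metric concrete, the following is a minimal sketch of how word error rate (WER) and a relative WER improvement are commonly computed. It is not the authors' evaluation code: the function names and the example values are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_improvement(baseline_wer: float, adapted_wer: float) -> float:
    """Fraction by which the adapted model reduces the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - adapted_wer) / baseline_wer


# Hypothetical transcripts: two substitutions against five reference words.
wer = word_error_rate("turn on the kitchen lights", "turn on the chicken light")

# Hypothetical WER values chosen only to illustrate a 28% relative improvement.
improvement = relative_wer_improvement(baseline_wer=0.25, adapted_wer=0.18)
```

Under this definition, a relative improvement of 28% means the adapted model's WER is 72% of the baseline WER, regardless of the absolute WER values involved.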
Related papers
- Improving Few-Shot Cross-Domain Named Entity Recognition by Instruction Tuning a Word-Embedding based Retrieval Augmented Large Language Model [0.0]
Few-Shot Cross-Domain NER is the process of leveraging knowledge from data-rich source domains to perform entity recognition on data-scarce target domains.
We propose IF-WRANER, a retrieval augmented large language model for Named Entity Recognition.
arXiv Detail & Related papers (2024-11-01T08:57:29Z)
- Schema Augmentation for Zero-Shot Domain Adaptation in Dialogue State Tracking [16.67185296899117]
Current large language model approaches for zero-shot domain adaptation rely on prompting to introduce knowledge pertaining to the target domains.
In this work, we devise a novel data augmentation approach, Schema Augmentation, that improves the zero-shot domain adaptation of language models through fine-tuning.
Experiments on MultiWOZ and SpokenWOZ showed that the proposed approach resulted in a substantial improvement over the baseline.
arXiv Detail & Related papers (2024-10-31T18:57:59Z)
- Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation [66.72195610471624]
Cross-Domain Sequential Recommendation aims to mine and transfer users' sequential preferences across different domains.
We propose a novel framework named URLLM, which aims to improve the CDSR performance by exploring the User Retrieval approach.
arXiv Detail & Related papers (2024-06-05T09:19:54Z)
- UniGen: Universal Domain Generalization for Sentiment Classification via Zero-shot Dataset Generation [6.3823202275924125]
We propose a novel approach to universal domain generalization that generates a dataset regardless of the target domain.
Our experiments indicate that the proposed method accomplishes generalizability across various domains while using a parameter set that is orders of magnitude smaller than PLMs.
arXiv Detail & Related papers (2024-05-02T05:46:13Z)
- Unified Language-driven Zero-shot Domain Adaptation [55.64088594551629]
Unified Language-driven Zero-shot Domain Adaptation (ULDA) is a novel task setting.
It enables a single model to adapt to diverse target domains without explicit domain-ID knowledge.
arXiv Detail & Related papers (2024-04-10T16:44:11Z)
- Phrase Grounding-based Style Transfer for Single-Domain Generalized Object Detection [109.58348694132091]
Single-domain generalized object detection aims to enhance a model's generalizability to multiple unseen target domains.
This is a practical yet challenging task as it requires the model to address domain shift without incorporating target domain data into training.
We propose a novel phrase grounding-based style transfer approach for the task.
arXiv Detail & Related papers (2024-02-02T10:48:43Z)
- A Simple Baseline for Domain Adaptation in End to End ASR Systems Using Synthetic Data [1.14219428942199]
We propose a simple baseline technique for domain adaptation in end-to-end speech recognition models.
We convert the text-only corpus to audio data using a single-speaker Text-to-Speech (TTS) engine.
We show that single-speaker synthetic TTS data, coupled with fine-tuning only the final dense layer, provides reasonable improvements in word error rates.
arXiv Detail & Related papers (2022-06-22T12:07:38Z)
- Domain-Agnostic Prior for Transfer Semantic Segmentation [197.9378107222422]
Unsupervised domain adaptation (UDA) is an important topic in the computer vision community.
We present a mechanism that regularizes cross-domain representation learning with a domain-agnostic prior (DAP).
Our research reveals that UDA benefits much from better proxies, possibly from other data modalities.
arXiv Detail & Related papers (2022-04-06T09:13:25Z)
- Meta-Learning for Domain Generalization in Semantic Parsing [124.32975734073949]
We use a meta-learning framework that targets zero-shot domain generalization for semantic parsing.
We apply a model-agnostic training algorithm that simulates zero-shot parsing by constructing virtual train and test sets from disjoint domains.
arXiv Detail & Related papers (2020-10-22T19:00:36Z)
- Iterative Domain-Repaired Back-Translation [50.32925322697343]
In this paper, we focus on domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.
We propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair model to refine translations in synthetic bilingual data.
Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach.
arXiv Detail & Related papers (2020-10-06T04:38:09Z)