Label-Free Multi-Domain Machine Translation with Stage-wise Training
- URL: http://arxiv.org/abs/2305.03949v1
- Date: Sat, 6 May 2023 06:30:29 GMT
- Title: Label-Free Multi-Domain Machine Translation with Stage-wise Training
- Authors: Fan Zhang, Mei Tu, Sangha Kim, Song Liu, Jinyao Yan
- Abstract summary: We propose a label-free multi-domain machine translation model which requires little or no domain-annotated data in training and no domain labels in inference.
Our model is composed of three parts: a backbone model, a domain discriminator responsible for distinguishing data from different domains, and a set of experts that transfer the decoded features from generic to specific.
- Score: 13.144729358707206
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Most multi-domain machine translation models rely on domain-annotated data.
Unfortunately, domain labels are usually unavailable both during training and in
real translation scenarios. In this work, we propose a label-free
multi-domain machine translation model that requires little or no
domain-annotated data in training and no domain labels in inference. Our model
is composed of three parts: a backbone model, a domain discriminator
responsible for distinguishing data from different domains, and a set of
experts that transfer the decoded features from generic to specific. We design
a stage-wise training strategy and train the three parts sequentially. To
leverage the extra domain knowledge and improve the training stability, in the
discriminator training stage, domain differences are modeled explicitly with
clustering and distilled into the discriminator through a multi-classification
task. Meanwhile, Gumbel-Max sampling is adopted as the routing scheme in
the expert training stage to balance each expert between
specialization and generalization. Experimental results on the
German-to-English translation task show that our model significantly improves
BLEU scores on six different domains and even outperforms most of the models
trained with domain-annotated data.
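To make the routing scheme concrete, here is a minimal sketch of Gumbel-Max expert selection; all module names, dimensions, and the adapter form are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of Gumbel-Max expert routing (hypothetical module names and sizes).
# A discriminator head scores the experts; during expert training one expert per
# example is sampled with the Gumbel-Max trick, so routing stays stochastic
# (helping generalization) while each example still goes to a single specialist.
import torch
import torch.nn as nn


class GumbelMaxRouter(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 6):
        super().__init__()
        self.discriminator = nn.Linear(d_model, n_experts)  # domain discriminator head
        # Experts that map generic decoded features to domain-specific ones.
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])

    def forward(self, h: torch.Tensor, sample: bool = True) -> torch.Tensor:
        # h: decoded features, shape (batch, d_model)
        logits = self.discriminator(h)                       # (batch, n_experts)
        if sample:
            # Gumbel-Max: argmax(logits + Gumbel noise) is a sample from softmax(logits).
            u = torch.rand_like(logits)
            gumbel = -torch.log(-torch.log(u + 1e-9) + 1e-9)
            idx = (logits + gumbel).argmax(dim=-1)           # stochastic routing (training)
        else:
            idx = logits.argmax(dim=-1)                      # deterministic routing (inference)
        return torch.stack([self.experts[i](h[b]) for b, i in enumerate(idx.tolist())])


router = GumbelMaxRouter()
features = torch.randn(4, 512)          # a toy batch of decoder states
domain_specific = router(features)      # shape (4, 512)
```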
Related papers
- A Unified Data Augmentation Framework for Low-Resource Multi-Domain Dialogue Generation [52.0964459842176]
Current state-of-the-art dialogue systems heavily rely on extensive training datasets.
We propose a novel data Augmentation framework for Multi-Domain Dialogue Generation, referred to as AMD$^2$G.
The AMD$^2$G framework consists of a data augmentation process and a two-stage training approach: domain-agnostic training and domain adaptation training.
arXiv Detail & Related papers (2024-06-14T09:52:27Z) - Mastering Text, Code and Math Simultaneously via Fusing Highly Specialized Language Models [93.92762966380793]
It remains challenging for large language models (LLMs) to achieve high performance in natural language, coding, and mathematics simultaneously.
In this paper, we propose to fuse models that are already highly-specialized directly.
The proposed fusing framework, UltraFuser, consists of three distinct specialists that are already sufficiently trained on language, coding, and mathematics.
arXiv Detail & Related papers (2024-03-13T06:18:48Z) - MultiMatch: Multi-task Learning for Semi-supervised Domain Generalization [55.06956781674986]
We address the semi-supervised domain generalization (SSDG) task, where only a small amount of labeled data is available in each source domain.
We propose MultiMatch, which extends FixMatch to a multi-task learning framework to produce high-quality pseudo-labels for SSDG.
A series of experiments validates the effectiveness of the proposed method, which outperforms existing semi-supervised and SSDG methods on several benchmark DG datasets.
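For context, FixMatch-style pseudo-labelling keeps only confident predictions on weakly augmented inputs as targets for strongly augmented versions of the same inputs; a minimal sketch (illustrative only, MultiMatch's multi-task extension is not reproduced here):

```python
# Sketch of FixMatch-style pseudo-labelling (illustrative; the multi-task
# extension in MultiMatch is not reproduced here).
import torch
import torch.nn.functional as F


def fixmatch_loss(logits_weak, logits_strong, threshold: float = 0.95):
    # Pseudo-labels come from confident predictions on weakly augmented inputs
    # and supervise predictions on strongly augmented versions of the same inputs.
    probs = F.softmax(logits_weak.detach(), dim=-1)
    confidence, pseudo_labels = probs.max(dim=-1)
    mask = (confidence >= threshold).float()       # drop low-confidence pseudo-labels
    loss = F.cross_entropy(logits_strong, pseudo_labels, reduction="none")
    return (loss * mask).mean()


# Toy usage with random logits for a 10-class problem.
logits_weak, logits_strong = torch.randn(8, 10), torch.randn(8, 10)
print(fixmatch_loss(logits_weak, logits_strong))
```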
arXiv Detail & Related papers (2022-08-11T14:44:33Z) - Domain Generalization via Gradient Surgery [5.38147998080533]
In real-life applications, machine learning models often face scenarios where there is a change in data distribution between training and test domains.
In this work, we characterize the conflicting gradients emerging in domain shift scenarios and devise novel gradient agreement strategies.
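Gradient-agreement strategies of this kind typically remove the component of one domain's gradient that conflicts with another's; a minimal PCGrad-style sketch, which may differ from the paper's exact strategy:

```python
# PCGrad-style gradient surgery sketch: if two per-domain gradients conflict
# (negative dot product), project out the conflicting component before updating.
# Illustrative only; the paper's exact agreement strategy may differ.
import torch


def project_conflicting(g_i: torch.Tensor, g_j: torch.Tensor) -> torch.Tensor:
    """Remove from g_i its component along g_j when the two gradients conflict."""
    dot = torch.dot(g_i, g_j)
    if dot < 0:
        g_i = g_i - (dot / (g_j.norm() ** 2 + 1e-12)) * g_j
    return g_i


# Toy usage: flattened gradients of the shared parameters from two domains.
g_a = torch.tensor([1.0, -2.0, 0.5])
g_b = torch.tensor([-0.5, 1.0, 1.0])
update = project_conflicting(g_a, g_b) + project_conflicting(g_b, g_a)
```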
arXiv Detail & Related papers (2021-08-03T16:49:25Z) - Robust wav2vec 2.0: Analyzing Domain Shift in Self-Supervised Pre-Training [67.71228426496013]
We show that using target domain data during pre-training leads to large performance improvements across a variety of setups.
We find that pre-training on multiple domains improves performance generalization on domains not seen during training.
arXiv Detail & Related papers (2021-04-02T12:53:15Z) - Curriculum CycleGAN for Textual Sentiment Domain Adaptation with Multiple Sources [68.31273535702256]
We propose a novel instance-level multi-source domain adaptation (MDA) framework, named curriculum cycle-consistent generative adversarial network (C-CycleGAN).
C-CycleGAN consists of three components: (1) a pre-trained text encoder, which encodes textual input from different domains into a continuous representation space; (2) an intermediate domain generator with curriculum instance-level adaptation, which bridges the gap between the source and target domains; and (3) a task classifier trained on the intermediate domain for final sentiment classification.
We conduct extensive experiments on three benchmark datasets and achieve substantial gains over state-of-the-art DA approaches.
arXiv Detail & Related papers (2020-11-17T14:50:55Z) - Target Conditioning for One-to-Many Generation [30.402378832810697]
We propose to explicitly model this one-to-many mapping by conditioning the decoder of an NMT model on a latent variable that represents the domain of target sentences.
At inference, we can generate diverse translations by decoding with different domains.
We assess the quality and diversity of translations generated by our model with several metrics, on three different datasets.
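As an illustration of domain conditioning, a minimal sketch where a learned domain embedding is added to the decoder inputs; this is one common way to condition a decoder and not necessarily the paper's exact parameterization.

```python
# Sketch: condition a decoder on a latent "domain" variable by adding a learned
# domain embedding to the decoder inputs (names and shapes are illustrative).
import torch
import torch.nn as nn

n_domains, d_model, vocab = 4, 512, 32000

token_embed = nn.Embedding(vocab, d_model)
domain_embed = nn.Embedding(n_domains, d_model)   # one vector per latent domain

tokens = torch.randint(0, vocab, (2, 7))          # (batch, target length)
domain_ids = torch.tensor([0, 3])                 # chosen domain per sentence

# Decoder input = token embeddings + broadcast domain embedding; decoding the
# same source with different domain_ids yields diverse translations.
decoder_inputs = token_embed(tokens) + domain_embed(domain_ids).unsqueeze(1)
```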
arXiv Detail & Related papers (2020-09-21T11:01:14Z) - Unsupervised Domain Clusters in Pretrained Language Models [61.832234606157286]
We show that massive pre-trained language models implicitly learn sentence representations that cluster by domains without supervision.
We propose domain data selection methods based on such models.
We evaluate our data selection methods for neural machine translation across five diverse domains.
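As a sketch of this idea, sentence embeddings from a pretrained encoder can be clustered with a Gaussian mixture and the cluster assignments used for domain data selection; the encoder checkpoint and clustering settings below are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: embed sentences with a pretrained encoder, cluster the embeddings,
# and use the unsupervised clusters as proxy domains for data selection.
# The encoder checkpoint and GMM settings are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.mixture import GaussianMixture

sentences = [
    "The patient was administered 50 mg of the drug.",         # medical
    "Dosage should be reduced in cases of renal impairment.",  # medical
    "The court dismissed the appeal on procedural grounds.",   # legal
    "The contract is void if either party breaches clause 4.", # legal
    "Install the driver before connecting the USB device.",    # IT
    "Restart the server after updating the configuration.",    # IT
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")    # any pretrained sentence encoder
embeddings = encoder.encode(sentences)               # (n_sentences, dim)

gmm = GaussianMixture(n_components=3, covariance_type="diag", random_state=0)
clusters = gmm.fit_predict(embeddings)               # unsupervised "domain" assignments
for sentence, cluster in zip(sentences, clusters):
    print(cluster, sentence)
```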
arXiv Detail & Related papers (2020-04-05T06:22:16Z) - Dual Adversarial Domain Adaptation [6.69797982848003]
Unsupervised domain adaptation aims at transferring knowledge from the labeled source domain to the unlabeled target domain.
Recent experiments have shown that when the discriminator is provided with domain information in both domains, it is able to preserve the complex multimodal information.
We adopt a discriminator with $2K$-dimensional output to perform both domain-level and class-level alignments simultaneously in a single discriminator.
arXiv Detail & Related papers (2020-01-01T07:10:09Z)
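A minimal sketch of such a 2K-dimensional discriminator head, with one output slot per (domain, class) pair; names, dimensions, and the loss are illustrative assumptions rather than the paper's exact formulation.

```python
# Sketch of a single discriminator with a 2K-dimensional output: slots [0, K)
# correspond to the K classes in the source domain and slots [K, 2K) to the
# K classes in the target domain, so one head gives domain-level and class-level
# alignment signals at once. Names, sizes, and the loss are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 5            # number of classes
feat_dim = 256   # feature dimension of the shared encoder

discriminator = nn.Sequential(
    nn.Linear(feat_dim, 128),
    nn.ReLU(),
    nn.Linear(128, 2 * K),
)

features = torch.randn(8, feat_dim)            # encoder features for a mini-batch
labels = torch.randint(0, K, (8,))             # source labels (pseudo-labels for target data)
is_source = torch.randint(0, 2, (8,)).bool()   # domain indicator per example

# An example's correct slot is its class index, shifted by K for target-domain data.
targets = labels + (~is_source).long() * K
loss = F.cross_entropy(discriminator(features), targets)
print(loss.item())
```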