Emergent Communication Pretraining for Few-Shot Machine Translation
- URL: http://arxiv.org/abs/2011.00890v1
- Date: Mon, 2 Nov 2020 10:57:53 GMT
- Title: Emergent Communication Pretraining for Few-Shot Machine Translation
- Authors: Yaoyiran Li, Edoardo M. Ponti, Ivan Vulić and Anna Korhonen
- Abstract summary: We pretrain neural networks via emergent communication from referential games.
Our key assumption is that grounding communication on images---as a crude approximation of real-world environments---inductively biases the model towards learning natural languages.
- Score: 66.48990742411033
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: While state-of-the-art models that rely upon massively multilingual
pretrained encoders achieve sample efficiency in downstream applications, they
still require abundant amounts of unlabelled text. Nevertheless, most of the
world's languages lack such resources. Hence, we investigate a more radical
form of unsupervised knowledge transfer in the absence of linguistic data. In
particular, for the first time we pretrain neural networks via emergent
communication from referential games. Our key assumption is that grounding
communication on images---as a crude approximation of real-world
environments---inductively biases the model towards learning natural languages.
On the one hand, we show that this substantially benefits machine translation
in few-shot settings. On the other hand, this also provides an extrinsic
evaluation protocol to probe the properties of emergent languages ex vitro.
Intuitively, the closer they are to natural languages, the higher the gains
from pretraining on them should be. For instance, in this work we measure the
influence of communication success and maximum sequence length on downstream
performance. Finally, we introduce a customised adapter layer and annealing
strategies for the regulariser of maximum-a-posteriori inference during
fine-tuning. These turn out to be crucial to facilitate knowledge transfer and
prevent catastrophic forgetting. Compared to a recurrent baseline, our method
yields gains of $59.0\%\sim147.6\%$ in BLEU score with only $500$ NMT
training instances and $65.1\%\sim196.7\%$ with $1,000$ NMT training
instances across four language pairs. These proof-of-concept results reveal the
potential of emergent communication pretraining for both natural language
processing tasks in resource-poor settings and extrinsic evaluation of
artificial languages.
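
As a rough illustration of the pretraining stage described above, the sketch below implements a referential game in PyTorch: a Sender encodes a target image into a discrete message and a Receiver must identify the target among distractor images, with communication success as the training signal. The Gumbel-softmax relaxation, the GRU modules, and the vocabulary size and message length are illustrative assumptions, not the paper's exact configuration; image features are assumed to be precomputed vectors from a frozen visual encoder.

```python
# Minimal sketch of emergent-communication pretraining via a referential game.
# All names, dimensions, and the Gumbel-softmax relaxation are illustrative
# assumptions, not the exact setup of the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB, MAX_LEN, HID, IMG_DIM = 256, 10, 128, 512

class Sender(nn.Module):
    """Sees the target image and emits a discrete message."""
    def __init__(self):
        super().__init__()
        self.img2h = nn.Linear(IMG_DIM, HID)
        self.rnn = nn.GRUCell(VOCAB, HID)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, img, tau=1.0):
        h = torch.tanh(self.img2h(img))
        tok = torch.zeros(img.size(0), VOCAB, device=img.device)
        msg = []
        for _ in range(MAX_LEN):
            h = self.rnn(tok, h)
            logits = self.out(h)
            # Differentiable discrete symbol (straight-through Gumbel-softmax)
            tok = F.gumbel_softmax(logits, tau=tau, hard=True)
            msg.append(tok)
        return torch.stack(msg, dim=1)           # (B, MAX_LEN, VOCAB)

class Receiver(nn.Module):
    """Reads the message and scores candidate images (target + distractors)."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(VOCAB, HID, batch_first=True)
        self.img2h = nn.Linear(IMG_DIM, HID)

    def forward(self, msg, candidates):
        _, h = self.rnn(msg)                      # (1, B, HID)
        keys = self.img2h(candidates)             # (B, K, HID)
        return torch.einsum("bkh,bh->bk", keys, h.squeeze(0))  # similarity scores

sender, receiver = Sender(), Receiver()
opt = torch.optim.Adam(list(sender.parameters()) + list(receiver.parameters()), lr=1e-3)

def game_step(target_imgs, candidate_imgs, target_idx):
    """One referential-game update: communication success via a cross-entropy game loss."""
    scores = receiver(sender(target_imgs), candidate_imgs)
    loss = F.cross_entropy(scores, target_idx)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy batch: 32 games, 5 candidate images each, the target at a random index.
B, K = 32, 5
cands = torch.randn(B, K, IMG_DIM)
idx = torch.randint(0, K, (B,))
tgt = cands[torch.arange(B), idx]
print(game_step(tgt, cands, idx))
```

After such pretraining, the game agents' weights would be reused to initialise the NMT encoder and decoder; the paper's customised adapter layer and the annealed maximum-a-posteriori regulariser (e.g. a penalty pulling fine-tuned weights towards their pretrained values, with a coefficient decayed over training) are omitted from this sketch.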
Related papers
- DEPT: Decoupled Embeddings for Pre-training Language Models [16.84502158672086]
DEPT enables training without a shared global vocabulary.
We demonstrate DEPT's potential via the first vocabulary-agnostic federated multilingual pre-training of a 1.3 billion-parameter model.
arXiv Detail & Related papers (2024-10-07T13:24:24Z) - Cross-Lingual Transfer Learning for Phrase Break Prediction with Multilingual Language Model [13.730152819942445]
Cross-lingual transfer learning can be particularly effective for improving performance in low-resource languages.
This suggests that cross-lingual transfer can be inexpensive and effective for developing a TTS front-end in resource-poor languages.
arXiv Detail & Related papers (2023-06-05T04:10:04Z) - Linking Emergent and Natural Languages via Corpus Transfer [98.98724497178247]
We propose a novel way to establish a link by corpus transfer between emergent languages and natural languages.
Our approach showcases non-trivial transfer benefits for two different tasks -- language modeling and image captioning.
We also introduce a novel metric to predict the transferability of an emergent language by translating emergent messages to natural language captions grounded on the same images.
arXiv Detail & Related papers (2022-03-24T21:24:54Z) - Towards Zero-shot Language Modeling [90.80124496312274]
We construct a neural model that is inductively biased towards learning human languages.
We infer this distribution from a sample of typologically diverse training languages.
We harness additional language-specific side information as distant supervision for held-out languages.
arXiv Detail & Related papers (2021-08-06T23:49:18Z) - Exploring Unsupervised Pretraining Objectives for Machine Translation [99.5441395624651]
Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT).
Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder.
We compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context.
arXiv Detail & Related papers (2021-06-10T10:18:23Z) - Analysing The Impact Of Linguistic Features On Cross-Lingual Transfer [3.299672391663527]
We analyze a state-of-the-art multilingual model and try to determine what impacts good transfer between languages.
We show that looking at particular syntactic features is 2-4 times more helpful in predicting performance than an aggregated syntactic similarity.
arXiv Detail & Related papers (2021-05-12T21:22:58Z) - Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information [72.2412707779571]
mRASP is an approach to pre-train a universal multilingual neural machine translation model.
We carry out experiments on 42 translation directions across a diverse setting, including low-, medium-, and rich-resource languages, as well as transfer to exotic language pairs.
arXiv Detail & Related papers (2020-10-07T03:57:54Z) - Reusing a Pretrained Language Model on Languages with Limited Corpora for Unsupervised NMT [129.99918589405675]
We present an effective approach that reuses an LM that is pretrained only on the high-resource language.
The monolingual LM is fine-tuned on both languages and is then used to initialize a UNMT model.
Our approach, RE-LM, outperforms a competitive cross-lingual pretraining model (XLM) in English-Macedonian (En-Mk) and English-Albanian (En-Sq).
arXiv Detail & Related papers (2020-09-16T11:37:10Z)