Globetrotter: Unsupervised Multilingual Translation from Visual
Alignment
- URL: http://arxiv.org/abs/2012.04631v1
- Date: Tue, 8 Dec 2020 18:50:40 GMT
- Title: Globetrotter: Unsupervised Multilingual Translation from Visual
Alignment
- Authors: Dídac Surís, Dave Epstein, Carl Vondrick
- Abstract summary: We introduce a framework that uses the visual modality to align multiple languages.
We estimate the cross-modal alignment between language and images, and use this estimate to guide the learning of cross-lingual representations.
Our language representations are trained jointly in one model with a single stage.
- Score: 24.44204156935044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-language machine translation without parallel corpora is challenging
because there is no explicit supervision between languages. Existing
unsupervised methods typically rely on topological properties of the language
representations. We introduce a framework that instead uses the visual modality
to align multiple languages, using images as the bridge between them. We
estimate the cross-modal alignment between language and images, and use this
estimate to guide the learning of cross-lingual representations. Our language
representations are trained jointly in one model with a single stage.
Experiments with fifty-two languages show that our method outperforms baselines
on unsupervised word-level and sentence-level translation using retrieval.
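As a loose illustration of this idea (not the authors' published objective: the encoders, the confidence gate, and the regression form below are all assumptions), the sketch uses each sentence's alignment with its own image to decide how much to trust a visual similarity target for sentence pairs:

```python
import torch
import torch.nn.functional as F

def globetrotter_style_loss(txt, img):
    """txt: [N, D] sentence embeddings in many languages;
    img: [N, D] embeddings of each sentence's paired image."""
    txt = F.normalize(txt, dim=-1)
    img = F.normalize(img, dim=-1)
    # Cross-modal alignment: how well each sentence matches its own image.
    conf = (txt * img).sum(-1).clamp(min=0)
    # Trust a sentence pair only if both sentences are well grounded.
    gate = (conf[:, None] * conf[None, :]).detach()
    # The visual similarity of the paired images serves as the target
    # for the cross-lingual similarity of the sentences.
    target = (img @ img.T).detach()
    pred = txt @ txt.T
    return (gate * (pred - target) ** 2).mean()

# Toy usage with random stand-in embeddings:
print(globetrotter_style_loss(torch.randn(8, 256), torch.randn(8, 256)))
```

Sentences in different languages that describe visually similar images are thus pushed toward similar representations, without any parallel text.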
Related papers
- Discovering Representation Sprachbund For Multilingual Pre-Training [139.05668687865688]
We generate language representations from multilingual pre-trained models and conduct linguistic analysis.
We cluster all the target languages into multiple groups and name each group as a representation sprachbund.
Experiments are conducted on cross-lingual benchmarks and significant improvements are achieved compared to strong baselines.
arXiv Detail & Related papers (2021-09-01T09:32:06Z)
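The grouping step in the entry above lends itself to a toy sketch: cluster one embedding vector per language into "representation sprachbunds". The feature source, the number of clusters, and the use of k-means are assumptions here; the paper's own analysis is more involved.

```python
import numpy as np
from sklearn.cluster import KMeans

languages = ["en", "de", "fr", "hi", "ur", "zh", "ja"]
# Stand-in for per-language vectors pooled from a multilingual encoder.
lang_vecs = np.random.randn(len(languages), 768)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(lang_vecs)
for lang, group in zip(languages, kmeans.labels_):
    print(f"{lang} -> sprachbund {group}")
```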
- Bilingual alignment transfers to multilingual alignment for unsupervised parallel text mining [3.4519649635864584]
This work presents methods for learning cross-lingual sentence representations using paired or unpaired bilingual texts.
We hypothesize that the cross-lingual alignment strategy is transferable, and that a model trained to align only two languages can therefore encode more aligned multilingual representations.
arXiv Detail & Related papers (2021-04-15T17:51:22Z)
- UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training [52.852163987208826]
UC2 is the first machine translation-augmented framework for cross-lingual cross-modal representation learning.
We propose two novel pre-training tasks, namely Masked Region-to-Token Modeling (MRTM) and Visual Translation Language Modeling (VTLM).
Our proposed framework achieves new state-of-the-art on diverse non-English benchmarks while maintaining comparable performance to monolingual pre-trained models on English tasks.
arXiv Detail & Related papers (2021-04-01T08:30:53Z)
- Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision [110.66085917826648]
We develop a technique that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images.
"vokenization" is trained on relatively small image captioning datasets and we then apply it to generate vokens for large language corpora.
Trained with these contextually generated vokens, our visually-supervised language models show consistent improvements over self-supervised alternatives on multiple pure-language tasks.
arXiv Detail & Related papers (2020-10-14T02:11:51Z)
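The vokenization idea above can be sketched as nearest-image retrieval per token followed by an auxiliary classification loss. The function names and the plain retrieval-plus-cross-entropy setup are simplifications assumed for illustration:

```python
import torch
import torch.nn.functional as F

def vokenize(token_embs, image_bank):
    """token_embs: [T, D] contextual token embeddings;
    image_bank: [V, D] embeddings of candidate images.
    Returns one image index (a "voken" id) per token."""
    sim = F.normalize(token_embs, dim=-1) @ F.normalize(image_bank, dim=-1).T
    return sim.argmax(dim=-1)  # [T]

def voken_loss(voken_logits, voken_ids):
    """Auxiliary objective: predict each token's voken id."""
    return F.cross_entropy(voken_logits, voken_ids)

tokens = torch.randn(12, 512)    # stand-in contextual embeddings
images = torch.randn(1000, 512)  # stand-in image bank
ids = vokenize(tokens, images)
print(voken_loss(torch.randn(12, 1000), ids))
```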
- InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training [135.12061144759517]
We present an information-theoretic framework that formulates cross-lingual language model pre-training.
We propose a new pre-training task based on contrastive learning.
By leveraging both monolingual and parallel corpora, we jointly train the pretext tasks to improve the cross-lingual transferability of pre-trained models.
arXiv Detail & Related papers (2020-07-15T16:58:01Z)
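The contrastive pretext task above lends itself to a standard InfoNCE-style sketch, where a sentence and its translation form a positive pair and the rest of the batch serves as negatives; the temperature and this exact loss form are assumptions rather than the paper's precise cross-lingual contrast formulation:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(src, tgt, temperature=0.1):
    """src, tgt: [N, D] sentence embeddings where src[i] translates tgt[i]."""
    # Similarity of every source sentence against every target sentence.
    logits = F.normalize(src, dim=-1) @ F.normalize(tgt, dim=-1).T
    # The matching translation sits on the diagonal.
    labels = torch.arange(src.size(0))
    return F.cross_entropy(logits / temperature, labels)

print(contrastive_loss(torch.randn(16, 256), torch.randn(16, 256)))
```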
- On the Language Neutrality of Pre-trained Multilingual Representations [70.93503607755055]
We investigate the language-neutrality of multilingual contextual embeddings directly and with respect to lexical semantics.
Our results show that contextual embeddings are more language-neutral and, in general, more informative than aligned static word-type embeddings.
We show how to reach state-of-the-art accuracy on language identification and match the performance of statistical methods for word alignment of parallel sentences.
arXiv Detail & Related papers (2020-04-09T19:50:32Z)
- Visual Grounding in Video for Unsupervised Word Translation [91.47607488740647]
We use visual grounding to improve unsupervised word mapping between languages.
We learn embeddings from unpaired instructional videos narrated in the native language.
We apply these methods to translate words from English to French, Korean, and Japanese.
arXiv Detail & Related papers (2020-03-11T02:03:37Z)
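The word-level translation in the last entry can be sketched as nearest-neighbor search in a shared, visually grounded word space; the real system learns these vectors from narrated instructional video, while random vectors stand in below:

```python
import torch
import torch.nn.functional as F

def translate(word, src_vocab, src_vecs, tgt_vocab, tgt_vecs):
    """Nearest-neighbor translation in a shared visually grounded space."""
    v = src_vecs[src_vocab.index(word)]
    sims = F.normalize(tgt_vecs, dim=-1) @ F.normalize(v, dim=0)
    return tgt_vocab[sims.argmax().item()]

src_vocab, tgt_vocab = ["dog", "cat"], ["chien", "chat"]
src_vecs, tgt_vecs = torch.randn(2, 64), torch.randn(2, 64)
print(translate("dog", src_vocab, src_vecs, tgt_vocab, tgt_vecs))
```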