Bilingual Dictionary Based Neural Machine Translation without Using
Parallel Sentences
- URL: http://arxiv.org/abs/2007.02671v1
- Date: Mon, 6 Jul 2020 12:05:27 GMT
- Title: Bilingual Dictionary Based Neural Machine Translation without Using
Parallel Sentences
- Authors: Xiangyu Duan, Baijun Ji, Hao Jia, Min Tan, Min Zhang, Boxing Chen,
Weihua Luo and Yue Zhang
- Abstract summary: We propose a new task of machine translation (MT) which uses no parallel sentences but can refer to a ground-truth bilingual dictionary.
Motivated by a monolingual speaker's ability to learn to translate by looking up a bilingual dictionary, we propose this task to see how much potential an MT system can attain.
- Score: 45.99290614777277
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a new task of machine translation (MT)
which uses no parallel sentences but can refer to a ground-truth bilingual
dictionary. Motivated by a monolingual speaker's ability to learn to
translate by looking up a bilingual dictionary, we propose this task to see
how much potential an MT system can attain using the bilingual dictionary and
large-scale monolingual corpora, while remaining independent of parallel
sentences. We
propose anchored training (AT) to tackle the task. AT uses the bilingual
dictionary to establish anchoring points for closing the gap between source
language and target language. Experiments on various language pairs show that
our approaches are significantly better than various baselines, including
dictionary-based word-by-word translation, dictionary-supervised cross-lingual
word embedding transformation, and unsupervised MT. On distant language pairs
that are hard for unsupervised MT to perform well, AT performs remarkably
better, achieving performance comparable to that of supervised SMT trained on more
than 4M parallel sentences.
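As a concrete illustration of how anchoring points might work, here is a minimal sketch in which monolingual sentences are code-switched through the bilingual dictionary so that source and target share anchor tokens during training. The function names, the replacement probability, and the code-switching realization itself are assumptions for illustration, not the paper's exact AT procedure:

```python
import random

def anchor_sentence(tokens, dictionary, p=0.3):
    """Code-switch a monolingual sentence: with probability p, replace each
    token that has a dictionary entry with one of its translations, creating
    shared 'anchor' tokens between the two languages."""
    out = []
    for tok in tokens:
        if tok in dictionary and random.random() < p:
            out.append(random.choice(dictionary[tok]))  # dictionary: word -> list of translations
        else:
            out.append(tok)
    return out

# (anchored, original) pairs built from monolingual text: a model trained to
# map the anchored input back to the original ties the two embedding spaces
# together at the dictionary anchor points.
dictionary = {"house": ["maison"], "cat": ["chat"]}
src = "the cat sat in the house".split()
print(anchor_sentence(src, dictionary, p=1.0), src)
```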
Related papers
- Improving Multi-lingual Alignment Through Soft Contrastive Learning [9.454626745893798]
We propose a novel method to align multi-lingual embeddings based on the similarity of sentences measured by a pre-trained mono-lingual embedding model.
Given translation sentence pairs, we train a multi-lingual model so that the similarity between cross-lingual embeddings follows the sentence similarity measured by the mono-lingual teacher model.
arXiv Detail & Related papers (2024-05-25T09:46:07Z)
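A minimal sketch of one plausible reading of the objective in the entry above, with the teacher's mono-lingual similarities serving as soft targets for the student's cross-lingual similarities; the KL form and the temperature are assumptions, not the paper's confirmed loss:

```python
import torch
import torch.nn.functional as F

def soft_contrastive_loss(src_emb, tgt_emb, teacher_emb, tau=0.05):
    """Train cross-lingual student similarities to follow soft targets
    given by a mono-lingual teacher. All inputs: (batch, dim) tensors
    for a batch of translation pairs."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    tea = F.normalize(teacher_emb, dim=-1)
    student_sim = src @ tgt.T / tau      # student: source-vs-target similarities
    teacher_sim = tea @ tea.T / tau      # teacher: mono-lingual similarities as soft labels
    return F.kl_div(F.log_softmax(student_sim, dim=-1),
                    F.softmax(teacher_sim, dim=-1),
                    reduction="batchmean")
```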
- Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on multilingual models rather than the machine translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine-translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
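For context, a minimal sketch of the translate-test setup the entry above refers to: test inputs are machine-translated into the language the classifier was trained on before inference. The function names are placeholders, not the paper's code:

```python
def translate_test(examples, translate, classify):
    """Translate-test: machine-translate each test input into the language
    the classifier was trained on, then classify the translation."""
    return [classify(translate(text)) for text in examples]

# Usage with any MT system and any monolingual classifier, e.g.:
# preds = translate_test(spanish_test_set, es_to_en_translator, english_classifier)
```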
- VECO 2.0: Cross-lingual Language Model Pre-training with Multi-granularity Contrastive Learning [56.47303426167584]
We propose a cross-lingual pre-trained model VECO2.0 based on contrastive learning with multi-granularity alignments.
Specifically, a sequence-to-sequence alignment is induced to maximize the similarity of parallel pairs and minimize that of non-parallel pairs.
A token-to-token alignment is integrated to bridge the gap between synonymous tokens, mined via a thesaurus dictionary from the otherwise unpaired tokens in a bilingual instance.
arXiv Detail & Related papers (2023-04-17T12:23:41Z)
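A minimal sketch of the sequence-level term in the entry above as a standard in-batch contrastive (InfoNCE) loss, which matches the description; VECO 2.0's exact formulation, and its token-to-token term, are not reproduced here:

```python
import torch
import torch.nn.functional as F

def sequence_alignment_loss(src_emb, tgt_emb, tau=0.05):
    """In-batch contrastive alignment: each source sentence embedding should
    be most similar to its parallel target, and less similar to the other
    (non-parallel) targets in the batch."""
    src = F.normalize(src_emb, dim=-1)
    tgt = F.normalize(tgt_emb, dim=-1)
    logits = src @ tgt.T / tau            # (batch, batch) similarity matrix
    labels = torch.arange(src.size(0))    # positives sit on the diagonal
    return F.cross_entropy(logits, labels)
```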
- DICTDIS: Dictionary Constrained Disambiguation for Improved NMT [50.888881348723295]
We present DictDis, a lexically constrained NMT system that disambiguates between multiple candidate translations derived from dictionaries.
We demonstrate the utility of DictDis via extensive experiments on English-Hindi and English-German sentences across a variety of domains, including regulatory, finance, and engineering.
arXiv Detail & Related papers (2022-10-13T13:04:16Z)
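As a rough, hypothetical illustration of the constraint side of the entry above (not DictDis's actual algorithm): collect every dictionary candidate for matched source words, so the decoder must disambiguate among them in context rather than being hard-forced to a single option.

```python
def collect_candidates(source_tokens, dictionary):
    """Gather all candidate translations for source words that have
    dictionary entries; a disambiguating NMT system keeps every candidate
    in play and lets the decoder choose among them in context."""
    return {tok: dictionary[tok] for tok in source_tokens if tok in dictionary}

# e.g., English-Hindi: "current" maps to several Hindi senses, and the
# decoder must pick the one that fits the sentence context.
print(collect_candidates("the current flows".split(), {"current": ["धारा", "वर्तमान"]}))
```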
- Dict-NMT: Bilingual Dictionary based NMT for Extremely Low Resource Languages [1.8787713898828164]
We present a detailed analysis of the effects of dictionary quality, training dataset size, language family, etc., on translation quality.
Results on multiple low-resource test languages show a clear advantage of our bilingual dictionary-based method over the baselines.
arXiv Detail & Related papers (2022-06-09T12:03:29Z)
- Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning [48.15259834021655]
We present a pragmatic approach towards building a multilingual machine translation model that covers hundreds of languages.
We use a mixture of supervised and self-supervised objectives, depending on the data availability for different language pairs.
We demonstrate that the synergy between these two training paradigms enables the model to produce high-quality translations in the zero-resource setting.
arXiv Detail & Related papers (2022-01-09T23:36:44Z)
- Assessing the Bilingual Knowledge Learned by Neural Machine Translation Models [72.56058378313963]
We bridge the gap by assessing the bilingual knowledge learned by NMT models with phrase tables.
We find that NMT models learn patterns from simple to complex and distill essential bilingual knowledge from the training examples.
arXiv Detail & Related papers (2020-04-28T03:44:34Z)
- Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We ask whether machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z)