CUNI Systems for the Unsupervised and Very Low Resource Translation Task
in WMT20
- URL: http://arxiv.org/abs/2010.11747v1
- Date: Thu, 22 Oct 2020 14:04:01 GMT
- Title: CUNI Systems for the Unsupervised and Very Low Resource Translation Task
in WMT20
- Authors: Ivana Kvapilíková, Tom Kocmi, Ondřej Bojar
- Abstract summary: This paper presents a description of CUNI systems submitted to the WMT20 task on unsupervised and very low-resource supervised machine translation between German and Upper Sorbian.
In the fully unsupervised scenario, we achieved 25.5 and 23.7 BLEU translating from and into Upper Sorbian, respectively.
Our low-resource systems relied on transfer learning from German-Czech parallel data and achieved 57.4 BLEU and 56.1 BLEU.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a description of CUNI systems submitted to the WMT20 task
on unsupervised and very low-resource supervised machine translation between
German and Upper Sorbian. We experimented with training on synthetic data and
pre-training on a related language pair. In the fully unsupervised scenario, we
achieved 25.5 and 23.7 BLEU translating from and into Upper Sorbian,
respectively. Our low-resource systems relied on transfer learning from
German-Czech parallel data and achieved 57.4 BLEU and 56.1 BLEU, which is an
improvement of 10 BLEU points over the baseline trained only on the available
small German-Upper Sorbian parallel corpus.
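
To make the transfer-learning setup concrete, the sketch below warm-starts from a German-Czech parent model and fine-tunes it on a few German-Upper Sorbian sentence pairs. This is a minimal illustration rather than the authors' actual pipeline: the Helsinki-NLP/opus-mt-de-cs checkpoint, the Hugging Face transformers API, the toy sentence pairs, and the reuse of the Czech subword vocabulary for Upper Sorbian are all stand-in assumptions of the sketch.

```python
# Hypothetical transfer-learning sketch: warm-start from a German-Czech model,
# then fine-tune on a tiny German-Upper Sorbian parallel corpus.
import torch
from transformers import MarianMTModel, MarianTokenizer

PARENT = "Helsinki-NLP/opus-mt-de-cs"  # stand-in for a German-Czech parent model
tok = MarianTokenizer.from_pretrained(PARENT)
model = MarianMTModel.from_pretrained(PARENT)

# Toy German-Upper Sorbian pairs (illustrative only, not a real corpus).
pairs = [
    ("Guten Morgen!", "Dobre ranje!"),
    ("Wie geht es dir?", "Kak so ti dźe?"),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):  # a real system trains far longer on far more data
    for src, tgt in pairs:
        batch = tok(src, return_tensors="pt")
        labels = tok(text_target=tgt, return_tensors="pt").input_ids
        loss = model(**batch, labels=labels).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

print(loss.item())  # fine-tuning loss on the child language pair
```

Under this kind of warm start, the child pair inherits the parent's encoder and decoder, which is what the abstract credits for the roughly 10 BLEU gain over training on the small German-Upper Sorbian corpus alone.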
Related papers
- Boosting Unsupervised Machine Translation with Pseudo-Parallel Data
We propose a training strategy that relies on pseudo-parallel sentence pairs mined from monolingual corpora and synthetic sentence pairs back-translated from monolingual corpora.
We reach an improvement of up to 14.5 BLEU points (English to Ukrainian) over a baseline trained on back-translated data only (a minimal back-translation sketch appears after this list).
arXiv Detail & Related papers (2023-10-22T10:57:12Z)
- NAVER LABS Europe's Multilingual Speech Translation Systems for the IWSLT 2023 Low-Resource Track
This paper presents NAVER LABS Europe's systems for Tamasheq-French and Quechua-Spanish speech translation in the IWSLT 2023 Low-Resource track.
Our work attempts to maximize translation quality in low-resource settings using multilingual parameter-efficient solutions.
arXiv Detail & Related papers (2023-06-13T13:22:30Z)
- Tencent AI Lab - Shanghai Jiao Tong University Low-Resource Translation System for the WMT22 Translation Task
This paper describes Tencent AI Lab - Shanghai Jiao Tong University (TAL-SJTU) Low-Resource Translation systems for the WMT22 shared task.
We participate in the general translation task on English↔Livonian.
Our system is based on M2M100 with novel techniques that adapt it to the target language pair.
arXiv Detail & Related papers (2022-10-17T04:34:09Z)
- Bilingual Dictionary-based Language Model Pretraining for Neural Machine Translation
We incorporate the translation information from dictionaries into the pretraining process and propose a novel Bilingual Dictionary-based Language Model (BDLM)
We evaluate our BDLM in Chinese, English, and Romanian.
arXiv Detail & Related papers (2021-03-12T02:01:22Z)
- The LMU Munich System for the WMT 2020 Unsupervised Machine Translation Shared Task
This paper describes the submission of LMU Munich to the WMT 2020 unsupervised shared task, in two language directions.
Our core unsupervised neural machine translation (UNMT) system follows the strategy of Chronopoulou et al.
We ensemble our best-performing systems and reach a BLEU score of 32.4 on German→Upper Sorbian and 35.2 on Upper Sorbian→German.
arXiv Detail & Related papers (2020-10-25T19:04:03Z)
- Unsupervised Bitext Mining and Translation via Self-trained Contextual Embeddings
We describe an unsupervised method to create pseudo-parallel corpora for machine translation (MT) from unaligned text.
We use multilingual BERT to create source and target sentence embeddings for nearest-neighbor search and adapt the model via self-training.
We validate our technique by extracting parallel sentence pairs on the BUCC 2017 bitext mining task and observe up to a 24.5 point increase (absolute) in F1 scores over previous unsupervised methods (a simplified mining sketch appears after this list).
arXiv Detail & Related papers (2020-10-15T14:04:03Z)
- SJTU-NICT's Supervised and Unsupervised Neural Machine Translation Systems for the WMT20 News Translation Task
We participated in four translation directions of three language pairs: English-Chinese, English-Polish, and German-Upper Sorbian.
Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques.
In our submissions, the primary systems won the first place on English to Chinese, Polish to English, and German to Upper Sorbian translation directions.
arXiv Detail & Related papers (2020-10-11T00:40:05Z)
- Cross-lingual Supervision Improves Unsupervised Neural Machine Translation
We introduce a multilingual unsupervised NMT framework to leverage weakly supervised signals from high-resource language pairs to zero-resource translation directions.
The method improves translation quality by more than 3 BLEU points on six benchmark unsupervised translation directions.
arXiv Detail & Related papers (2020-04-07T05:46:49Z)
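
Several entries above, like the main abstract's synthetic-data experiments, rest on back-translation: a reverse-direction model turns monolingual target-side text into synthetic source sentences, yielding pseudo-parallel training pairs. The sketch below is an illustration under assumptions, not any of these papers' pipelines: the Helsinki-NLP/opus-mt-cs-de checkpoint stands in for whatever reverse model a real system would train, and the monolingual sentences are toy examples.

```python
# Hypothetical back-translation sketch: generate synthetic source sentences
# from monolingual target-side text with a reverse-direction model.
from transformers import MarianMTModel, MarianTokenizer

REVERSE = "Helsinki-NLP/opus-mt-cs-de"  # stand-in for a target->source model
tok = MarianTokenizer.from_pretrained(REVERSE)
reverse_model = MarianMTModel.from_pretrained(REVERSE)

# Monolingual target-side sentences (toy examples).
mono_tgt = ["Dobrý den.", "Děkuji vám za pomoc."]

batch = tok(mono_tgt, return_tensors="pt", padding=True)
outputs = reverse_model.generate(**batch, num_beams=4, max_new_tokens=64)
synthetic_src = tok.batch_decode(outputs, skip_special_tokens=True)

# Pseudo-parallel corpus: (synthetic source, authentic target); a forward
# model is then trained on these pairs alongside any real parallel data.
pseudo_parallel = list(zip(synthetic_src, mono_tgt))
print(pseudo_parallel)
```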
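The bitext-mining entry above instead searches monolingual corpora for pseudo-parallel pairs using multilingual sentence embeddings. The simplified sketch below mean-pools multilingual BERT hidden states and takes cosine nearest neighbors; the actual paper additionally self-trains the encoder and uses a stronger scoring scheme, and the sentences and similarity threshold here are illustrative assumptions.

```python
# Simplified bitext-mining sketch: embed both sides with multilingual BERT,
# then pair each source sentence with its nearest target-side neighbor.
import torch
from transformers import AutoModel, AutoTokenizer

ENCODER = "bert-base-multilingual-cased"
tok = AutoTokenizer.from_pretrained(ENCODER)
enc = AutoModel.from_pretrained(ENCODER)

def embed(sentences):
    batch = tok(sentences, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state       # (batch, tokens, dim)
    mask = batch["attention_mask"].unsqueeze(-1)      # zero out padding tokens
    vecs = (hidden * mask).sum(1) / mask.sum(1)       # mean pooling
    return torch.nn.functional.normalize(vecs, dim=1)

src = ["Das Wetter ist heute schön.", "Ich lese ein Buch."]  # German side
tgt = ["Čitam knihu.", "Wjedro je dźensa rjane."]            # toy target side

sim = embed(src) @ embed(tgt).T                       # cosine similarity matrix
best = sim.argmax(dim=1)                              # nearest neighbor per source
THRESHOLD = 0.5                                       # illustrative cutoff
mined = [(src[i], tgt[j]) for i, j in enumerate(best.tolist())
         if sim[i, j] > THRESHOLD]
print(mined)
```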
This list is automatically generated from the titles and abstracts of the papers on this site.