VOLT: Improving Vocabularization via Optimal Transport for Machine
Translation
- URL: http://arxiv.org/abs/2012.15671v1
- Date: Thu, 31 Dec 2020 15:49:49 GMT
- Title: VOLT: Improving Vocabularization via Optimal Transport for Machine
Translation
- Authors: Jingjing Xu, Hao Zhou, Chun Gan, Zaixiang Zheng, Lei Li
- Abstract summary: We find an exciting relation between an information-theoretic feature and BLEU scores.
We propose VOLT, a simple and efficient vocabularization solution without the full and costly trial training.
VOLT achieves 70% vocabulary size reduction and 0.6 BLEU gain on English-German translation.
- Score: 22.07373011242121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well accepted that the choice of token vocabulary largely affects the
performance of machine translation. However, due to expensive trial costs, most
studies only conduct simple trials with dominant approaches (e.g., BPE) and
commonly used vocabulary sizes. In this paper, we find an exciting relation
between an information-theoretic feature and BLEU scores. With this
observation, we formulate the quest of vocabularization -- finding the best
token dictionary with a proper size -- as an optimal transport problem. We then
propose VOLT, a simple and efficient vocabularization solution without the full
and costly trial training. We evaluate our approach on multiple machine
translation tasks, including WMT-14 English-German translation, TED bilingual
translation, and TED multilingual translation. Empirical results show that VOLT
outperforms widely used vocabularies across diverse scenarios. For example, VOLT achieves
70% vocabulary size reduction and 0.6 BLEU gain on English-German translation.
Also, one advantage of VOLT lies in its low resource consumption. Compared to
naive BPE-search, VOLT reduces the search time from 288 GPU hours to 0.5 CPU
hours.
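The abstract's key observation is that an information-theoretic feature of a vocabulary correlates with BLEU, which lets vocabulary size be chosen without trial training. As a loose illustration only (this is not the paper's actual feature or its optimal-transport solver, and all names below are hypothetical), the sketch runs toy BPE merges and tracks how corpus entropy changes as the vocabulary grows:

```python
import math
from collections import Counter

def corpus_entropy(tokens):
    """Shannon entropy (bits per token) of a token sequence."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def bpe_merge_step(words):
    """One toy BPE step: merge the most frequent adjacent symbol pair.

    `words` maps a space-separated symbol sequence to its corpus frequency,
    e.g. {"l o w": 5}. Returns the updated mapping and the merged pair
    (or None if nothing is left to merge).
    """
    pairs = Counter()
    for word, freq in words.items():
        syms = word.split()
        for a, b in zip(syms, syms[1:]):
            pairs[(a, b)] += freq
    if not pairs:
        return words, None
    best = max(pairs, key=pairs.get)
    merged = {}
    for word, freq in words.items():
        syms = word.split()
        out, i = [], 0
        while i < len(syms):
            if i + 1 < len(syms) and (syms[i], syms[i + 1]) == best:
                out.append(syms[i] + syms[i + 1])
                i += 2
            else:
                out.append(syms[i])
                i += 1
        merged[" ".join(out)] = freq
    return merged, best

def tokens_of(words):
    """Flatten the word/frequency map into one token stream."""
    stream = []
    for word, freq in words.items():
        stream.extend(word.split() * freq)
    return stream

# Track entropy as the vocabulary grows, one merge at a time.
words = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}
entropies = [corpus_entropy(tokens_of(words))]
for _ in range(8):
    words, pair = bpe_merge_step(words)
    if pair is None:
        break
    entropies.append(corpus_entropy(tokens_of(words)))
# The per-merge entropy differences are a crude stand-in for the
# size-vs-entropy trade-off that VOLT optimizes exactly via optimal transport.
```

In this toy setting, each merge adds one token type to the vocabulary, so scanning the entropy trajectory mimics choosing a vocabulary size by its marginal benefit; VOLT replaces this greedy scan with a principled optimal-transport formulation.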
Related papers
- An Analysis of BPE Vocabulary Trimming in Neural Machine Translation [56.383793805299234]
Vocabulary trimming is a postprocessing step that replaces rare subwords with their component subwords.
We show that vocabulary trimming fails to improve performance and is even prone to incurring heavy degradation.
arXiv Detail & Related papers (2024-03-30T15:29:49Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- On the Off-Target Problem of Zero-Shot Multilingual Neural Machine Translation [104.85258654917297]
We find that failure to encode a discriminative target-language signal leads to off-target translation and a closer lexical distance.
We propose Language-Aware Vocabulary Sharing (LAVS) to construct the multilingual vocabulary.
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
arXiv Detail & Related papers (2023-05-18T12:43:31Z)
- Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation [33.6064740446337]
This work explores a cheap and abundant resource to combat this problem: bilingual lexica.
We test the efficacy of bilingual lexica in a real-world set-up, on 200-language translation models trained on web-crawled text.
We present several findings: (1) using lexical data augmentation, we demonstrate sizable performance gains for unsupervised translation; (2) we compare several families of data augmentation, demonstrating that they yield similar improvements; and (3) we demonstrate the importance of carefully curated lexica over larger, noisier ones.
arXiv Detail & Related papers (2023-03-27T14:54:43Z)
- Fast Vocabulary Projection Method via Clustering for Multilingual Machine Translation on GPU [6.1646755570223934]
This paper proposes a fast vocabulary projection method via clustering.
The proposed method speeds up the vocabulary projection step itself by up to 2.6x.
We also conduct an extensive human evaluation to verify that the proposed method preserves the quality of the translations from the original model.
arXiv Detail & Related papers (2022-08-14T16:10:14Z)
- How Effective is Byte Pair Encoding for Out-Of-Vocabulary Words in Neural Machine Translation? [17.300004156754966]
We analyze the translation quality of OOV words based on word type, number of segments, cross-attention, and the frequency of segment n-grams.
Our experiments show that while careful BPE settings seem fairly useful for translating OOV words, a considerable percentage of OOV words are still translated incorrectly.
arXiv Detail & Related papers (2022-08-10T08:57:13Z)
- Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training [59.571632468137075]
We find that many languages are under-represented in recent cross-lingual language models due to limited vocabulary capacity.
We propose an algorithm, VoCap, to determine the desired vocabulary capacity of each language.
To address the resulting issues, we propose k-NN-based target sampling to accelerate the expensive softmax.
arXiv Detail & Related papers (2021-09-15T14:04:16Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and the endangered language Cherokee.
It supports both statistical and neural translation models and provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences [45.99290614777277]
We propose a new machine translation (MT) task that uses no parallel sentences but may refer to a ground-truth bilingual dictionary.
Motivated by how a monolingual speaker learns to translate by looking up a bilingual dictionary, we propose this task to see how much potential an MT system can attain.
arXiv Detail & Related papers (2020-07-06T12:05:27Z)
- Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation [61.88012735215636]
Unsupervised neural machine translation (UNMT) has recently achieved remarkable results for several language pairs.
However, UNMT can only translate between a single language pair and cannot produce results for multiple language pairs at the same time.
In this paper, we empirically introduce a simple method to translate between thirteen languages using a single encoder and a single decoder.
arXiv Detail & Related papers (2020-04-21T17:26:16Z)
- Neural Machine Translation for Low-Resourced Indian Languages [4.726777092009554]
Machine translation is an effective approach to convert text to a different language without any human involvement.
In this paper, we apply NMT to two language pairs involving morphologically rich Indian languages, English-Tamil and English-Malayalam.
We propose a novel NMT model using multi-head self-attention along with pre-trained Byte-Pair-Encoded (BPE) and MultiBPE embeddings to develop an efficient translation system.
arXiv Detail & Related papers (2020-04-19T17:29:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.