Fast Vocabulary Projection Method via Clustering for Multilingual
Machine Translation on GPU
- URL: http://arxiv.org/abs/2208.06874v1
- Date: Sun, 14 Aug 2022 16:10:14 GMT
- Title: Fast Vocabulary Projection Method via Clustering for Multilingual
Machine Translation on GPU
- Authors: Hossam Amer, Young Jin Kim, Mohamed Afify, Hitokazu Matsushita, Hany
Hassan Awadallah
- Abstract summary: This paper proposes a fast vocabulary projection method via clustering.
The proposed method speeds up the vocab projection step itself by up to 2.6x.
We also conduct an extensive human evaluation to verify that the proposed method preserves the quality of the translations from the original model.
- Score: 6.1646755570223934
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multilingual Neural Machine Translation has been showing great success using
transformer models. Deploying these models is challenging because they usually
require large vocabulary (vocab) sizes for various languages. This limits the
speed of predicting the output tokens in the last vocab projection layer. To
alleviate these challenges, this paper proposes a fast vocabulary projection
method via clustering which can be used for multilingual transformers on GPUs.
First, in an offline step, we split the vocab search space into disjoint clusters based on the hidden context vector of the decoder output, which yields much smaller vocab columns for the projection. Second, at inference time, the proposed
method predicts the clusters and candidate active tokens for each hidden context
vector at the vocab projection layer. This paper also includes an analysis of different
ways of building these clusters in multilingual settings. Our results show
end-to-end speed gains in float16 GPU inference up to 25% while maintaining the
BLEU score and slightly increasing memory cost. The proposed method speeds up
the vocab projection step itself by up to 2.6x. We also conduct an extensive
human evaluation to verify the proposed method preserves the quality of the
translations from the original model.
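The following is a minimal sketch of the idea described in the abstract, not the paper's implementation: it clusters the rows of the output-projection matrix with k-means as an illustrative stand-in (the paper builds the clusters given decoder hidden context vectors), and picks clusters at inference by centroid dot product. The names (`build_clusters`, `project`) and parameters (`n_clusters`, `top_clusters`) are hypothetical.

```python
# Sketch of clustered vocabulary projection: offline, partition the vocab
# into disjoint clusters; at inference, score only the tokens belonging to
# the clusters nearest the decoder hidden state instead of the full vocab.
import numpy as np
from sklearn.cluster import KMeans

def build_clusters(E, n_clusters=64, seed=0):
    """Offline step: split the rows of the output-projection matrix
    E (V x d) into disjoint clusters (illustrative stand-in)."""
    km = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(E)
    members = [np.where(km.labels_ == c)[0] for c in range(n_clusters)]
    return km.cluster_centers_, members

def project(h, E, centers, members, top_clusters=4):
    """Inference step: predict the nearest clusters for hidden state h,
    then run the vocab projection only over their member tokens."""
    nearest = np.argsort(-(centers @ h))[:top_clusters]   # cluster scores
    cand = np.concatenate([members[c] for c in nearest])  # active tokens
    logits = E[cand] @ h                                  # reduced projection
    return cand[np.argmax(logits)]                        # predicted token id

# Toy usage with random embeddings and a random hidden state.
V, d = 8000, 64
rng = np.random.default_rng(0)
E = rng.standard_normal((V, d)).astype(np.float32)
centers, members = build_clusters(E)
h = rng.standard_normal(d).astype(np.float32)
print(project(h, E, centers, members))
```

The speedup comes from `E[cand] @ h` touching only a few clusters' worth of the V vocab rows; the overhead is the centroids plus the membership tables, consistent with the abstract's note of a slight memory increase.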
Related papers
- An Analysis of BPE Vocabulary Trimming in Neural Machine Translation [56.383793805299234]
Vocabulary trimming is a postprocessing step that replaces rare subwords with their component subwords (sketched below).
We show that vocabulary trimming fails to improve performance and is even prone to incurring heavy degradation.
arXiv Detail & Related papers (2024-03-30T15:29:49Z)
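A minimal sketch of the trimming idea summarized in the entry above, under the assumption that each merged subword can be recursively split back into its BPE components; `trim_vocab`, `retokenize`, and the frequency threshold are illustrative, not the paper's exact procedure.

```python
# Sketch of BPE vocabulary trimming: subwords rarer than a frequency
# threshold are dropped from the vocab and re-expressed through the BPE
# merge that produced them, recursively.
def trim_vocab(freq, components, threshold):
    """freq: subword -> corpus frequency; components: subword -> (left,
    right) merge pair (base characters have no entry)."""
    return {sw for sw, f in freq.items()
            if f >= threshold or sw not in components}

def retokenize(subword, vocab, components):
    """Recursively split a trimmed-away subword into in-vocab pieces."""
    if subword in vocab:
        return [subword]
    left, right = components[subword]
    return retokenize(left, vocab, components) + retokenize(right, vocab, components)

# Toy example: "lowest" was merged from "low" + "est" but is rare.
freq = {"low": 120, "est": 90, "lowest": 2}
components = {"lowest": ("low", "est")}
vocab = trim_vocab(freq, components, threshold=5)
print(retokenize("lowest", vocab, components))  # ['low', 'est']
```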
- Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual Translation of Dravidian Languages [0.34998703934432673]
We build a single-decoder neural machine translation system for Dravidian-Dravidian multilingual translation.
Our model achieves scores within 3 BLEU of large-scale pivot-based models when it is trained on 50% of the language directions.
arXiv Detail & Related papers (2023-08-10T13:38:09Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Word-Level Representation From Bytes For Language Modeling [46.28198397863388]
Sub-word tokenization is not robust to noise and generalizes poorly to new languages.
We introduce a cross-attention network that builds word-level representations directly from bytes (sketched below), and sub-word-level prediction based on word-level hidden states.
Byte2Word is on par with the strong sub-word baseline BERT but only takes up 10% of embedding size.
arXiv Detail & Related papers (2022-11-23T03:11:13Z)
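A minimal sketch of the byte-to-word cross-attention idea from the entry above, assuming one learned query per word attending over that word's byte embeddings; the class name `ByteToWord`, the dimensions, and the omission of a padding mask are simplifications, not the paper's architecture.

```python
# Sketch of byte-to-word cross-attention: a learned per-word query
# attends over the byte embeddings of that word to produce one vector
# per word (padding mask omitted for brevity).
import torch
import torch.nn as nn

class ByteToWord(nn.Module):
    def __init__(self, d=64, heads=4):
        super().__init__()
        self.byte_emb = nn.Embedding(256, d)            # one row per byte value
        self.word_query = nn.Parameter(torch.randn(1, 1, d))
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)

    def forward(self, byte_ids):                        # (words, max_bytes)
        kv = self.byte_emb(byte_ids)                    # (W, B, d)
        q = self.word_query.expand(byte_ids.size(0), 1, -1)
        word_vecs, _ = self.attn(q, kv, kv)             # (W, 1, d)
        return word_vecs.squeeze(1)                     # (W, d)

words = [b"hello", b"world!"]
max_bytes = max(len(w) for w in words)
ids = torch.tensor([list(w) + [0] * (max_bytes - len(w)) for w in words])
print(ByteToWord()(ids).shape)  # torch.Size([2, 64])
```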
- Better Language Model with Hypernym Class Prediction [101.8517004687825]
Class-based language models (LMs) have long been devised to address context sparsity in $n$-gram LMs.
In this study, we revisit this approach in the context of neural LMs.
arXiv Detail & Related papers (2022-03-21T01:16:44Z)
- Allocating Large Vocabulary Capacity for Cross-lingual Language Model Pre-training [59.571632468137075]
We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity.
We propose an algorithm VoCap to determine the desired vocabulary capacity of each language.
To address these issues, we propose k-NN-based target sampling to accelerate the expensive softmax (sketched below).
arXiv Detail & Related papers (2021-09-15T14:04:16Z)
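A minimal sketch of k-NN-based target sampling as summarized in the entry above, under simple assumptions: the softmax is restricted to the k output embeddings closest to the hidden state, plus the gold target. A real implementation would use an approximate-nearest-neighbor index; the brute-force scoring here is only for illustration.

```python
# Sketch of k-NN-based target sampling: restrict the softmax to the k
# output embeddings nearest the hidden state (plus the gold target), so
# the projection cost is O(k*d) instead of O(V*d).
import numpy as np

def knn_softmax_candidates(h, E, target, k=256):
    """h: (d,) hidden state; E: (V, d) output embeddings.
    Brute-force scoring stands in for an ANN index here."""
    scores = E @ h
    cand = np.argpartition(-scores, k)[:k]   # k nearest tokens by dot product
    if target not in cand:
        cand = np.append(cand, target)       # always keep the gold token
    return cand, scores[cand]

rng = np.random.default_rng(0)
E = rng.standard_normal((10000, 64)).astype(np.float32)
h = rng.standard_normal(64).astype(np.float32)
cand, logits = knn_softmax_candidates(h, E, target=42)
probs = np.exp(logits - logits.max())
probs /= probs.sum()                          # softmax over candidates only
print(len(cand), cand[np.argmax(probs)])
```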
- Lightweight Cross-Lingual Sentence Representation Learning [57.9365829513914]
We introduce a lightweight dual-transformer architecture with just 2 layers for generating memory-efficient cross-lingual sentence representations.
We propose a novel cross-lingual language model, which combines the existing single-word masked language model with the newly proposed cross-lingual token-level reconstruction task.
arXiv Detail & Related papers (2021-05-28T14:10:48Z)
- VOLT: Improving Vocabularization via Optimal Transport for Machine Translation [22.07373011242121]
We find an exciting relation between an information-theoretic feature and BLEU scores.
We propose VOLT, a simple and efficient vocabularization solution without the full and costly trial training.
VOLT achieves 70% vocabulary size reduction and 0.6 BLEU gain on English-German translation.
arXiv Detail & Related papers (2020-12-31T15:49:49Z)
- Investigating Language Impact in Bilingual Approaches for Computational Language Documentation [28.838960956506018]
This paper investigates how the choice of translation language affects the posterior documentation work.
We create 56 bilingual pairs that we apply to the task of low-resource unsupervised word segmentation and alignment.
Our results suggest that incorporating clues into the neural models' input representation increases their translation and alignment quality.
arXiv Detail & Related papers (2020-03-30T10:30:34Z)
- Robust Cross-lingual Embeddings from Parallel Sentences [65.85468628136927]
We propose a bilingual extension of the CBOW method which leverages sentence-aligned corpora to obtain robust cross-lingual word representations (sketched below).
Our approach significantly improves cross-lingual sentence retrieval performance over all other approaches.
It also achieves parity with a deep RNN method on a zero-shot cross-lingual document classification task.
arXiv Detail & Related papers (2019-12-28T16:18:33Z)
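A minimal, heavily simplified sketch of how a bilingual CBOW might use sentence-aligned data, as the entry above describes: each word is predicted from its monolingual window plus the words of the aligned sentence. The cross-lingual context construction shown here is an assumption, not the paper's method.

```python
# Sketch of a bilingual CBOW training-example builder: each word is
# predicted from its monolingual context window plus the words of the
# aligned sentence, tying the two embedding spaces together.
def cbow_examples(src, tgt, window=2):
    """Yield (context_words, target_word) pairs for one aligned pair."""
    for sent, other in ((src, tgt), (tgt, src)):
        for i, word in enumerate(sent):
            mono = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
            yield mono + other, word

pair = (["the", "cat", "sleeps"], ["le", "chat", "dort"])
for ctx, word in cbow_examples(*pair):
    print(word, "<-", ctx)
```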