Multitask Finetuning for Improving Neural Machine Translation in Indian
Languages
- URL: http://arxiv.org/abs/2112.01742v1
- Date: Fri, 3 Dec 2021 06:43:56 GMT
- Title: Multitask Finetuning for Improving Neural Machine Translation in Indian
Languages
- Authors: Shaily Desai, Atharva Kshirsagar, Manisha Marathe
- Abstract summary: Transformer-based language models have led to impressive results across all domains in Natural Language Processing.
We propose a Multitask Finetuning methodology which combines the Bilingual Machine Translation task with an auxiliary Causal Language Modeling task to improve performance.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Transformer-based language models have led to impressive results across all
domains in Natural Language Processing. Pretraining these models on language
modeling tasks and finetuning them on downstream tasks such as Text
Classification, Question Answering and Neural Machine Translation has
consistently shown exemplary results. In this work, we propose a Multitask
Finetuning methodology which combines the Bilingual Machine Translation task
with an auxiliary Causal Language Modeling task to improve performance on the
former task on Indian Languages. We conduct an empirical study on three
language pairs, Marathi-Hindi, Marathi-English and Hindi-English, where we
compare the multitask finetuning approach to the standard finetuning approach,
for which we use the mBART50 model. Our study indicates that the multitask
finetuning method could be a better technique than standard finetuning, and
could improve Bilingual Machine Translation across language pairs.
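The abstract does not say how the translation objective and the auxiliary causal language modeling objective are combined. The sketch below is one common way to do it, a weighted sum of the two token-level cross-entropy losses, and is an assumption for illustration rather than the authors' method. The function names, the toy probability distributions, and the `aux_weight` hyperparameter are all hypothetical.

```python
import math


def token_cross_entropy(probs, target_ids):
    """Mean negative log-likelihood of the target tokens.

    probs: one probability distribution (list of floats) per position.
    target_ids: the index of the correct token at each position.
    """
    total = 0.0
    for dist, target in zip(probs, target_ids):
        total += -math.log(dist[target])
    return total / len(target_ids)


def multitask_loss(mt_loss, clm_loss, aux_weight=0.5):
    """Combine the main and auxiliary losses as a weighted sum.

    aux_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return mt_loss + aux_weight * clm_loss


# Toy model outputs over a 3-token vocabulary (made-up numbers).
mt_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]    # translation decoder outputs
mt_targets = [0, 1]
clm_probs = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]  # next-token predictions
clm_targets = [0, 1]

mt_loss = token_cross_entropy(mt_probs, mt_targets)
clm_loss = token_cross_entropy(clm_probs, clm_targets)
total_loss = multitask_loss(mt_loss, clm_loss, aux_weight=0.5)
print(f"MT loss={mt_loss:.4f}  CLM loss={clm_loss:.4f}  total={total_loss:.4f}")
```

In an actual finetuning run, both losses would come from the same mBART50 model on each batch and the combined value would be backpropagated. Setting `aux_weight` to 0 recovers standard finetuning on translation alone.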
Related papers
- Efficient Multilingual Dialogue Processing via Translation Pipelines and Distilled Language Models [0.0]
This paper presents team Kl33n3x's multilingual dialogue summarization and question answering system developed for the NLPAI4Health 2025 shared task.
The approach employs a three-stage pipeline: forward translation from Indic languages to English, multitask text generation using a 2.55B parameter distilled language model, and reverse translation back to source languages.
The system achieved strong win rates across the competition's tasks, with particularly robust performance on Marathi (86.7% QnA), Tamil (86.7% QnA), and Hindi (80.0% QnA).
arXiv Detail & Related papers (2026-01-14T01:02:06Z)
- Key ingredients for effective zero-shot cross-lingual knowledge transfer in generative tasks [22.93790760274486]
Zero-shot cross-lingual knowledge transfer enables a multilingual pretrained language model, finetuned on a task in one language, to make predictions for this task in other languages.
Previous works notice a frequent problem of generation in the wrong language and propose approaches to address it, usually using mT5 as a backbone model.
In this work we compare various approaches proposed in the literature in unified settings, also including alternative backbone models, namely mBART and NLLB-200.
arXiv Detail & Related papers (2024-02-19T16:43:57Z)
- Revisiting Machine Translation for Cross-lingual Classification [91.43729067874503]
Most research in the area focuses on the multilingual models rather than the Machine Translation component.
We show that, by using a stronger MT system and mitigating the mismatch between training on original text and running inference on machine translated text, translate-test can do substantially better than previously assumed.
arXiv Detail & Related papers (2023-05-23T16:56:10Z)
- Crosslingual Generalization through Multitask Finetuning [80.8822603322471]
Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting.
We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0.
We find finetuning large multilingual language models on English tasks with English prompts allows for task generalization to non-English languages.
arXiv Detail & Related papers (2022-11-03T13:19:32Z)
- Are Multilingual Models Effective in Code-Switching? [57.78477547424949]
We study the effectiveness of multilingual language models to understand their capability and adaptability to the mixed-language setting.
Our findings suggest that pre-trained multilingual models do not necessarily guarantee high-quality representations on code-switching.
arXiv Detail & Related papers (2021-03-24T16:20:02Z)
- Exploring Pair-Wise NMT for Indian Languages [35.17470908190963]
We show that the performance of these models can be significantly improved through a filtered back-translation process.
The analysis in this paper suggests that this method can significantly improve a multilingual model's performance over its baseline.
arXiv Detail & Related papers (2020-12-10T16:22:36Z) - Indic-Transformers: An Analysis of Transformer Language Models for
Indian Languages [0.8155575318208631]
Language models based on the Transformer architecture have achieved state-of-the-art performance on a wide range of NLP tasks.
However, this performance is usually tested and reported on high-resource languages, like English, French, Spanish, and German.
Indian languages, on the other hand, are underrepresented in such benchmarks.
arXiv Detail & Related papers (2020-11-04T14:43:43Z) - Cross-lingual Machine Reading Comprehension with Language Branch
Knowledge Distillation [105.41167108465085]
Cross-lingual Machine Reading Comprehension (CLMRC) remains a challenging problem due to the lack of large-scale datasets in low-resource languages.
We propose a novel augmentation approach named Language Branch Machine Reading Comprehension (LBMRC).
LBMRC trains multiple machine reading comprehension (MRC) models, each proficient in an individual language.
We devise a multilingual distillation approach to amalgamate knowledge from multiple language branch models to a single model for all target languages.
arXiv Detail & Related papers (2020-10-27T13:12:17Z) - Multilingual Translation with Extensible Multilingual Pretraining and
Finetuning [77.33262578776291]
Previous work has demonstrated that machine translation systems can be created by finetuning on bitext.
We show that multilingual translation models can be created through multilingual finetuning.
We demonstrate that pretrained models can be extended to incorporate additional languages without loss of performance.
arXiv Detail & Related papers (2020-08-02T05:36:55Z) - Mono vs Multilingual Transformer-based Models: a Comparison across
Several Language Tasks [1.2691047660244335]
BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) are methods for pre-training language models.
We make available our trained BERT and ALBERT models for Portuguese.
arXiv Detail & Related papers (2020-07-19T19:13:20Z) - Exploring Versatile Generative Language Model Via Parameter-Efficient
Transfer Learning [70.81910984985683]
We propose an effective way to fine-tune multiple down-stream generation tasks simultaneously using a single, large pre-trained model.
The experiments on five diverse language generation tasks show that by just using an additional 2-3% parameters for each task, our model can maintain or even improve the performance of fine-tuning the whole model.
arXiv Detail & Related papers (2020-04-08T06:18:44Z)