Domain Adaptation of NMT models for English-Hindi Machine Translation
Task at AdapMT ICON 2020
- URL: http://arxiv.org/abs/2012.12112v2
- Date: Wed, 23 Dec 2020 11:59:51 GMT
- Title: Domain Adaptation of NMT models for English-Hindi Machine Translation
Task at AdapMT ICON 2020
- Authors: Ramchandra Joshi, Rushabh Karnavat, Kaustubh Jirapure, Raviraj Joshi
- Abstract summary: This paper describes the neural machine translation systems for the English-Hindi language pair presented in the AdapMT Shared Task at ICON 2020.
Our team was ranked first in the chemistry and general domain En-Hi translation task and second in the AI domain En-Hi translation task.
- Score: 2.572404739180802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in Neural Machine Translation (NMT) models have been shown
to produce state-of-the-art results on machine translation for low-resource
Indian languages. This paper describes the neural machine translation systems
for the English-Hindi language pair presented in the AdapMT Shared Task at ICON 2020.
The shared task aims to build translation systems for Indian languages in specific
domains, such as Artificial Intelligence (AI) and Chemistry, using a small in-domain
parallel corpus. We evaluate the effectiveness of two popular NMT architectures,
LSTM and Transformer, for the English-Hindi machine translation task based on
BLEU scores. We train these models primarily on out-of-domain data and employ
simple domain adaptation techniques based on the characteristics of the in-domain
dataset: fine-tuning and mixed-domain data training. Our team ranked first in the
chemistry and general domain En-Hi translation tasks and second in the AI domain
En-Hi translation task.
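The abstract names two adaptation strategies but gives no code; the sketch below illustrates both at the data and schedule level under stated assumptions. `general_corpus`, `in_domain_corpus`, and `model_train_step` are hypothetical stand-ins for the actual LSTM/Transformer training setup, not the authors' implementation.

```python
# Hedged sketch of the two domain adaptation strategies named in the abstract.
import random

def mixed_domain_corpus(general_corpus, in_domain_corpus, oversample=10):
    """Mixed-domain training: concatenate out-of-domain and in-domain pairs,
    oversampling the much smaller in-domain corpus so it is not drowned out.
    The oversample factor is illustrative, not taken from the paper."""
    mixed = list(general_corpus) + list(in_domain_corpus) * oversample
    random.shuffle(mixed)
    return mixed

def fine_tune_schedule(model_train_step, general_corpus, in_domain_corpus,
                       general_epochs=10, in_domain_epochs=3):
    """Fine-tuning: train on out-of-domain data first, then continue
    training on the small in-domain corpus with the same objective."""
    for _ in range(general_epochs):
        for pair in general_corpus:
            model_train_step(pair)      # pair: (en, hi) sentence tuple
    for _ in range(in_domain_epochs):
        for pair in in_domain_corpus:
            model_train_step(pair)
```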
Related papers
- Domain Adaptation for Arabic Machine Translation: The Case of Financial Texts [0.7673339435080445]
We develop a parallel corpus for Arabic-English (AR- EN) translation in the financial domain.
We fine-tune several NMT and Large Language models including ChatGPT-3.5 Turbo.
The quality of ChatGPT translations was superior to that of the other models based on automatic and human evaluations.
arXiv Detail & Related papers (2023-09-22T13:37:19Z) - $m^4Adapter$: Multilingual Multi-Domain Adaptation for Machine
Translation with a Meta-Adapter [128.69723410769586]
Multilingual neural machine translation (MNMT) models yield state-of-the-art performance when evaluated on the domains and language pairs seen during training.
When an MNMT model is used to translate under domain shift or to a new language pair, performance drops dramatically.
We propose $m^4Adapter$, which combines domain and language knowledge using meta-learning with adapters.
arXiv Detail & Related papers (2022-10-21T12:25:05Z) - Domain-Specific Text Generation for Machine Translation [7.803471587734353]
We propose a novel approach to domain adaptation leveraging state-of-the-art pretrained language models (LMs) for domain-specific data augmentation.
We employ mixed fine-tuning to train models that significantly improve translation of in-domain texts.
arXiv Detail & Related papers (2022-08-11T16:22:16Z) - Learning Domain Specific Language Models for Automatic Speech
Recognition through Machine Translation [0.0]
We use Neural Machine Translation as an intermediate step to first obtain translations of task-specific text data.
We develop a procedure to derive word confusion networks from NMT beam search graphs.
We demonstrate that NMT confusion networks can help to reduce the perplexity of both n-gram and recurrent neural network LMs.
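A very rough sketch of the confusion-network step, assuming a naive positional alignment of N-best hypotheses; the paper derives confusion networks from the NMT beam search graph itself, which handles insertions and deletions properly. `beam_hypotheses` is a hypothetical list of tokenized NMT outputs.

```python
# Crude positional approximation of a word confusion network.
from collections import Counter
from itertools import zip_longest

def rough_confusion_network(beam_hypotheses):
    """beam_hypotheses: list of token lists (NMT N-best outputs)."""
    slots = []
    for column in zip_longest(*beam_hypotheses, fillvalue="<eps>"):
        counts = Counter(column)
        total = sum(counts.values())
        # Each slot maps candidate words to a relative weight; weighted
        # alternatives like these can feed n-gram / RNN LM training.
        slots.append({word: n / total for word, n in counts.items()})
    return slots

hyps = [["the", "model", "works"],
        ["the", "model", "runs"],
        ["a", "model", "works", "well"]]
print(rough_confusion_network(hyps))
```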
arXiv Detail & Related papers (2021-09-21T10:29:20Z) - Non-Parametric Unsupervised Domain Adaptation for Neural Machine
Translation [61.27321597981737]
$k$NN-MT has shown the promising capability of directly incorporating the pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor retrieval.
We propose a novel framework that directly uses in-domain monolingual sentences in the target language to construct an effective datastore for $k$-nearest-neighbor retrieval.
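A minimal sketch of token-level $k$NN-MT interpolation, assuming a pre-built datastore of (decoder hidden state, next-token id) pairs; the dummy data, distance temperature, and interpolation weight `lam` are illustrative, not this paper's configuration.

```python
# kNN-MT sketch: retrieve the k nearest datastore keys to the current decoder
# state, turn distances into a token distribution, and interpolate it with
# the base NMT distribution. All datastore contents here are random dummies.
import numpy as np

def knn_distribution(query, keys, values, vocab_size, k=4, temperature=10.0):
    d = np.linalg.norm(keys - query, axis=1)   # L2 distance to every key
    nn = np.argsort(d)[:k]                     # indices of the k nearest keys
    w = np.exp(-d[nn] / temperature)
    w /= w.sum()
    p = np.zeros(vocab_size)
    for idx, weight in zip(nn, w):
        p[values[idx]] += weight               # aggregate weight per token id
    return p

rng = np.random.default_rng(0)
vocab_size, dim = 8, 16
keys = rng.normal(size=(100, dim))             # datastore keys (hidden states)
values = rng.integers(0, vocab_size, size=100) # datastore values (token ids)
query = rng.normal(size=dim)                   # current decoder hidden state
p_nmt = np.full(vocab_size, 1.0 / vocab_size)  # base NMT distribution (dummy)
lam = 0.5                                      # interpolation weight
p_final = lam * knn_distribution(query, keys, values, vocab_size) + (1 - lam) * p_nmt
print(p_final)
```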
arXiv Detail & Related papers (2021-09-14T11:50:01Z) - FDMT: A Benchmark Dataset for Fine-grained Domain Adaptation in Machine
Translation [53.87731008029645]
We present FDMT, a real-world fine-grained domain adaptation task in machine translation.
The FDMT dataset consists of four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smart phones.
We conduct quantitative experiments and in-depth analyses in this new setting, which benchmarks the fine-grained domain adaptation task.
arXiv Detail & Related papers (2020-12-31T17:15:09Z) - SJTU-NICT's Supervised and Unsupervised Neural Machine Translation
Systems for the WMT20 News Translation Task [111.91077204077817]
We participated in four translation directions of three language pairs: English-Chinese, English-Polish, and German-Upper Sorbian.
Based on different conditions of language pairs, we have experimented with diverse neural machine translation (NMT) techniques.
In our submissions, the primary systems took first place in the English-to-Chinese, Polish-to-English, and German-to-Upper-Sorbian translation directions.
arXiv Detail & Related papers (2020-10-11T00:40:05Z) - Iterative Domain-Repaired Back-Translation [50.32925322697343]
In this paper, we focus on the domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.
We propose a novel iterative domain-repaired back-translation framework, which introduces the Domain-Repair model to refine translations in synthetic bilingual data.
Experiments on adapting NMT models between specific domains and from the general domain to specific domains demonstrate the effectiveness of our proposed approach.
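A control-flow-only sketch of the iterative loop described above; `back_translate`, `domain_repair`, and `train_nmt` are hypothetical callbacks (the paper's Domain-Repair model is itself a learned seq2seq component, not shown here).

```python
# Skeleton of iterative domain-repaired back-translation.
def iterative_domain_repaired_bt(monolingual_target, back_translate,
                                 domain_repair, train_nmt, rounds=3):
    model = None  # None stands for an initial general-domain system
    for _ in range(rounds):
        # Back-translate in-domain target sentences into synthetic sources.
        synthetic = [(back_translate(t, model), t) for t in monolingual_target]
        # Repair the synthetic source side with the Domain-Repair component.
        repaired = [(domain_repair(s), t) for s, t in synthetic]
        # Retrain the forward model on the repaired bilingual data.
        model = train_nmt(repaired)
    return model
```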
arXiv Detail & Related papers (2020-10-06T04:38:09Z) - A Simple Baseline to Semi-Supervised Domain Adaptation for Machine
Translation [73.3550140511458]
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on new domains with no supervised data.
We propose a simple but effective approach to the semi-supervised domain adaptation scenario of NMT.
This approach iteratively trains a Transformer-based NMT model via three training objectives: language modeling, back-translation, and supervised translation.
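A schedule-level sketch of the three objectives, assuming hypothetical step callbacks for the corresponding Transformer losses; the way the objectives are interleaved here is illustrative, not the paper's exact recipe.

```python
# One epoch alternating the three training objectives.
def semi_supervised_epoch(model, mono_in_domain, parallel_out_domain,
                          lm_step, bt_step, sup_step):
    for tgt in mono_in_domain:
        lm_step(model, tgt)        # language modeling on in-domain monolingual text
        bt_step(model, tgt)        # back-translation using the same monolingual data
    for src, tgt in parallel_out_domain:
        sup_step(model, src, tgt)  # supervised translation on parallel data
```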
arXiv Detail & Related papers (2020-01-22T16:42:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.