Using Machine Translation to Localize Task Oriented NLG Output
- URL: http://arxiv.org/abs/2107.04512v1
- Date: Fri, 9 Jul 2021 15:56:45 GMT
- Title: Using Machine Translation to Localize Task Oriented NLG Output
- Authors: Scott Roy, Cliff Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag,
Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano
- Abstract summary: This paper explores localizing task-oriented NLG output to many languages by applying machine translation to the English output.
The required quality bar is close to perfection, the range of sentences is extremely narrow, and the sentences are often very different from the ones in the machine translation training data.
We are able to reach the required quality bar by building on existing ideas and adding new ones.
- Score: 5.770385426429663
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the challenges in a task oriented natural language application like
the Google Assistant, Siri, or Alexa is to localize the output to many
languages. This paper explores doing this by applying machine translation to
the English output. Using machine translation is very scalable, as it can work
with any English output and can handle dynamic text, but otherwise the problem
is a poor fit. The required quality bar is close to perfection, the range of
sentences is extremely narrow, and the sentences are often very different from
the ones in the machine translation training data. This combination of
requirements is novel in the field of domain adaptation for machine
translation. We are able to reach the required quality bar by building on
existing ideas and adding new ones: finetuning on in-domain translations,
adding sentences from the Web, adding semantic annotations, and using automatic
error detection. The paper shares our approach and results, together with a
distillation model to serve the translation models at scale.
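Read as a serving-time recipe, the abstract implies a simple control flow: translate the English NLG output with the adapted (and distilled) model, screen the result with automatic error detection, and fall back to something safe when the translation is rejected. The sketch below is a minimal illustration of that flow, not the authors' implementation; `translate`, `detect_errors`, and the English fallback are hypothetical stand-ins.

```python
# Hypothetical sketch of the serving-time flow suggested by the abstract:
# translate the English NLG output, screen it with an automatic error
# detector, and fall back when the output is rejected. All components
# here are illustrative stubs, not the paper's actual models.

def translate(english_text: str, target_lang: str) -> str:
    """Stand-in for the distilled, domain-finetuned MT model."""
    return f"<{target_lang} translation of: {english_text}>"

def detect_errors(source: str, translation: str) -> bool:
    """Stand-in for automatic error detection; returns True if rejected.
    A real detector might flag dropped slots, wrong numbers, or entities."""
    return len(translation) == 0  # trivially permissive placeholder

def localize(english_text: str, target_lang: str) -> str:
    candidate = translate(english_text, target_lang)
    if detect_errors(english_text, candidate):
        # The quality bar is near perfection, so prefer a safe fallback
        # (here, the English text) over serving a bad translation.
        return english_text
    return candidate

print(localize("Your alarm is set for 7 AM.", "de"))
```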
Related papers
- Contextual Refinement of Translations: Large Language Models for Sentence and Document-Level Post-Editing [12.843274390224853]
Large Language Models (LLMs) have demonstrated considerable success in various Natural Language Processing tasks.
We show that they have yet to attain state-of-the-art performance in Neural Machine Translation.
We propose adapting LLMs as Automatic Post-Editors (APE) rather than direct translators.
arXiv Detail & Related papers (2023-10-23T12:22:15Z)
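As a rough illustration of the APE idea in the entry above, an LLM can be given the source sentence together with a draft MT output and asked to correct the draft rather than retranslate. The prompt wording and the `call_llm` stub below are assumptions, not the paper's actual setup.

```python
# Illustrative automatic post-editing (APE) prompt: the LLM receives the
# source and a draft MT output and is asked only to correct the draft.
# The prompt and the `call_llm` stub are assumptions, not the paper's setup.

def call_llm(prompt: str) -> str:
    """Stand-in for any instruction-following LLM API."""
    return "<post-edited translation>"

def post_edit(source: str, draft_translation: str, target_lang: str) -> str:
    prompt = (
        f"You are a translation post-editor for {target_lang}.\n"
        f"Source (English): {source}\n"
        f"Draft translation: {draft_translation}\n"
        "Fix any errors in the draft with minimal changes. "
        "Return only the corrected translation."
    )
    return call_llm(prompt)

print(post_edit("Turn off the kitchen lights.", "<draft>", "French"))
```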
- Do Multilingual Language Models Think Better in English? [24.713751471567395]
Translate-test is a popular technique to improve the performance of multilingual language models.
In this work, we introduce a new approach called self-translate, which overcomes the need of an external translation system.
arXiv Detail & Related papers (2023-08-02T15:29:22Z)
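A minimal sketch of the contrast drawn in the entry above: translate-test routes the input through an external MT system before the task model runs, while self-translate has the multilingual LM itself produce the English version via prompting. All functions below are illustrative stubs.

```python
# Sketch contrasting translate-test with self-translate. In translate-test,
# an external MT system maps the input to English before the task model runs;
# in self-translate, the multilingual LM itself produces the English version,
# so no external system is needed. All functions are illustrative stubs.

def external_mt(text: str) -> str:
    return "<English translation from an external MT system>"

def lm_generate(prompt: str) -> str:
    return "<English translation produced by the LM itself>"

def solve_task(english_text: str) -> str:
    return f"<task output for: {english_text}>"

def translate_test(non_english_input: str) -> str:
    return solve_task(external_mt(non_english_input))

def self_translate(non_english_input: str) -> str:
    prompt = f"Translate to English: {non_english_input}\nEnglish:"
    return solve_task(lm_generate(prompt))
```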
- TIM: Teaching Large Language Models to Translate with Comparison [78.66926087162672]
We propose a novel framework that uses comparison examples to teach LLMs translation.
Our approach involves presenting the model with examples of correct and incorrect translations and using a preference loss to guide the model's learning.
Our findings offer a new perspective on fine-tuning LLMs for translation tasks and provide a promising solution for generating high-quality translations.
arXiv Detail & Related papers (2023-07-10T08:15:40Z)
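A minimal sketch of a pairwise preference loss of the kind the TIM entry above describes: the model should score the correct translation above the incorrect one, here by a fixed margin. The margin formulation is an assumption, not necessarily TIM's exact objective.

```python
# Sketch of a pairwise preference loss: given the model's sequence
# log-probabilities for a correct and an incorrect translation of the same
# source, penalize the model when the correct one is not preferred by at
# least a margin. The margin form is an assumption, not TIM's exact loss.

def preference_loss(logp_correct: float, logp_incorrect: float,
                    margin: float = 1.0) -> float:
    return max(0.0, margin - (logp_correct - logp_incorrect))

# Example: the correct translation is only slightly preferred, so the
# loss is positive and training would widen the gap.
print(preference_loss(logp_correct=-10.2, logp_incorrect=-10.8))  # ~0.4
```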
- On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss [120.19360680963152]
Unsupervised neural machine translation (UNMT) has achieved success in many language pairs.
The copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs.
We propose a simple but effective training schedule that incorporates a language discriminator loss.
arXiv Detail & Related papers (2023-05-26T18:14:23Z)
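One way to read the entry above: a classifier estimates whether the decoder output is in the intended target language, and its loss is added to the translation loss so that copied source-language outputs are penalized. The combination and weighting below are assumptions.

```python
# Sketch of adding a language discriminator loss to UNMT training. A
# classifier scores whether the decoder output is in the target language;
# a copied source-language output scores low and raises the total loss.
# The weighting and combination here are assumptions.

import math

def language_discriminator_prob(output_tokens: list[str],
                                target_lang: str) -> float:
    """Stand-in for a learned classifier: probability that the output
    is actually in the target language (a copied source scores low)."""
    return 0.1

def total_loss(translation_loss: float, output_tokens: list[str],
               target_lang: str, weight: float = 0.5) -> float:
    p_target = language_discriminator_prob(output_tokens, target_lang)
    discriminator_loss = -math.log(max(p_target, 1e-9))
    return translation_loss + weight * discriminator_loss

print(total_loss(2.0, ["copied", "source", "words"], "de"))
```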
- Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models [55.35106713257871]
We introduce DecoMT, a novel few-shot prompting approach that decomposes the translation process into a sequence of word chunk translations.
We show that DecoMT outperforms the strong few-shot prompting BLOOM model with an average improvement of 8 chrF++ scores across the examined languages.
arXiv Detail & Related papers (2023-05-22T14:52:47Z)
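A minimal sketch of decomposed prompting as the entry above describes it: split the source into word chunks, translate each chunk with a prompted LLM, then combine the pieces. Fixed-size chunking, the prompt wording, and the simple join are assumptions; DecoMT's contextual second pass is only hinted at in a comment.

```python
# Sketch of decomposed prompting for MT: the source sentence is split into
# word chunks, each chunk is translated with a prompted LLM, and the chunk
# translations are combined into a final translation. Fixed-size chunking
# and the prompt wording are assumptions.

def lm(prompt: str) -> str:
    return "<chunk translation>"  # stand-in for a few-shot prompted LLM

def chunk(words: list[str], size: int = 3) -> list[list[str]]:
    return [words[i:i + size] for i in range(0, len(words), size)]

def decomposed_translate(source: str, target_lang: str) -> str:
    pieces = []
    for ch in chunk(source.split()):
        prompt = f"Translate to {target_lang}: {' '.join(ch)}\nTranslation:"
        pieces.append(lm(prompt))
    # A second pass could fuse the chunk translations in context;
    # here we simply join them.
    return " ".join(pieces)
```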
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
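One plausible selection criterion for the entry above (an assumption, not necessarily the paper's): send the utterances the downstream model is least confident about to human translators, within a fixed budget, and machine-translate the rest.

```python
# Sketch of an uncertainty-based split between human and machine
# translation: utterances where the downstream semantic parser is least
# confident go to human translators; the rest are machine translated.
# The confidence criterion is an illustrative assumption.

def select_for_human_translation(utterances: list[str],
                                 confidences: list[float],
                                 budget: int) -> list[str]:
    ranked = sorted(zip(confidences, utterances))  # least confident first
    return [u for _, u in ranked[:budget]]

utts = ["set an alarm", "play jazz", "what's the weather"]
conf = [0.45, 0.92, 0.70]
print(select_for_human_translation(utts, conf, budget=1))  # ['set an alarm']
```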
- Translate your gibberish: black-box adversarial attack on machine translation systems [0.0]
We present a simple approach to fool state-of-the-art machine translation tools in the task of translation from Russian to English and vice versa.
We show that many online translation tools, such as Google, DeepL, and Yandex, may produce wrong or offensive translations for nonsensical adversarial input queries.
This vulnerability may interfere with understanding a new language and simply worsen the user's experience while using machine translation systems.
arXiv Detail & Related papers (2023-03-20T09:52:52Z)
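A minimal sketch of the black-box probing idea above: compose random, nonsensical strings, send them to a translation system, and flag outputs that contain unwanted content. The generator, the stub translator, and the word list are illustrative only.

```python
# Sketch of black-box probing with nonsensical inputs: generate random
# character strings, send them to a translation system, and flag cases
# where the output contains blocked words. The generator, stub translator,
# and word list are illustrative only.

import random
import string

def random_gibberish(length: int = 20) -> str:
    return "".join(random.choice(string.ascii_lowercase + " ")
                   for _ in range(length))

def mt_system(text: str) -> str:
    return "<translation>"  # stand-in for an online MT service

BLOCKLIST = {"offensive_word"}  # placeholder list

def probe(n: int = 5) -> list[tuple[str, str]]:
    flagged = []
    for _ in range(n):
        query = random_gibberish()
        output = mt_system(query)
        if any(w in output.lower() for w in BLOCKLIST):
            flagged.append((query, output))
    return flagged
```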
- BitextEdit: Automatic Bitext Editing for Improved Low-Resource Machine Translation [53.55009917938002]
We propose to refine the mined bitexts via automatic editing.
Experiments demonstrate that our approach successfully improves the quality of CCMatrix mined bitext for 5 low-resource language-pairs and 10 translation directions by up to 8 BLEU points.
arXiv Detail & Related papers (2021-11-12T16:00:39Z)
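A sketch of the refine-rather-than-filter idea above: instead of discarding low-scoring mined sentence pairs, pass them through an editing model that revises the target side. The scorer, editor, and threshold are stubs and assumptions.

```python
# Sketch of refining mined bitext instead of filtering it: low-scoring
# sentence pairs are passed to an editing model that revises the target
# side, and the revised pair is kept. Scorer and editor are stubs.

def alignment_score(src: str, tgt: str) -> float:
    return 0.4  # stand-in for a cross-lingual similarity score

def edit_target(src: str, tgt: str) -> str:
    return "<revised target sentence>"  # stand-in for the editing model

def refine_bitext(pairs: list[tuple[str, str]],
                  threshold: float = 0.6) -> list[tuple[str, str]]:
    refined = []
    for src, tgt in pairs:
        if alignment_score(src, tgt) < threshold:
            tgt = edit_target(src, tgt)  # refine instead of discarding
        refined.append((src, tgt))
    return refined
```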
- Improving Sentiment Analysis over non-English Tweets using Multilingual Transformers and Automatic Translation for Data-Augmentation [77.69102711230248]
We propose a multilingual transformer model that we pre-train on English tweets and adapt to non-English languages through data augmentation with automatic translation.
Our experiments in French, Spanish, German and Italian suggest that the proposed technique is an efficient way to improve the results of the transformers over small corpora of tweets in a non-English language.
arXiv Detail & Related papers (2020-10-07T15:44:55Z)
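A minimal sketch of the augmentation step above: machine-translate labeled English tweets into the target language and add them to the training set with their original sentiment labels. The MT call is a stub.

```python
# Sketch of translation-based data augmentation: labeled English tweets are
# machine translated into the target language and added to the training set
# with their original sentiment labels. The MT call is a stub.

def mt(text: str, target_lang: str) -> str:
    return f"<{target_lang}: {text}>"

def augment(english_examples: list[tuple[str, str]],
            target_lang: str) -> list[tuple[str, str]]:
    return [(mt(text, target_lang), label)
            for text, label in english_examples]

train_en = [("great service!", "positive"), ("awful day", "negative")]
train_fr = augment(train_en, "fr")  # adds French copies with same labels
```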
- Computer Assisted Translation with Neural Quality Estimation and Automatic Post-Editing [18.192546537421673]
We propose an end-to-end deep learning framework for quality estimation and automatic post-editing of machine translation output.
Our goal is to provide error correction suggestions and to further relieve the burden of human translators through an interpretable model.
arXiv Detail & Related papers (2020-09-19T00:29:00Z)
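A sketch of how the two components above can compose: a quality-estimation model scores the MT output, and low-scoring outputs are routed through an automatic post-editor before being shown to the translator. Both models and the threshold are illustrative stubs.

```python
# Sketch of a quality-estimation (QE) plus automatic post-editing (APE)
# pipeline: QE scores the MT output, and low-scoring outputs are corrected
# by the APE model before reaching a human translator. Both models and the
# threshold are illustrative stubs.

def qe_score(source: str, translation: str) -> float:
    return 0.55  # stand-in for a learned QE model (higher = better)

def ape(source: str, translation: str) -> str:
    return "<post-edited translation>"  # stand-in for the APE model

def assist(source: str, translation: str, threshold: float = 0.7) -> str:
    if qe_score(source, translation) < threshold:
        return ape(source, translation)
    return translation
```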
- Learning to Detect Unacceptable Machine Translations for Downstream Tasks [33.07594909221625]
We put machine translation in a cross-lingual pipeline and introduce downstream tasks to define task-specific acceptability of machine translations.
This allows us to leverage parallel data to automatically generate acceptability annotations on a large scale.
We conduct experiments to demonstrate the effectiveness of our framework for a range of downstream tasks and translation models.
arXiv Detail & Related papers (2020-05-08T09:37:19Z)
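A minimal sketch of generating acceptability labels from parallel data, under one reading of the entry above: run the downstream task on the machine translation and on the reference, and label the translation acceptable when the outputs agree. The task function is a stub.

```python
# Sketch of generating task-specific acceptability labels from parallel
# data: run the downstream task on both the machine translation and the
# reference, and label the translation acceptable when the task outputs
# agree. The downstream task function is an illustrative stub.

def downstream_task(text: str) -> str:
    return "<task output>"  # e.g., a classifier or parser

def acceptability_labels(pairs: list[tuple[str, str]]) -> list[int]:
    """pairs: (machine_translation, reference_translation)."""
    return [int(downstream_task(mt) == downstream_task(ref))
            for mt, ref in pairs]
```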
This list is automatically generated from the titles and abstracts of the papers on this site.