Onception: Active Learning with Expert Advice for Real World Machine
Translation
- URL: http://arxiv.org/abs/2203.04507v1
- Date: Wed, 9 Mar 2022 03:32:42 GMT
- Title: Onception: Active Learning with Expert Advice for Real World Machine
Translation
- Authors: Vânia Mendonça (1 and 2), Ricardo Rei (1 and 2 and 3), Luisa
Coheur (1 and 2), Alberto Sardinha (1 and 2) ((1) INESC-ID Lisboa, (2)
Instituto Superior Técnico, (3) Unbabel AI)
- Abstract summary: Most active learning approaches for Machine Translation assume the existence of a pool of sentences in a source language, and rely on human annotators to provide translations or post-edits.
In this paper, we assume a real world human-in-the-loop scenario in which: (i) the source sentences may not be readily available, but instead arrive in a stream; (ii) the automatic translations receive feedback in the form of a rating, instead of a correct/edited translation, since the human-in-the-loop might be a user looking for a translation but unable to provide one.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Active learning can play an important role in low-resource
settings (i.e., where annotated data is scarce) by selecting which instances
may be most worth annotating. Most active learning approaches for Machine
Translation assume the existence of a pool of sentences in a source language
and rely on human annotators to provide translations or post-edits, which can
still be costly. In this paper, we assume a real world human-in-the-loop
scenario in which: (i) the source sentences may not be readily available, but
instead arrive in a stream; (ii) the automatic translations receive feedback
in the form of a rating, instead of a correct/edited translation, since the
human-in-the-loop might be a user looking for a translation but unable to
provide one. To tackle the challenge of deciding whether each incoming
source-translation pair is worth querying for human feedback, we resort to a
number of stream-based active learning query strategies. Moreover, since we
do not know in advance which query strategy will be the most adequate for a
given language pair and set of Machine Translation models, we propose to
dynamically combine multiple strategies using prediction with expert advice.
Our experiments show that using active learning allows the system to converge
to the best Machine Translation systems with fewer human interactions.
Furthermore, combining multiple strategies using prediction with expert
advice often outperforms the individual active learning strategies with even
fewer interactions.
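A minimal sketch of the paper's central mechanism may help: each stream-based query strategy is treated as an expert, and an exponentially weighted forecaster reweights the experts from the observed human ratings. The two example strategies, the learning rate `eta`, and the loss definition below are illustrative assumptions, not details taken from the paper:

```python
import math
import random

# Hedged sketch (not the authors' code): combining stream-based active
# learning query strategies via prediction with expert advice, using an
# exponentially weighted forecaster. Strategies, eta, and the loss are
# illustrative assumptions.

def length_strategy(source, translations):
    """Query when the source is long (longer inputs tend to be harder)."""
    return len(source.split()) > 15

def disagreement_strategy(source, translations):
    """Query when candidate translations disagree (an uncertainty proxy)."""
    return len(set(translations)) > 1

strategies = [length_strategy, disagreement_strategy]
weights = [1.0] * len(strategies)
eta = 0.5  # learning rate of the multiplicative weight update

def decide_and_update(source, translations, get_rating):
    """Pick a strategy proportionally to its weight, follow its vote, and
    update all weights from the human rating when feedback is obtained."""
    votes = [s(source, translations) for s in strategies]
    probs = [w / sum(weights) for w in weights]
    chosen = random.choices(range(len(strategies)), weights=probs)[0]
    query = votes[chosen]
    if query:
        rating = get_rating(source, translations)  # human rating in [0, 1]
        # Assumed loss: penalize voting to query when the translation was
        # already good (high rating), and voting to skip when it was bad.
        for i, vote in enumerate(votes):
            loss = abs(float(vote) - (1.0 - rating))
            weights[i] *= math.exp(-eta * loss)
    return query
```

Sampling a strategy in proportion to its weight mirrors the randomized forecaster in prediction with expert advice; here weights are only updated on queried pairs, since no rating is available otherwise.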
Related papers
- Shortcomings of LLMs for Low-Resource Translation: Retrieval and Understanding are Both the Problem [4.830018386227]
This work investigates the in-context learning abilities of pretrained large language models (LLMs) when instructed to translate text from a low-resource language into a high-resource language as part of an automated machine translation pipeline.
We conduct a set of experiments translating Southern Quechua to Spanish and examine the informativity of various types of context retrieved from a constrained database of digitized pedagogical materials and parallel corpora.
arXiv Detail & Related papers (2024-06-21T20:02:22Z)
- Investigating Multi-Pivot Ensembling with Massively Multilingual Machine Translation Models [47.91306228406407]
We revisit ways of pivoting through multiple languages.
We propose MaxEns, a novel combination strategy that makes the output biased towards the most confident predictions.
On average, multi-pivot strategies still lag behind using English as a single pivot language.
arXiv Detail & Related papers (2023-11-13T16:15:20Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Beyond Contrastive Learning: A Variational Generative Model for Multilingual Retrieval [109.62363167257664]
We propose a generative model for learning multilingual text embeddings.
Our model operates on parallel data in $N$ languages.
We evaluate this method on a suite of tasks including semantic similarity, bitext mining, and cross-lingual question retrieval.
arXiv Detail & Related papers (2022-12-21T02:41:40Z)
- Towards Best Practices for Training Multilingual Dense Retrieval Models [54.91016739123398]
We focus on the task of monolingual retrieval in a variety of typologically diverse languages using one such design.
Our study is organized as a "best practices" guide for training multilingual dense retrieval models.
arXiv Detail & Related papers (2022-04-05T17:12:53Z)
- IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages [87.5457337866383]
We introduce the Image-Grounded Language Understanding Evaluation benchmark.
IGLUE brings together visual question answering, cross-modal retrieval, grounded reasoning, and grounded entailment tasks across 20 diverse languages.
We find that translate-test transfer is superior to zero-shot transfer and that few-shot learning is hard to harness for many tasks.
arXiv Detail & Related papers (2022-01-27T18:53:22Z)
- Self-Supervised Representations Improve End-to-End Speech Translation [57.641761472372814]
We show that self-supervised pre-trained features can consistently improve the translation performance.
Cross-lingual transfer allows extending to a variety of languages with little or no tuning.
arXiv Detail & Related papers (2020-06-22T10:28:38Z)
- Learning Coupled Policies for Simultaneous Machine Translation using Imitation Learning [85.70547744787]
We present an approach to efficiently learn a simultaneous translation model with coupled programmer-interpreter policies.
Experiments on six language pairs show our method outperforms strong baselines in terms of translation quality.
arXiv Detail & Related papers (2020-02-11T10:56:42Z)
- Exploring Benefits of Transfer Learning in Neural Machine Translation [3.7612918175471393]
We propose several transfer learning approaches to reuse a model pretrained on a high-resource language pair.
We show how our techniques address specific problems of low-resource languages and are suitable even in high-resource transfer learning.
arXiv Detail & Related papers (2020-01-06T15:11:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences of their use.