The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning
- URL: http://arxiv.org/abs/2305.12737v1
- Date: Mon, 22 May 2023 05:57:47 GMT
- Title: The Best of Both Worlds: Combining Human and Machine Translations for
Multilingual Semantic Parsing with Active Learning
- Authors: Zhuang Li, Lizhen Qu, Philip R. Cohen, Raj V. Tumuluri, Gholamreza
Haffari
- Abstract summary: We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
- Score: 50.320178219081484
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multilingual semantic parsing aims to leverage the knowledge from the
high-resource languages to improve low-resource semantic parsing, yet commonly
suffers from the data imbalance problem. Prior works propose to utilize the
translations by either humans or machines to alleviate such issues. However,
human translations are expensive, while machine translations are cheap but
prone to error and bias. In this work, we propose an active learning approach
that exploits the strengths of both human and machine translations by
iteratively adding small batches of human translations into the
machine-translated training set. Besides, we propose novel aggregated
acquisition criteria that help our active learning method select utterances to
be manually translated. Our experiments demonstrate that an ideal utterance
selection can significantly reduce the error and bias in the translated data,
resulting in more accurate parsers than those trained merely on the
machine-translated data.
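The iteration scheme itself is easy to sketch. In the outline below, train_parser, acquisition_score, and human_translate are hypothetical stubs, since this summary does not spell out the parser or the aggregated acquisition criteria; the sketch illustrates only the loop structure, not the authors' implementation.

```python
import random

def train_parser(train_set):
    return {"n_examples": len(train_set)}   # stub: would fine-tune a real parser

def acquisition_score(utterance, parser):
    return random.random()                  # stub: aggregated acquisition criteria

def human_translate(utterance):
    return f"human({utterance})"            # stub: expensive human oracle

def active_learning_loop(utterances, mt_translations, rounds=5, batch_size=10):
    # Start from the cheap but noisy machine-translated training set.
    train_set = dict(zip(utterances, mt_translations))
    pool = set(utterances)                  # utterances still machine-translated
    parser = train_parser(train_set)
    for _ in range(rounds):
        # Rank the remaining machine-translated utterances; the
        # highest-scoring small batch is sent for human translation.
        batch = sorted(pool, key=lambda u: acquisition_score(u, parser),
                       reverse=True)[:batch_size]
        for u in batch:
            train_set[u] = human_translate(u)   # replace the MT version
            pool.discard(u)
        parser = train_parser(train_set)        # retrain on the mixed data
    return parser
```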
Related papers
- A Data Selection Approach for Enhancing Low Resource Machine Translation Using Cross-Lingual Sentence Representations [0.4499833362998489]
This study focuses on the English-Marathi language pair, where existing datasets are notably noisy.
To mitigate the impact of data quality issues, we propose a data filtering approach based on cross-lingual sentence representations.
Results demonstrate a significant improvement in translation quality over the baseline after filtering with IndicSBERT.
arXiv Detail & Related papers (2024-09-04T13:49:45Z)
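The filtering step above can be illustrated with cross-lingual sentence embeddings. The sketch below uses the sentence-transformers library with a generic multilingual encoder rather than IndicSBERT specifically, and the 0.5 similarity cutoff is an arbitrary assumption.

```python
from sentence_transformers import SentenceTransformer, util

def filter_parallel_corpus(src_sentences, tgt_sentences, threshold=0.5):
    model = SentenceTransformer("sentence-transformers/LaBSE")  # multilingual encoder
    src_emb = model.encode(src_sentences, convert_to_tensor=True)
    tgt_emb = model.encode(tgt_sentences, convert_to_tensor=True)
    # Keep a pair only if the two embeddings are close, i.e. the
    # sentences are plausibly translations of each other.
    return [(s, t)
            for i, (s, t) in enumerate(zip(src_sentences, tgt_sentences))
            if util.cos_sim(src_emb[i], tgt_emb[i]).item() >= threshold]
```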
- Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution [57.42593422091653]
We explore leveraging reinforcement learning with human feedback to improve translation quality.
A reward model with strong language capabilities can more sensitively learn the subtle differences in translation quality.
arXiv Detail & Related papers (2024-02-18T09:51:49Z)
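Reward models of this kind are commonly trained with a pairwise (Bradley-Terry) preference loss over a preferred and a dispreferred translation of the same source. The sketch below shows that generic recipe, not necessarily the paper's exact formulation; reward_model is a placeholder for any scorer of (source, translation) pairs.

```python
import torch.nn.functional as F

def preference_loss(reward_model, source, chosen, rejected):
    r_chosen = reward_model(source, chosen)      # scalar quality score
    r_rejected = reward_model(source, rejected)
    # Push the preferred translation's score above the dispreferred one's.
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```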
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
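One plausible reading of the loss-weighting technique is to upweight the token-level cross-entropy inside annotated idiomatic spans; the idiom mask and the weight of 2.0 below are illustrative assumptions, not the paper's reported configuration.

```python
import torch
import torch.nn.functional as F

def idiom_weighted_loss(logits, targets, idiom_mask, idiom_weight=2.0):
    """logits: (seq_len, vocab); targets: (seq_len,); idiom_mask: (seq_len,) bool."""
    token_loss = F.cross_entropy(logits, targets, reduction="none")
    weights = torch.ones_like(token_loss)
    weights[idiom_mask] = idiom_weight   # pay extra for idiom tokens
    return (weights * token_loss).mean()
```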
- Easy Guided Decoding in Providing Suggestions for Interactive Machine Translation [14.615314828955288]
We propose a novel constrained decoding algorithm, namely Prefix Suffix Guided Decoding (PSGD).
PSGD improves translation quality by an average of 10.87 BLEU and 8.62 BLEU on the WeTS and the WMT 2022 Translation Suggestion datasets, respectively.
arXiv Detail & Related papers (2022-11-14T03:40:02Z)
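As a rough intuition for decoding under fixed prefix and suffix constraints: copy the accepted prefix verbatim, generate freely, then attach the accepted suffix. The toy sketch below, with a hypothetical next_token language-model step, is far simpler than PSGD itself, which also optimizes where to join the generated span and the suffix.

```python
def prefix_suffix_decode(next_token, prefix, suffix, max_new_tokens=20):
    out = list(prefix)                   # accepted prefix, kept verbatim
    for _ in range(max_new_tokens):
        tok = next_token(out)            # hypothetical LM step: history -> token
        if suffix and tok == suffix[0]:  # model is ready to start the suffix
            break
        out.append(tok)
    return out + list(suffix)            # accepted suffix, kept verbatim
```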
- Towards Debiasing Translation Artifacts [15.991970288297443]
We propose a novel approach to reducing translationese by extending an established bias-removal technique.
We use the Iterative Null-space Projection (INLP) algorithm and show, by measuring classification accuracy before and after debiasing, that translationese is reduced at both the sentence and word level.
To the best of our knowledge, this is the first study to debias translationese as represented in latent embedding space.
arXiv Detail & Related papers (2022-05-16T21:46:51Z)
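INLP itself is compact enough to sketch: repeatedly fit a linear probe for the translationese label, then project the embeddings onto the probe's null space so the label can no longer be recovered linearly. The scikit-learn probe below is an illustrative choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def inlp(X, y, n_iterations=10):
    """X: (n_samples, dim) embeddings; y: binary translationese labels."""
    X = X.copy()
    projection = np.eye(X.shape[1])      # composed null-space projection
    for _ in range(n_iterations):
        probe = LogisticRegression(max_iter=1000).fit(X, y)
        w = probe.coef_ / np.linalg.norm(probe.coef_)  # (1, dim) unit direction
        null_proj = np.eye(X.shape[1]) - w.T @ w       # remove that direction
        X = X @ null_proj
        projection = null_proj @ projection
    return X, projection                 # debiased embeddings + projection
```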
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
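One way to picture the denoising objective: corrupt the entities in a monolingual sentence, here by swapping in their source-language forms from a toy knowledge base, and train a sequence-to-sequence model to reconstruct the original. The helper and the knowledge-base mapping below are assumptions for illustration, not the paper's exact procedure.

```python
def entity_noise(sentence, entity_mentions, kb_links):
    """entity_mentions: entity strings found in `sentence` (e.g. by NER);
    kb_links: toy map from a target-language mention to its source-language form."""
    noisy = sentence
    for mention in entity_mentions:
        noisy = noisy.replace(mention, kb_links.get(mention, mention))
    return noisy, sentence   # (encoder input, reconstruction target)

pair = entity_noise("Londres est la capitale du Royaume-Uni.",
                    ["Londres", "Royaume-Uni"],
                    {"Londres": "London", "Royaume-Uni": "United Kingdom"})
# pair[0] == "London est la capitale du United Kingdom."
```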
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact on existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
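The lexical-overlap effect mentioned above can be quantified directly; the crude whitespace-token statistic below is only an illustration of the measurement, not the paper's methodology.

```python
def lexical_overlap(premise, hypothesis):
    # Fraction of premise tokens that also occur in the hypothesis;
    # translating the two sentences independently tends to lower it.
    p = set(premise.lower().split())
    h = set(hypothesis.lower().split())
    return len(p & h) / max(len(p), 1)
```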
- Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We ask whether machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.