Towards Debiasing Translation Artifacts
- URL: http://arxiv.org/abs/2205.08001v1
- Date: Mon, 16 May 2022 21:46:51 GMT
- Title: Towards Debiasing Translation Artifacts
- Authors: Koel Dutta Chowdhury, Rricha Jalota, Cristina España-Bonet, and
Josef van Genabith
- Abstract summary: We propose a novel approach to reducing translationese by extending an established bias-removal technique.
We use the Iterative Null-space Projection (INLP) algorithm, and show by measuring classification accuracy before and after debiasing, that translationese is reduced at both sentence and word level.
To the best of our knowledge, this is the first study to debias translationese as represented in latent embedding space.
- Score: 15.991970288297443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-lingual natural language processing relies on translation, either by
humans or machines, at different levels, from translating training data to
translating test sets. However, compared to original texts in the same
language, translations possess distinct qualities referred to as
translationese. Previous research has shown that these translation artifacts
influence the performance of a variety of cross-lingual tasks. In this work, we
propose a novel approach to reducing translationese by extending an established
bias-removal technique. We use the Iterative Null-space Projection (INLP)
algorithm, and show by measuring classification accuracy before and after
debiasing, that translationese is reduced at both sentence and word level. We
evaluate the utility of debiasing translationese on a natural language
inference (NLI) task, and show that by reducing this bias, NLI accuracy
improves. To the best of our knowledge, this is the first study to debias
translationese as represented in latent embedding space.
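The core mechanism of INLP is to repeatedly fit a linear classifier that predicts the protected attribute (here, translationese vs. original text) from the embeddings, then project the embeddings onto the null space of the learned direction, removing the classifier's signal. The following is a minimal pure-NumPy sketch of that idea on synthetic data; it substitutes a simple difference-of-class-means direction for the trained classifier used in the paper, and all data and function names are illustrative, not the authors' implementation.

```python
import numpy as np

def fit_linear_direction(X, y):
    # Difference-of-class-means direction: a simple stand-in for the
    # linear classifier (e.g. logistic regression) trained in INLP.
    w = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    return w / (np.linalg.norm(w) + 1e-12)

def accuracy(X, y, w):
    # Classify by projecting onto w and thresholding at the midpoint
    # between the two class means of the projected scores.
    scores = X @ w
    thresh = (scores[y == 1].mean() + scores[y == 0].mean()) / 2
    return float(((scores > thresh).astype(int) == y).mean())

def inlp_debias(X, y, n_iterations=8):
    # Iterative Null-space Projection: each round removes the direction
    # along which the attribute is currently most linearly predictable.
    d = X.shape[1]
    P = np.eye(d)  # accumulated projection matrix
    for _ in range(n_iterations):
        w = fit_linear_direction(X, y)
        P_w = np.eye(d) - np.outer(w, w)  # projection onto null space of w
        X = X @ P_w                        # P_w is symmetric
        P = P_w @ P
    return X, P

# Toy example: 50-d "embeddings" with a planted linear attribute signal.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
v = rng.normal(size=50)
X = rng.normal(size=(200, 50)) + np.outer(y - 0.5, v) * 4

before = accuracy(X, y, fit_linear_direction(X, y))
X_debiased, P = inlp_debias(X, y)
after = accuracy(X_debiased, y, fit_linear_direction(X_debiased, y))
print(f"attribute classification accuracy before: {before:.2f}, after: {after:.2f}")
```

Measuring classifier accuracy before and after projection, as in the last two lines, mirrors how the paper quantifies how much translationese signal the debiasing removed.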
Related papers
- Mitigating Translationese in Low-resource Languages: The Storyboard Approach [9.676710061071809]
We propose a novel approach for data collection by leveraging storyboards to elicit more fluent and natural sentences.
Our method involves presenting native speakers with visual stimuli in the form of storyboards and collecting their descriptions without direct exposure to the source text.
We conducted a comprehensive evaluation comparing our storyboard-based approach with traditional text translation-based methods in terms of accuracy and fluency.
arXiv Detail & Related papers (2024-07-14T10:47:03Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Extract and Attend: Improving Entity Translation in Neural Machine Translation [141.7840980565706]
We propose an Extract-and-Attend approach to enhance entity translation in NMT.
The proposed method is effective on improving both the translation accuracy of entities and the overall translation quality.
arXiv Detail & Related papers (2023-06-04T03:05:25Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
An ideal utterance selection can significantly reduce the error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z)
- Time-Aware Ancient Chinese Text Translation and Inference [6.787414471399024]
We aim to address the challenges surrounding the translation of ancient Chinese text.
The linguistic gap arising from the difference in eras results in poor-quality translations.
Most translations lack contextual information that is often crucial to understanding the text.
arXiv Detail & Related papers (2021-07-07T12:23:52Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- Translation Artifacts in Cross-lingual Transfer Learning [51.66536640084888]
We show that machine translation can introduce subtle artifacts that have a notable impact in existing cross-lingual models.
In natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them.
We also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
arXiv Detail & Related papers (2020-04-09T17:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.