Is it Great or Terrible? Preserving Sentiment in Neural Machine Translation of Arabic Reviews
- URL: http://arxiv.org/abs/2010.13814v1
- Date: Mon, 26 Oct 2020 18:01:52 GMT
- Title: Is it Great or Terrible? Preserving Sentiment in Neural Machine Translation of Arabic Reviews
- Authors: Hadeel Saadany, Constantin Orasan
- Abstract summary: This paper investigates the challenges involved in translating book reviews from Arabic into English.
We focus on the errors that lead to incorrect translation of sentiment polarity.
Our analysis shows that the output of online translation tools for Arabic UGC can either fail to transfer the sentiment at all by producing a neutral target text, or completely flip the sentiment polarity of the target word or phrase.
- Score: 3.553493344868413
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Since the advent of Neural Machine Translation (NMT) approaches there has
been a tremendous improvement in the quality of automatic translation. However,
NMT output still lacks accuracy in some low-resource languages and sometimes
makes major errors that need extensive post-editing. This is particularly
noticeable with texts that do not follow common lexico-grammatical standards,
such as user generated content (UGC). In this paper we investigate the
challenges involved in translating book reviews from Arabic into English, with
particular focus on the errors that lead to incorrect translation of sentiment
polarity. Our study points to the special characteristics of Arabic UGC,
examines the sentiment transfer errors made by Google Translate of Arabic UGC
to English, analyzes why the problem occurs, and proposes an error typology
specific to the translation of Arabic UGC. Our analysis shows that the output
of online translation tools for Arabic UGC can either fail to transfer the
sentiment at all by producing a neutral target text, or completely flip the
sentiment polarity of the target word or phrase and hence deliver a wrong
affect message. We address this problem by fine-tuning an NMT model with
respect to sentiment polarity, showing that this approach can significantly help
with correcting sentiment errors detected in the online translation of Arabic
UGC.
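The abstract separates two ways sentiment goes wrong in translation: the polarity is lost (a neutral target text) or it is flipped. A minimal sketch of that distinction follows, assuming each source review and its translation already carry a polarity label (from annotators or some sentiment classifier; the abstract does not say which). The `Polarity` enum and `sentiment_transfer_error` function are hypothetical names, and the "introduced" case for a neutral source is an extra category added here for completeness, not part of the paper's typology.

```python
from collections import Counter
from enum import Enum


class Polarity(Enum):
    """Hypothetical three-way polarity label for a review or its translation."""
    NEGATIVE = -1
    NEUTRAL = 0
    POSITIVE = 1


def sentiment_transfer_error(source: Polarity, target: Polarity) -> str:
    """Classify how the source sentiment survived translation (illustrative only)."""
    if source is target:
        return "preserved"
    if target is Polarity.NEUTRAL:
        # A polar source review rendered as a neutral target text.
        return "neutralized"
    if source is Polarity.NEUTRAL:
        # Extra case added in this sketch: sentiment appears where the source had none.
        return "introduced"
    # Remaining case: positive became negative or vice versa.
    return "flipped"


if __name__ == "__main__":
    # Made-up (source, target) label pairs, purely to show usage.
    pairs = [
        (Polarity.POSITIVE, Polarity.POSITIVE),
        (Polarity.NEGATIVE, Polarity.NEUTRAL),
        (Polarity.POSITIVE, Polarity.NEGATIVE),
    ]
    print(Counter(sentiment_transfer_error(s, t) for s, t in pairs))
```

Counting these labels over a test set gives a rough per-category error rate. The paper's own error typology, which concerns why these errors arise in Arabic UGC, is not reproduced by this label-level sketch.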
Related papers
- xTower: A Multilingual LLM for Explaining and Correcting Translation Errors [22.376508000237042]
xTower is an open large language model (LLM) built on top of TowerBase to provide free-text explanations for translation errors.
We test xTower across various experimental setups in generating translation corrections, demonstrating significant improvements in translation quality.
(2024-06-27T18:51:46Z)
- Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis on the underlying cause of under-translation in NMT, providing an explanation from the perspective of decoding objective.
We propose employing the confidence of predicting End Of Sentence (EOS) as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation.
(2024-05-29T09:25:49Z)
- Mitigating Hallucinations and Off-target Machine Translation with Source-Contrastive and Language-Contrastive Decoding [53.84948040596055]
We introduce two related methods to mitigate failure cases with a modified decoding objective.
Experiments on the massively multilingual models M2M-100 (418M) and SMaLL-100 show that these methods suppress hallucinations and off-target translations.
(2023-09-13T17:15:27Z)
- Evaluation of Chinese-English Machine Translation of Emotion-Loaded Microblog Texts: A Human Annotated Dataset for the Quality Assessment of Emotion Translation [7.858458986992082]
In this paper, we focus on how current Machine Translation (MT) tools perform on the translation of emotion-loaded texts.
We propose an evaluation framework based on the Multidimensional Quality Metrics (MQM) and perform a detailed error analysis of the MT outputs.
(2023-06-20T21:22:45Z)
- On the Copying Problem of Unsupervised NMT: A Training Schedule with a Language Discriminator Loss [120.19360680963152]
Unsupervised neural machine translation (UNMT) has achieved success in many language pairs.
The copying problem, i.e., directly copying some parts of the input sentence as the translation, is common among distant language pairs.
We propose a simple but effective training schedule that incorporates a language discriminator loss.
(2023-05-26T18:14:23Z)
- A Semi-supervised Approach for a Better Translation of Sentiment in Dialectical Arabic UGT [2.6763498831034034]
We introduce a semi-supervised approach that exploits both monolingual and parallel data for training an NMT system.
We will show that our proposed system can significantly help with correcting sentiment errors detected in the online translation of dialectical Arabic UGT.
(2022-10-21T11:55:55Z)
- Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement [57.72846454929923]
We create a benchmark dataset, HJQE, in which expert translators directly annotate poorly translated words.
We propose two tag correcting strategies, namely a tag refinement strategy and a tree-based annotation strategy, to make the TER-based artificial QE corpus closer to HJQE.
The results show our proposed dataset is more consistent with human judgement and also confirm the effectiveness of the proposed tag correcting strategies.
(2022-09-13T02:37:12Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and the endangered language Cherokee.
It supports both statistical and neural translation models and provides quality estimation to inform users of reliability.
(2021-07-30T17:58:54Z)
- It's not a Non-Issue: Negation as a Source of Error in Machine Translation [33.991817055535854]
We investigate whether translating negation is an issue for modern machine translation systems, using 17 translation directions as a test bed.
We find that indeed the presence of negation can significantly impact downstream quality, in some cases resulting in quality reductions of more than 60%.
(2020-10-12T03:34:44Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
(2020-05-05T17:38:48Z)
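The XMI metric in the last entry above is usually defined as the cross-entropy of a reference target under a target-side language model minus its cross-entropy under the translation model that also sees the source: XMI(X → Y) = H_LM(Y) − H_MT(Y | X). A minimal sketch, assuming you already have natural-log probabilities for the same target tokens from both models; the function names are hypothetical, and averaging per token is a simplifying choice made here, not necessarily how the paper aggregates over a corpus.

```python
import math
from typing import Sequence


def cross_entropy(token_logprobs: Sequence[float]) -> float:
    """Mean negative log-probability per token (nats) of a reference target."""
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    return -sum(token_logprobs) / len(token_logprobs)


def xmi(lm_logprobs: Sequence[float], mt_logprobs: Sequence[float]) -> float:
    """XMI(X -> Y) = H_LM(Y) - H_MT(Y | X), both measured on the same target tokens."""
    if len(lm_logprobs) != len(mt_logprobs):
        raise ValueError("both models must score the same target tokens")
    return cross_entropy(lm_logprobs) - cross_entropy(mt_logprobs)


if __name__ == "__main__":
    # Toy numbers: the LM gives each token p=0.1, the MT model (seeing the source) p=0.5.
    lm = [math.log(0.1)] * 4
    mt = [math.log(0.5)] * 4
    print(round(xmi(lm, mt), 4))  # 1.6094, i.e. log(5)
```

Because the two terms come from different models, XMI(X → Y) and XMI(Y → X) generally differ, which is the asymmetry the summary mentions.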