A Semi-supervised Approach for a Better Translation of Sentiment in
Dialectical Arabic UGT
- URL: http://arxiv.org/abs/2210.11899v2
- Date: Thu, 8 Jun 2023 12:06:36 GMT
- Authors: Hadeel Saadany, Constantin Orasan, Emad Mohamed, Ashraf Tantawy
- Abstract summary: We introduce a semi-supervised approach that exploits both monolingual and parallel data for training an NMT system.
We will show that our proposed system can significantly help with correcting sentiment errors detected in the online translation of dialectical Arabic UGT.
- Score: 2.6763498831034034
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In the online world, Machine Translation (MT) systems are extensively used to
translate User-Generated Text (UGT) such as reviews, tweets, and social media
posts, where the main message is often the author's positive or negative
attitude towards the topic of the text. However, MT systems still lack accuracy
in some low-resource languages and sometimes make critical translation errors
that completely flip the sentiment polarity of the target word or phrase and
hence deliver a wrong affect message. This is particularly noticeable in texts
that do not follow common lexico-grammatical standards such as the dialectical
Arabic (DA) used on online platforms. In this research, we aim to improve the
translation of sentiment in UGT written in the dialectical versions of the
Arabic language to English. Given the scarcity of gold-standard parallel data
for DA-EN in the UGT domain, we introduce a semi-supervised approach that
exploits both monolingual and parallel data for training an NMT system
initialised by a cross-lingual language model trained with supervised and
unsupervised modeling objectives. We assess the accuracy of sentiment
translation by our proposed system through a numerical 'sentiment-closeness'
measure as well as human evaluation. We will show that our semi-supervised MT
system can significantly help with correcting sentiment errors detected in the
online translation of dialectical Arabic UGT.
Related papers
- Understanding and Addressing the Under-Translation Problem from the Perspective of Decoding Objective [72.83966378613238]
Under-translation and over-translation remain two challenging problems in state-of-the-art Neural Machine Translation (NMT) systems.
We conduct an in-depth analysis on the underlying cause of under-translation in NMT, providing an explanation from the perspective of decoding objective.
We propose employing the confidence of predicting End Of Sentence (EOS) as a detector for under-translation, and strengthening the confidence-based penalty to penalize candidates with a high risk of under-translation.
arXiv Detail & Related papers (2024-05-29T09:25:49Z)
- It is Not as Good as You Think! Evaluating Simultaneous Machine Translation on Interpretation Data [58.105938143865906]
We argue that SiMT systems should be trained and tested on real interpretation data.
Our results highlight a difference of up to 13.83 BLEU points when SiMT models are evaluated on translation vs interpretation data.
arXiv Detail & Related papers (2021-10-11T12:27:07Z)
- Sentiment-Aware Measure (SAM) for Evaluating Sentiment Transfer by Machine Translation Systems [0.0]
In translating text where sentiment is the main message, human translators give particular attention to sentiment-carrying words.
We propose a numerical 'sentiment-closeness' measure appropriate for assessing the accuracy of a translated affect message in text by an MT system.
arXiv Detail & Related papers (2021-09-30T07:35:56Z)
- BLEU, METEOR, BERTScore: Evaluation of Metrics Performance in Assessing Critical Translation Errors in Sentiment-oriented Text [1.4213973379473654]
Machine Translation (MT) of online content is commonly used to process posts written in several languages.
In this paper, we assess the ability of automatic quality metrics to detect critical machine translation errors.
We conclude that there is a need for fine-tuning of automatic metrics to make them more robust in detecting sentiment critical errors.
arXiv Detail & Related papers (2021-09-29T07:51:17Z)
- When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
- Improving Multilingual Translation by Representation and Gradient Regularization [82.42760103045083]
We propose a joint approach to regularize NMT models at both representation-level and gradient-level.
Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance.
arXiv Detail & Related papers (2021-09-10T10:52:21Z)
- ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models as well as provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z)
- Sentiment-based Candidate Selection for NMT [2.580271290008534]
We propose a decoder-side approach that incorporates automatic sentiment scoring into the machine translation (MT) candidate selection process.
We train separate English and Spanish sentiment classifiers, then, using n-best candidates generated by a baseline MT model with beam search, select the candidate that minimizes the absolute difference between the sentiment score of the source sentence and that of the translation.
The results of human evaluations show that, in comparison to the open-source MT model on top of which our pipeline is built, our pipeline produces more accurate translations of colloquial, sentiment-heavy source texts.
arXiv Detail & Related papers (2021-04-10T19:01:52Z)
- Is it Great or Terrible? Preserving Sentiment in Neural Machine Translation of Arabic Reviews [3.553493344868413]
This paper investigates the challenges involved in translating book reviews from Arabic into English.
We focus on the errors that lead to incorrect translation of sentiment polarity.
Our analysis shows that the output of online translation tools for Arabic can fail to transfer the sentiment at all by producing a neutral target text.
arXiv Detail & Related papers (2020-10-26T18:01:52Z)
- It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
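The selection rule described in the "Sentiment-based Candidate Selection for NMT" entry above (pick, from the n-best list, the candidate whose sentiment score is closest to that of the source sentence) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the function names, the toy word lists, and the lexicon-based scorer stand in for the trained sentiment classifiers and beam-search output that the paper uses, and are not taken from it.

```python
from typing import Callable, Sequence

# Hypothetical toy lexicon; a real system would use a trained classifier.
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"terrible", "bad", "hate"}


def toy_sentiment(text: str) -> float:
    """Return a score in [-1, 1]: +1 all positive cues, -1 all negative, 0 none."""
    tokens = text.lower().split()
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def select_candidate(
    source_score: float,
    candidates: Sequence[str],
    score_target: Callable[[str], float],
) -> str:
    """Pick the candidate minimizing |score(candidate) - source_score|.

    Ties are resolved in favor of the earliest candidate, i.e. the one
    ranked highest by the (assumed) baseline MT model.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    return min(candidates, key=lambda c: abs(score_target(c) - source_score))


# Assumed inputs: a source-side sentiment score and an n-best list.
n_best = ["the book is terrible", "the book is fine", "the book is great"]
best = select_candidate(0.9, n_best, toy_sentiment)
print(best)
```

With a source score of 0.9, the toy scorer assigns the three candidates -1, 0, and 1, so the third candidate is selected. Passing the scorer in as a callable keeps the selection step independent of whichever sentiment model is used on the target side.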
This list is automatically generated from the titles and abstracts of the papers on this site.