Audience-specific Explanations for Machine Translation
- URL: http://arxiv.org/abs/2309.12998v1
- Date: Fri, 22 Sep 2023 17:00:45 GMT
- Title: Audience-specific Explanations for Machine Translation
- Authors: Renhan Lou, Jan Niehues
- Abstract summary: In machine translation, a common problem is that certain words, even when translated correctly, can be incomprehensible to the target-language audience because of differing cultural backgrounds.
In this work we explore techniques to extract example explanations from a parallel corpus.
We propose a semi-automatic technique to extract these explanations from a large parallel corpus.
- Score: 17.166908218991225
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In machine translation, a common problem is that certain words, even when
translated correctly, can leave the target-language audience confused because of
differing cultural backgrounds. One solution is to add explanations for these words.
As a first step, we therefore need to identify such words or phrases. In this work we
explore techniques to extract example explanations from a parallel corpus. However,
the sparsity of sentences containing words that need to be explained makes building a
training dataset extremely difficult. We propose a semi-automatic technique to extract
these explanations from a large parallel corpus. Experiments on the English->German
language pair show that our method extracts sentences such that more than 10% of them
contain an explanation, compared to only 1.9% of sentences in the original corpus. In
addition, experiments on the English->French and English->Chinese language pairs lead
to similar conclusions. This is therefore an essential first automatic step towards
creating an explanation dataset. Furthermore, we show that the technique is robust
across all three language pairs.
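The abstract does not spell out the extraction heuristics, but a minimal sketch of the general idea, filtering a parallel corpus for source sentences that show surface cues of explanations (parenthetical glosses, appositive phrases) before a manual verification pass, might look like the following. The regex patterns, function names, and example sentences are illustrative assumptions, not the authors' actual method.

```python
import re

# Hypothetical surface cues for explanations: parenthetical glosses like
# "haggis (a savoury pudding)" and appositives like "Hanukkah, a Jewish festival,".
PAREN_GLOSS = re.compile(r"\([^)]{3,80}\)")
APPOSITIVE = re.compile(r"\b[A-Z][\w-]+, (?:a|an|the) [a-z][^,.;]{3,60}[,.]")

def looks_like_explanation(sentence: str) -> bool:
    """Return True if the sentence contains a candidate explanation pattern."""
    return bool(PAREN_GLOSS.search(sentence) or APPOSITIVE.search(sentence))

def filter_parallel_corpus(pairs):
    """Keep only (source, target) pairs whose source side contains a candidate
    explanation; a human annotator would then verify the retained pairs."""
    return [(src, tgt) for src, tgt in pairs if looks_like_explanation(src)]

if __name__ == "__main__":
    corpus = [
        ("Haggis (a savoury Scottish pudding) is served on Burns Night.",
         "Haggis (ein herzhafter schottischer Pudding) wird zur Burns Night serviert."),
        ("The train leaves at noon.", "Der Zug fährt mittags ab."),
    ]
    for src, tgt in filter_parallel_corpus(corpus):
        print(src, "=>", tgt)
```

The point of such a pre-filter is only to raise the density of explanation-bearing sentences handed to annotators (the paper reports going from 1.9% to over 10%), not to replace manual verification.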
Related papers
- A Multi-Task Text Classification Pipeline with Natural Language Explanations: A User-Centric Evaluation in Sentiment Analysis and Offensive Language Identification in Greek Tweets [8.846643533783205]
This work introduces an early concept for a novel pipeline that can be used in text classification tasks.
It comprises two models: a classifier for labelling the text and a generator that provides the explanation.
Experiments are centred around the tasks of sentiment analysis and offensive language identification in Greek tweets.
arXiv Detail & Related papers (2024-10-14T08:41:31Z) - When Abel Kills Cain: What Machine Translation Cannot Capture [0.0]
The article aims to identify what, from a structural point of view, AI-based automatic translators cannot fully capture.
It focuses on the machine's mistakes in order to explain their causes.
The biblical story of Cain and Abel has been chosen because of its rich and critical interpretive tradition.
arXiv Detail & Related papers (2024-04-02T12:46:00Z) - Question Translation Training for Better Multilingual Reasoning [108.10066378240879]
Large language models show compelling performance on reasoning tasks, but they tend to perform much worse in languages other than English.
A typical solution is to translate instruction data into all languages of interest, and then train on the resulting multilingual data, which is called translate-training.
In this paper we explore the benefits of question alignment, where we train the model to translate reasoning questions into English by finetuning on X-English parallel question data.
arXiv Detail & Related papers (2024-01-15T16:39:10Z) - Bridging Background Knowledge Gaps in Translation with Automatic
Explicitation [13.862753200823242]
Professional translators incorporate explicitations to explain the missing context.
This work introduces techniques for automatically generating explicitations, motivated by WikiExpl.
The resulting explicitations are useful as they help answer questions more accurately in a multilingual question answering framework.
arXiv Detail & Related papers (2023-12-03T07:24:12Z) - Crossing the Threshold: Idiomatic Machine Translation through Retrieval
Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z) - Evaluation of Automatically Constructed Word Meaning Explanations [0.0]
We present a new tool that derives explanations automatically based on collective information from very large corpora.
We show that the presented approach can create explanations containing information useful for understanding the word's meaning in approximately 90% of cases.
arXiv Detail & Related papers (2023-02-27T09:47:55Z) - Explanation Selection Using Unlabeled Data for Chain-of-Thought
Prompting [80.9896041501715]
Explanations that have not been "tuned" for a task, such as off-the-shelf explanations written by nonexperts, may lead to mediocre performance.
This paper tackles the problem of how to optimize explanation-infused prompts in a blackbox fashion.
arXiv Detail & Related papers (2023-02-09T18:02:34Z) - Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often mis-interpret the explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z) - DEEP: DEnoising Entity Pre-training for Neural Machine Translation [123.6686940355937]
It has been shown that machine translation models usually generate poor translations for named entities that are infrequent in the training corpus.
We propose DEEP, a DEnoising Entity Pre-training method that leverages large amounts of monolingual data and a knowledge base to improve named entity translation accuracy within sentences.
arXiv Detail & Related papers (2021-11-14T17:28:09Z) - "Listen, Understand and Translate": Triple Supervision Decouples
End-to-end Speech-to-text Translation [49.610188741500274]
An end-to-end speech-to-text translation (ST) system takes audio in a source language and outputs text in a target language.
Existing methods are limited by the amount of parallel corpus.
We build a system to fully utilize signals in a parallel ST corpus.
arXiv Detail & Related papers (2020-09-21T09:19:07Z)