Bridging Background Knowledge Gaps in Translation with Automatic Explicitation
- URL: http://arxiv.org/abs/2312.01308v1
- Date: Sun, 3 Dec 2023 07:24:12 GMT
- Title: Bridging Background Knowledge Gaps in Translation with Automatic Explicitation
- Authors: HyoJung Han, Jordan Lee Boyd-Graber, Marine Carpuat
- Abstract summary: Professional translators incorporate explicitations to explain the missing context.
This work introduces techniques for automatically generating explicitations, motivated by WikiExpl.
The resulting explicitations are useful as they help answer questions more accurately in a multilingual question answering framework.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Translations help people understand content written in another language.
However, even correct literal translations do not fulfill that goal when people
lack the necessary background to understand them. Professional translators
incorporate explicitations to explain the missing context by considering
cultural differences between source and target audiences. Despite its potential
to help users, NLP research on explicitation is limited because of the dearth
of adequate evaluation methods. This work introduces techniques for
automatically generating explicitations, motivated by WikiExpl: a dataset that
we collect from Wikipedia and annotate with human translators. The resulting
explicitations are useful as they help answer questions more accurately in a
multilingual question answering framework.
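The abstract's evaluation idea can be illustrated with a toy sketch: an explicitation is judged useful if appending it to a translation lets a QA system answer a background question it otherwise could not. The function names, the keyword-matching stand-in for a real QA model, and the example data below are all illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of QA-based evaluation of explicitations.
# A real setup would use a multilingual QA model; here a toy
# substring check stands in for "the question is answerable".

def add_explicitation(translation: str, entity: str, gloss: str) -> str:
    """Insert a parenthetical gloss after the first mention of `entity`."""
    return translation.replace(entity, f"{entity} ({gloss})", 1)

def toy_qa_correct(context: str, answer: str) -> bool:
    """Stand-in for a QA model: answerable iff the answer appears in context."""
    return answer.lower() in context.lower()

def qa_accuracy(examples, explicitate: bool) -> float:
    """Fraction of questions answerable with vs. without explicitations."""
    correct = 0
    for ex in examples:
        context = ex["translation"]
        if explicitate:
            context = add_explicitation(context, ex["entity"], ex["gloss"])
        correct += toy_qa_correct(context, ex["answer"])
    return correct / len(examples)

# Illustrative example: a target-audience reader may not know what
# the "Sorbonne" is, so a question about it is only answerable once
# the explicitation is inserted.
examples = [
    {
        "translation": "He studied at the Sorbonne before the war.",
        "entity": "Sorbonne",
        "gloss": "a university in Paris",
        "answer": "university in Paris",
    },
]

print(qa_accuracy(examples, explicitate=False))  # 0.0
print(qa_accuracy(examples, explicitate=True))   # 1.0
```

The accuracy gap between the two conditions is the signal: if the explicitated text supports more correct answers, the generated explicitation carried the missing background knowledge.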
Related papers
- Evaluating LLMs for Targeted Concept Simplification for Domain-Specific Texts [53.421616210871704]
A lack of context and unfamiliarity with difficult concepts are major reasons for adult readers' difficulty with domain-specific text.
We introduce "targeted concept simplification," a simplification task for rewriting text to help readers comprehend text containing unfamiliar concepts.
We benchmark the performance of open-source and commercial LLMs and a simple dictionary baseline on this task.
arXiv Detail & Related papers (2024-10-28T05:56:51Z)
- Aligning Translation-Specific Understanding to General Understanding in Large Language Models [32.0119328710383]
Large language models (LLMs) have exhibited remarkable abilities in understanding complex texts.
This study reveals the misalignment between the translation-specific understanding and the general understanding inside LLMs.
We propose a novel translation process, DUAT (Difficult words Understanding Aligned Translation), which explicitly incorporates general understanding of the complicated content.
arXiv Detail & Related papers (2024-01-10T11:03:53Z)
- Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting [66.02718577386426]
We provide a simple characterization of idiomatic translation and related issues.
We conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations.
To improve translation of natural idioms, we introduce two straightforward yet effective techniques.
arXiv Detail & Related papers (2023-10-10T23:47:25Z)
- Audience-specific Explanations for Machine Translation [17.166908218991225]
In machine translation, a common problem is that certain words, even when translated correctly, can leave the target-language audience confused due to differing cultural backgrounds.
In this work we explore techniques to extract example explanations for such words from a parallel corpus.
We propose a semi-automatic technique to extract these explanations from a large parallel corpus.
arXiv Detail & Related papers (2023-09-22T17:00:45Z)
- X-PARADE: Cross-Lingual Textual Entailment and Information Divergence across Paragraphs [55.80189506270598]
X-PARADE is the first cross-lingual dataset of paragraph-level information divergences.
Annotators label a paragraph in a target language at the span level and evaluate it with respect to a corresponding paragraph in a source language.
Aligned paragraphs are sourced from Wikipedia pages in different languages.
arXiv Detail & Related papers (2023-09-16T04:34:55Z)
- The Best of Both Worlds: Combining Human and Machine Translations for Multilingual Semantic Parsing with Active Learning [50.320178219081484]
We propose an active learning approach that exploits the strengths of both human and machine translations.
Ideal utterance selection can significantly reduce error and bias in the translated data.
arXiv Detail & Related papers (2023-05-22T05:57:47Z)
- Interactive-Chain-Prompting: Ambiguity Resolution for Crosslingual Conditional Generation with Interaction [38.73550742775257]
A source query in one language may yield several translation options in another language without any extra context.
We propose a novel method, interactive-chain prompting, that decomposes translation into a list of subproblems addressing ambiguities.
We create a dataset exhibiting different linguistic phenomena that lead to ambiguity at inference time in four languages.
arXiv Detail & Related papers (2023-01-24T21:08:13Z)
- Human Interpretation of Saliency-based Explanation Over Text [65.29015910991261]
We study saliency-based explanations over textual data.
We find that people often misinterpret these explanations.
We propose a method to adjust saliencies based on model estimates of over- and under-perception.
arXiv Detail & Related papers (2022-01-27T15:20:32Z)
- Prosody-Aware Neural Machine Translation for Dubbing [9.49303003480503]
We introduce the task of prosody-aware machine translation which aims at generating translations suitable for dubbing.
Dubbing of a spoken sentence requires transferring the content as well as the prosodic structure of the source into the target language to preserve timing information.
We propose implicit and explicit modeling approaches to integrate prosody information into neural machine translation.
arXiv Detail & Related papers (2021-05-14T17:32:24Z)
- Do Context-Aware Translation Models Pay the Right Attention? [61.25804242929533]
Context-aware machine translation models are designed to leverage contextual information, but often fail to do so.
In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words?
We introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations.
Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words.
arXiv Detail & Related papers (2021-05-14T17:32:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.