Benchmarking Machine Translation with Cultural Awareness
- URL: http://arxiv.org/abs/2305.14328v3
- Date: Sat, 19 Oct 2024 05:01:46 GMT
- Title: Benchmarking Machine Translation with Cultural Awareness
- Authors: Binwei Yao, Ming Jiang, Tara Bobinac, Diyi Yang, Junjie Hu,
- Abstract summary: Translating culture-related content is vital for effective cross-cultural communication.
Many culture-specific items (CSIs) often lack viable translations across languages.
This difficulty hinders the analysis of cultural awareness of machine translation systems.
- Score: 50.183458829028226
- License:
- Abstract: Translating culture-related content is vital for effective cross-cultural communication. However, many culture-specific items (CSIs) often lack viable translations across languages, making it challenging to collect high-quality, diverse parallel corpora with CSI annotations. This difficulty hinders the analysis of cultural awareness of machine translation (MT) systems, including traditional neural MT and the emerging MT paradigm using large language models (LLM). To address this gap, we introduce a novel parallel corpus, enriched with CSI annotations in 6 language pairs for investigating Culturally-Aware Machine Translation--CAMT. Furthermore, we design two evaluation metrics to assess CSI translations, focusing on their pragmatic translation quality. Our findings show the superior ability of LLMs over neural MTs in leveraging external cultural knowledge for translating CSIs, especially those lacking translations in the target culture.
Related papers
- Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs [18.84670051328337]
XC-Translate is the first large-scale, manually-created benchmark for machine translation.
KG-MT is a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model.
arXiv Detail & Related papers (2024-10-17T21:56:22Z) - Cultural Adaptation of Menus: A Fine-Grained Approach [58.08115795037042]
Machine Translation of Culture-Specific Items (CSIs) poses significant challenges.
Recent work on CSI translation has shown some success using Large Language Models (LLMs) to adapt to different languages and cultures.
We introduce the ChineseMenuCSI dataset, the largest for Chinese-English menu corpora, annotated with CSI vs Non-CSI labels.
We develop a novel methodology for automatic CSI identification, which outperforms GPT-based prompts in most categories.
arXiv Detail & Related papers (2024-08-24T09:25:18Z) - Translating Across Cultures: LLMs for Intralingual Cultural Adaptation [12.5954253354303]
We define the task of cultural adaptation and create an evaluation framework to evaluate the performance of modern LLMs.
We analyze possible issues with automatic adaptation.
We hope that this paper will offer more insight into the cultural understanding of LLMs and their creativity in cross-cultural scenarios.
arXiv Detail & Related papers (2024-06-20T17:06:58Z) - Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach [1.6982207802596105]
This study investigates three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT.
arXiv Detail & Related papers (2023-12-17T15:56:05Z) - Discourse Centric Evaluation of Machine Translation with a Densely
Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z) - Learning to Generalize to More: Continuous Semantic Augmentation for
Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT)
CsaNMT augments each training instance with an adjacency region that could cover adequate variants of literal expression under the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z) - When Does Translation Require Context? A Data-driven, Multilingual
Exploration [71.43817945875433]
proper handling of discourse significantly contributes to the quality of machine translation (MT)
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z) - It's Easier to Translate out of English than into it: Measuring Neural
Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty.
XMI exploits the probabilistic nature of most neural machine translation models.
We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.