Related papers: Benchmarking Machine Translation with Cultural Awareness

URL: http://arxiv.org/abs/2305.14328v3
Date: Sat, 19 Oct 2024 05:01:46 GMT
Title: Benchmarking Machine Translation with Cultural Awareness
Authors: Binwei Yao, Ming Jiang, Tara Bobinac, Diyi Yang, Junjie Hu
Abstract summary: Translating culture-related content is vital for effective cross-cultural communication. Many culture-specific items (CSIs) often lack viable translations across languages. This difficulty hinders the analysis of cultural awareness of machine translation systems.
Score: 50.183458829028226
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Translating culture-related content is vital for effective cross-cultural communication. However, many culture-specific items (CSIs) lack viable translations across languages, making it challenging to collect high-quality, diverse parallel corpora with CSI annotations. This difficulty hinders the analysis of the cultural awareness of machine translation (MT) systems, including traditional neural MT and the emerging MT paradigm based on large language models (LLMs). To address this gap, we introduce a novel parallel corpus, enriched with CSI annotations in 6 language pairs, for investigating Culturally-Aware Machine Translation (CAMT). Furthermore, we design two evaluation metrics to assess CSI translations, focusing on their pragmatic translation quality. Our findings show that LLMs are better than neural MT systems at leveraging external cultural knowledge to translate CSIs, especially those lacking translations in the target culture.

Related papers

CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation [25.213316704661352]
We introduce CaMMT, a benchmark of over 5,800 triples of images along with parallel captions in English and regional languages. We find that visual context generally improves translation quality, especially in handling Culturally-Specific Items (CSIs) and correct gender usage.
arXiv Detail & Related papers (2025-05-30T10:42:44Z)
Team ACK at SemEval-2025 Task 2: Beyond Word-for-Word Machine Translation for English-Korean Pairs [23.19401079530962]
Translating knowledge-intensive and entity-rich text between English and Korean requires transcreation to preserve language-specific and cultural nuances. We evaluate 13 models (LLMs and MT models) using automatic metrics and human assessment by bilingual annotators.
arXiv Detail & Related papers (2025-04-29T05:58:19Z)
Preserving Cultural Identity with Context-Aware Translation Through Multi-Agent AI Systems [0.4218593777811082]
Language is a cornerstone of cultural identity, yet globalization and the dominance of major languages have placed nearly 3,000 languages at risk of extinction. Existing AI-driven translation models prioritize efficiency but often fail to capture cultural nuances, idiomatic expressions, and historical significance. We propose a multi-agent AI framework designed for culturally adaptive translation in underserved language communities.
arXiv Detail & Related papers (2025-03-05T06:43:59Z)
XTransplant: A Probe into the Upper Bound Performance of Multilingual Capability and Culture Adaptability in LLMs via Mutual Cross-lingual Feed-forward Transplantation [49.69780199602105]
Current large language models (LLMs) often exhibit imbalances in multilingual capabilities and cultural adaptability. We propose a probing method named XTransplant that explores cross-lingual latent interactions via cross-lingual feed-forward transplantation. We empirically prove that both the multilingual capabilities and cultural adaptability of LLMs hold the potential to be significantly improved by XTransplant.
arXiv Detail & Related papers (2024-12-17T09:05:30Z)
Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs [18.84670051328337]
XC-Translate is the first large-scale, manually-created benchmark for machine translation. KG-MT is a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model.
arXiv Detail & Related papers (2024-10-17T21:56:22Z)
Cultural Adaptation of Menus: A Fine-Grained Approach [58.08115795037042]
Machine Translation of Culture-Specific Items (CSIs) poses significant challenges. Recent work on CSI translation has shown some success using Large Language Models (LLMs) to adapt to different languages and cultures. We introduce the ChineseMenuCSI dataset, the largest for Chinese-English menu corpora, annotated with CSI vs Non-CSI labels. We develop a novel methodology for automatic CSI identification, which outperforms GPT-based prompts in most categories.
arXiv Detail & Related papers (2024-08-24T09:25:18Z)
Translating Across Cultures: LLMs for Intralingual Cultural Adaptation [12.5954253354303]
We define the task of cultural adaptation and create an evaluation framework to evaluate the performance of modern LLMs. We analyze possible issues with automatic adaptation. We hope that this paper will offer more insight into the cultural understanding of LLMs and their creativity in cross-cultural scenarios.
arXiv Detail & Related papers (2024-06-20T17:06:58Z)
Distinguishing Translations by Human, NMT, and ChatGPT: A Linguistic and Statistical Approach [1.6982207802596105]
This study investigates three key questions: (1) the distinguishability of ChatGPT-generated translations from NMT and human translation (HT), (2) the linguistic characteristics of each translation type, and (3) the degree of resemblance between ChatGPT-produced translations and HT or NMT.
arXiv Detail & Related papers (2023-12-17T15:56:05Z)
Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al. We investigate the similarities and differences between the discourse structures of source and target languages. We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z)
Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation [50.54059385277964]
We present a novel data augmentation paradigm termed Continuous Semantic Augmentation (CsaNMT). CsaNMT augments each training instance with an adjacency region intended to cover adequate variants of literal expression with the same meaning.
arXiv Detail & Related papers (2022-04-14T08:16:28Z)
When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse contributes significantly to the quality of machine translation (MT). Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation. We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify discourse phenomena and evaluate model performance on them.
arXiv Detail & Related papers (2021-09-15T17:29:30Z)
It's Easier to Translate out of English than into it: Measuring Neural Translation Difficulty by Cross-Mutual Information [90.35685796083563]
Cross-mutual information (XMI) is an asymmetric information-theoretic metric of machine translation difficulty. XMI exploits the probabilistic nature of most neural machine translation models. We present the first systematic and controlled study of cross-lingual translation difficulties using modern neural translation systems.
arXiv Detail & Related papers (2020-05-05T17:38:48Z)
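As an illustration of the XMI entry above: cross-mutual information is defined as XMI(X → Y) = H_LM(Y) − H_MT(Y | X), the cross-entropy of the target text under a target-side language model minus its cross-entropy under a translation model conditioned on the source. The sketch below computes that difference from per-token log-probabilities. It assumes both models score the same target tokenization and return natural-log probabilities; the function names are hypothetical and are not taken from the paper's code.

```python
import math
from typing import Sequence


def cross_entropy_bits(token_logprobs: Sequence[float]) -> float:
    """Average negative log-probability per token, converted from nats to bits."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    return -sum(token_logprobs) / (len(token_logprobs) * math.log(2))


def xmi(lm_logprobs: Sequence[float], mt_logprobs: Sequence[float]) -> float:
    """XMI(X -> Y) = H_LM(Y) - H_MT(Y | X).

    lm_logprobs: log p(y_t | y_<t) from a target-side language model (assumed input).
    mt_logprobs: log p(y_t | y_<t, x) from a translation model (assumed input).
    Both lists must cover the same target tokens.
    """
    if len(lm_logprobs) != len(mt_logprobs):
        raise ValueError("both models must score the same target tokens")
    return cross_entropy_bits(lm_logprobs) - cross_entropy_bits(mt_logprobs)


# Hypothetical numbers: the LM assigns 0.25 to each of 4 tokens (2 bits/token),
# the MT model assigns 0.5 (1 bit/token), so the source "explains" 1 bit per token.
lm = [math.log(0.25)] * 4
mt = [math.log(0.5)] * 4
print(round(xmi(lm, mt), 6))
```

The metric is asymmetric because XMI(X → Y) and XMI(Y → X) measure cross-entropies of different target sides with different models; a higher value means the source reduces more of the target's uncertainty.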

This list is automatically generated from the titles and abstracts of the papers on this site.