Team ACK at SemEval-2025 Task 2: Beyond Word-for-Word Machine Translation for English-Korean Pairs
- URL: http://arxiv.org/abs/2504.20451v1
- Date: Tue, 29 Apr 2025 05:58:19 GMT
- Title: Team ACK at SemEval-2025 Task 2: Beyond Word-for-Word Machine Translation for English-Korean Pairs
- Authors: Daniel Lee, Harsh Sharma, Jieun Han, Sunny Jeong, Alice Oh, Vered Shwartz
- Abstract summary: Translating knowledge-intensive and entity-rich text between English and Korean requires transcreation to preserve language-specific and cultural nuances. We evaluate 13 models (LLMs and MT models) using automatic metrics and human assessment by bilingual annotators.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Translating knowledge-intensive and entity-rich text between English and Korean requires transcreation to preserve language-specific and cultural nuances beyond literal, phonetic, or word-for-word conversion. We evaluate 13 models (LLMs and MT models) using automatic metrics and human assessment by bilingual annotators. Our findings show LLMs outperform traditional MT systems but struggle with entity translation requiring cultural adaptation. By constructing an error taxonomy, we identify incorrect responses and entity name errors as key issues, with performance varying by entity type and popularity level. This work exposes gaps in automatic evaluation metrics and aims to enable future work on culturally nuanced machine translation.
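The abstract contrasts automatic metrics with human assessment. As a minimal illustration of what a surface-level automatic metric computes (and why it can miss cultural adaptation: a transcreated entity name shares few n-grams with a literal rendering), here is a simplified character n-gram F-score in the spirit of chrF. This is an illustrative sketch only, not the exact metric or implementation used in the paper; the function names are hypothetical.

```python
from collections import Counter

def char_ngrams(text, n):
    """Counter of character n-grams of a string (whitespace collapsed)."""
    s = " ".join(text.split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))

def chrf_like(hypothesis, reference, max_n=3):
    """Average character n-gram F1 between hypothesis and reference.

    A simplified stand-in for chrF-style metrics: for each n, compute
    precision and recall of overlapping character n-grams, take F1,
    then average over n = 1..max_n.
    """
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue
        overlap = sum((hyp & ref).values())
        prec = overlap / sum(hyp.values())
        rec = overlap / sum(ref.values())
        scores.append(0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec))
    return sum(scores) / len(scores) if scores else 0.0
```

A metric like this rewards string overlap with a single reference, so a culturally adapted translation of an entity name can score near zero against a literal reference even when bilingual annotators judge it correct; this is one concrete way such gaps in automatic evaluation arise.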
Related papers
- Proverbs Run in Pairs: Evaluating Proverb Translation Capability of Large Language Model [5.343968769798019]
This paper investigates the capability of state-of-the-art neural machine translation (NMT) and large language models (LLMs) in translating proverbs.
We construct a translation dataset of proverbs and proverbs in conversation for four language pairs.
Experiments show that the studied models can achieve good translation between languages with similar cultural backgrounds.
arXiv Detail & Related papers (2025-01-21T07:54:22Z) - Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs [18.84670051328337]
XC-Translate is the first large-scale, manually-created benchmark for machine translation.
KG-MT is a novel end-to-end method to integrate information from a multilingual knowledge graph into a neural machine translation model.
arXiv Detail & Related papers (2024-10-17T21:56:22Z) - Benchmarking Machine Translation with Cultural Awareness [50.183458829028226]
Translating culture-related content is vital for effective cross-cultural communication.
Many culture-specific items (CSIs) often lack viable translations across languages.
This difficulty hinders the analysis of cultural awareness of machine translation systems.
arXiv Detail & Related papers (2023-05-23T17:56:33Z) - Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus [82.07304301996562]
This paper presents a new dataset with rich discourse annotations, built upon the large-scale parallel corpus BWB introduced in Jiang et al.
We investigate the similarities and differences between the discourse structures of source and target languages.
We discover that MT outputs differ fundamentally from human translations in terms of their latent discourse structures.
arXiv Detail & Related papers (2023-05-18T17:36:41Z) - Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks.
This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z) - Adaptive Machine Translation with Large Language Models [7.803471587734353]
We investigate how we can utilize in-context learning to improve real-time adaptive machine translation.
We conduct experiments across five diverse language pairs, namely English-to-Arabic (EN-AR), English-to-Chinese (EN-ZH), English-to-French (EN-FR), English-to-Kinyarwanda (EN-RW), and English-to-Spanish (EN-ES).
arXiv Detail & Related papers (2023-01-30T21:17:15Z) - DivEMT: Neural Machine Translation Post-Editing Effort Across Typologically Diverse Languages [5.367993194110256]
DivEMT is the first publicly available post-editing study of Neural Machine Translation (NMT) over a typologically diverse set of target languages.
We assess the impact on translation productivity of two state-of-the-art NMT systems, namely: Google Translate and the open-source multilingual model mBART50.
arXiv Detail & Related papers (2022-05-24T17:22:52Z) - Lingua Custodia's participation at the WMT 2021 Machine Translation using Terminologies shared task [3.3108924994485096]
We consider three directions, namely English to French, Russian, and Chinese.
We introduce two main changes to the standard procedure to handle terminologies.
Our method satisfies most terminology constraints while maintaining high translation quality.
arXiv Detail & Related papers (2021-11-03T10:36:32Z) - When Does Translation Require Context? A Data-driven, Multilingual Exploration [71.43817945875433]
Proper handling of discourse significantly contributes to the quality of machine translation (MT).
Recent works in context-aware MT attempt to target a small set of discourse phenomena during evaluation.
We develop the Multilingual Discourse-Aware benchmark, a series of taggers that identify and evaluate model performance on discourse phenomena.
arXiv Detail & Related papers (2021-09-15T17:29:30Z) - ChrEnTranslate: Cherokee-English Machine Translation Demo with Quality Estimation and Corrective Feedback [70.5469946314539]
ChrEnTranslate is an online machine translation demonstration system for translation between English and Cherokee, an endangered language.
It supports both statistical and neural translation models and provides quality estimation to inform users of reliability.
arXiv Detail & Related papers (2021-07-30T17:58:54Z) - Bootstrapping a Crosslingual Semantic Parser [74.99223099702157]
We adapt a semantic parser trained on a single language, such as English, to new languages and multiple domains with minimal annotation.
We ask whether machine translation is an adequate substitute for training data, and extend this to investigate bootstrapping using joint training with English, paraphrasing, and multilingual pre-trained models.
arXiv Detail & Related papers (2020-04-06T12:05:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.