Fugu-MT Paper Translation (Abstract): From Words to Worlds: Benchmarking Cross-Cultural Cultural Understanding in Machine Translation

Paper overview: From Words to Worlds: Benchmarking Cross-Cultural Cultural Understanding in Machine Translation

arxiv url: http://arxiv.org/abs/2603.17303v1
Date: Wed, 18 Mar 2026 02:59:44 GMT
Status: Translation complete
System update date: 2026-03-19 18:32:57.485732
Title: From Words to Worlds: Benchmarking Cross-Cultural Cultural Understanding in Machine Translation
Title (reference translation): From Words to Worlds: Benchmarking Cross-Cultural Understanding in Machine Translation
Authors: Bangju Han, Yingqi Wang, Huang Qing, Tiyuan Li, Fengyi Yang, Ahtamjan Ahmat, Abibulla Atawulla, Yating Yang, Xi Zhou
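Abstract summary: CulT-Eval is a benchmark designed to evaluate how models handle different types of culturally grounded expressions. It comprises over 7,959 carefully curated instances spanning multiple types of culturally grounded expressions. The paper also proposes a complementary evaluation metric that targets culturally induced meaning deviations.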
Reference score (independently computed attention measure): 16.809989616664605
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Culture-expressions, such as idioms, slang, and culture-specific items (CSIs), are pervasive in natural language and encode meanings that go beyond literal linguistic form. Accurately translating such expressions remains challenging for machine translation systems. Despite this, existing benchmarks remain fragmented and do not provide a systematic framework for evaluating translation performance on culture-loaded expressions. To address this gap, we introduce CulT-Eval, a benchmark designed to evaluate how models handle different types of culturally grounded expressions. CulT-Eval comprises over 7,959 carefully curated instances spanning multiple types of culturally grounded expressions, with a comprehensive error taxonomy covering culturally grounded expressions. Through extensive evaluation of large language models and detailed analysis, we identify recurring and systematic failure modes that are not adequately captured by existing automatic metrics. Accordingly, we propose a complementary evaluation metric that targets culturally induced meaning deviations overlooked by standard MT metrics. The results indicate that current models struggle to preserve culturally grounded meaning and to capture the cultural and contextual nuances essential for accurate translation. Our benchmark and code are available at https://anonymous.4open.science/r/CulT-Eval-E75D/.
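Abstract (reference translation): Cultural expressions such as idioms, slang, and culture-specific items (CSIs) are pervasive in natural language and encode meanings that go beyond literal linguistic form. Accurately translating such expressions remains difficult for machine translation systems. Nevertheless, existing benchmarks remain fragmented and do not provide a systematic framework for evaluating translation performance on culture-loaded expressions. To address this gap, we introduce CulT-Eval, a benchmark designed to evaluate how models handle different types of culturally grounded expressions. CulT-Eval contains over 7,959 carefully curated instances spanning multiple types of culturally grounded expressions, along with a comprehensive error taxonomy covering those expressions. Through extensive evaluation of large language models and detailed analysis, we identify recurring and systematic failure modes that existing automatic metrics do not adequately capture. We therefore propose a complementary evaluation metric that targets culturally induced meaning deviations overlooked by standard MT metrics. The results suggest that current models struggle to preserve culturally grounded meaning and to capture the cultural and contextual nuances needed for accurate translation. The benchmark and code are available at https://anonymous.4open.science/r/CulT-Eval-E75D/.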
