Fugu-MT paper translation (abstract): CULTURESCORE: Evaluating Cultural Faithfulness in Video Generation Models

Paper summary: CULTURESCORE: Evaluating Cultural Faithfulness in Video Generation Models

arxiv url: http://arxiv.org/abs/2606.07311v1
Date: Fri, 05 Jun 2026 14:28:29 GMT
Status: translation complete
Last updated in system: 2026-06-08 14:33:29.77909
Title: CULTURESCORE: Evaluating Cultural Faithfulness in Video Generation Models
Title (reference translation): CULTURESCORE: Evaluating Cultural Faithfulness in Video Generation Models
Authors: Anku Rani, Wei Dai, Shravan Nayak, Pattie Maes, Mahdi M. Kalayeh, Paul Pu Liang
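Abstract summary: Current metrics such as VideoScore measure only visual quality and provide no mechanism for assessing cultural faithfulness. This paper proposes CultureScore, a compositional evaluation framework that decomposes cultural faithfulness into three dimensions. The framework is operationalized through an evaluation suite spanning 10 countries, yielding 6,180 videos generated across three state-of-the-art models.
Reference score (attention score computed independently by this site): 46.154946394307814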
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As video generation models like Veo 3.1 and LTX-2 advance, their ability to accurately represent diverse global cultures remains a critical yet understudied frontier. Current metrics, such as VideoScore, only measure visual quality but offer no mechanism for assessing cultural faithfulness. Consequently, a model that replaces a Namaste with a handshake receives the same score as one that generates the gesture correctly. We propose CultureScore, a compositional evaluation framework that decomposes cultural faithfulness into three granular dimensions: Identity (who is represented), Context (culturally localized background), and Behavior (normative gestures and interactions). We operationalize this framework through an evaluation suite spanning 10 countries, yielding 6,180 generated videos across three state-of-the-art models. Our evaluation reveals that no current model achieves culturally faithful video generation: the best-performing model reaches only 56.8% overall CultureScore, with Behavior the most challenging dimension, which remains below 52% across all models. Furthermore, human preference rankings align directionally with CultureScore but are inverted relative to VideoScore; the highest-scoring model on visual quality was ranked last by annotators, underscoring that cultural faithfulness is an essential criterion for equitable video generation.
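Abstract (reference translation): As video generation models such as Veo 3.1 and LTX-2 advance, their ability to represent diverse global cultures accurately remains an important but under-explored frontier. Current metrics such as VideoScore measure only visual quality and provide no mechanism for assessing cultural faithfulness. As a result, a model that replaces a Namaste with a handshake receives the same score as one that generates the gesture correctly. This paper proposes CultureScore, a compositional evaluation framework that decomposes cultural faithfulness into three dimensions: Identity (who is represented), Context (culturally localized background), and Behavior (normative gestures and interactions). The framework is operationalized through an evaluation suite spanning 10 countries, yielding 6,180 videos generated across three state-of-the-art models. The evaluation shows that no current model achieves culturally faithful video generation: the best-performing model reaches only 56.8% overall CultureScore, and Behavior is the most difficult dimension, remaining below 52% for every model. In addition, human preference rankings align directionally with CultureScore but are inverted relative to VideoScore: the model rated highest on visual quality was ranked last by annotators, which underscores that cultural faithfulness is an essential criterion for equitable video generation.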

Related papers