Fugu-MT Paper Translation (Abstract): Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation

Paper overview: Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation

arxiv url: http://arxiv.org/abs/2604.17428v1
Date: Sun, 19 Apr 2026 13:17:34 GMT
Status: Translation complete
System update date: 2026-04-21 21:52:52.521332
Title: Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation
Title (reference translation): Long-CODE: Isolating Pure Long-Context as an Orthogonal Dimension in Video Evaluation
Authors: Zhijiang Tang, Jiaxin Qi, Bing Zhao, Jianqiang Huang
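Abstract summary: We argue that long-video metrics should be disentangled from short-video assessments. The paper proposes a suite of long-video attribute corruption tests and a novel long-video metric based on shot dynamics. The proposed method achieves state-of-the-art correlation with human judgments.
Reference score (independently computed attention measure): 16.64717198652712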
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As video generation models achieve unprecedented capabilities, the demand for robust video evaluation metrics becomes increasingly critical. Traditional metrics are intrinsically tailored for short-video evaluation, predominantly assessing frame-level visual quality and localized temporal smoothness. However, as state-of-the-art video generation models scale to generate longer videos, these metrics fail to capture essential long-range characteristics, such as narrative richness and global causal consistency. Recognizing that short-term visual perception and long-context attributes are fundamentally orthogonal dimensions, we argue that long-video metrics should be disentangled from short-video assessments. In this paper, we focus on the rigorous justification and design of a dedicated framework for long-video evaluation. We first introduce a suite of long-video attribute corruption tests, exposing the critical limitations of existing short-video metrics from their insensitivity to structural inconsistencies, such as shot-level perturbations and narrative shuffling. To bridge this gap, we design a novel long-video metric based on shot dynamics, which is highly sensitive to the long-range testing framework. Furthermore, we introduce Long-CODE (Long-Context as an Orthogonal Dimension for video Evaluation), a specialized dataset designed to benchmark long-video evaluation, with human annotations isolated specifically to genuine long-range characteristics. Extensive experiments show that our proposed metrics achieve state-of-the-art correlation with human judgments. Ultimately, our metric and benchmark seamlessly complement existing short-video standards, establishing a holistic and unbiased evaluation paradigm for video generation models.
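Abstract (reference translation): As video generation models achieve unprecedented capabilities, the demand for robust video evaluation metrics is becoming increasingly important. Traditional metrics are inherently suited to short-video evaluation, mainly assessing frame-level visual quality and local temporal smoothness. However, as state-of-the-art video generation models scale to generate longer videos, these metrics fail to capture important long-range characteristics such as narrative richness and global causal consistency. Recognizing that short-term visual perception and long-context attributes are fundamentally orthogonal dimensions, we argue that long-video metrics should be disentangled from short-video assessments. This paper focuses on the rigorous justification and design of a framework for long-video evaluation. First, we expose the limitations of existing short-video metrics through structural inconsistencies such as shot-level perturbations and narrative shuffling. To bridge this gap, we design a novel long-video metric based on shot dynamics that is highly sensitive to the long-range testing framework. Furthermore, we introduce Long-CODE (Long-Context as an Orthogonal Dimension for Video Evaluation), a dataset for benchmarking long-video evaluation. Extensive experiments show that the proposed metrics achieve state-of-the-art correlation with human judgments. Ultimately, our metric and benchmark seamlessly complement existing short-video standards, establishing a holistic and unbiased evaluation paradigm for video generation models.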
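
The abstract's narrative-shuffling corruption test can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy per-frame quality scores, and both toy metrics below are hypothetical, chosen only to show why a score that averages over frames cannot react to a reordering of shots, while a score that looks at shot order can.

```python
import random
from statistics import mean


def shuffle_shots(shots, seed=0):
    """Return a copy of `shots` with the shot order permuted (narrative shuffling)."""
    rng = random.Random(seed)
    original = list(shots)
    shuffled = list(shots)
    # Only reshuffle when at least two distinct shots exist, so the loop can terminate.
    has_distinct_shots = len({tuple(shot) for shot in original}) > 1
    while has_distinct_shots and shuffled == original:
        rng.shuffle(shuffled)
    return shuffled


def frame_average_metric(shots):
    """Toy stand-in for a frame-level metric: mean per-frame score, order ignored."""
    return mean(score for shot in shots for score in shot)


def order_aware_metric(shots):
    """Toy order-sensitive score: shot-level means weighted by shot position."""
    return sum(i * mean(shot) for i, shot in enumerate(shots)) / len(shots)


def sensitivity(metric, shots, seed=0):
    """Absolute change in `metric` after the shot order is corrupted."""
    return abs(metric(shots) - metric(shuffle_shots(shots, seed)))


# Hypothetical video: four shots, each a list of per-frame quality scores.
video = [[1, 1], [2, 2], [3, 3], [4, 4]]
print("frame-average sensitivity:", sensitivity(frame_average_metric, video))
print("order-aware sensitivity:  ", sensitivity(order_aware_metric, video))
```

The frame-average score is unchanged by the shuffle because it sees the same multiset of frames, which mirrors the insensitivity the abstract attributes to short-video metrics. The position-weighted score is only a placeholder for an order-aware signal and says nothing about how the paper's shot-dynamics metric is actually defined.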

Related papers