Fugu-MT Paper Translation (Abstract): V2V-Bench: A Comprehensive Benchmark for Video-to-Video Generation Evaluation

Paper overview: V2V-Bench: A Comprehensive Benchmark for Video-to-Video Generation Evaluation

arxiv url: http://arxiv.org/abs/2606.05665v1
Date: Thu, 04 Jun 2026 03:48:42 GMT
Status: Translation complete
System update date: 2026-06-05 22:39:44.534404
Title: V2V-Bench: A Comprehensive Benchmark for Video-to-Video Generation Evaluation
Title (reference translation): V2V-Bench: A Comprehensive Benchmark for Video-to-Video Generation Evaluation
Authors: Tao Liu, Leela Krishna, Gouti Pavan Kumar, Sreeja K, Vishav Garg,
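Abstract summary: Video-to-video (V2V) generation is difficult to evaluate because outputs must both follow editing instructions and preserve frame-level correspondence with the source video. V2V-Bench is a benchmark organized into five categories: temporal alignment, structural fidelity, transformation quality, video quality, and semantic alignment. It pairs diverse source videos with challenging editing tasks and evaluates two commercial models, Grok Imagine and Gemini Veo3, and one open-source model, Open Sora 2.0.
Reference score (attention score computed independently by the site): 2.5736307039025057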
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video-to-video (V2V) generation is difficult to evaluate because outputs must both follow editing instructions and preserve frame-level correspondence with the source video, which existing T2V and I2V metrics do not capture. We introduce V2V-Bench, an 11-dimension benchmark organized into five categories: temporal alignment, structural fidelity, transformation quality, video quality, and semantic alignment. V2V-Bench pairs diverse source videos with challenging editing tasks and evaluates two commercial models, Grok Imagine and Gemini Veo3, and one open-source model, Open Sora 2. Results show complementary model strengths: Grok performs better on editing fidelity, while Veo3 achieves stronger visual quality. On six V2V-specific dimensions, V2V-Bench reaches a Spearman correlation of 0.905 with human judgments.
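Abstract (reference translation): Video-to-video (V2V) generation is difficult to evaluate because outputs must both follow editing instructions and preserve frame-level correspondence with the source video, which existing T2V and I2V metrics do not capture. V2V-Bench is an 11-dimension benchmark organized into five categories: temporal alignment, structural fidelity, transformation quality, video quality, and semantic alignment. V2V-Bench pairs diverse source videos with challenging editing tasks and evaluates two commercial models, Grok Imagine and Gemini Veo3, and one open-source model, Open Sora 2.0. Grok performs better on editing fidelity, while Veo3 achieves stronger visual quality. On six V2V-specific dimensions, V2V-Bench reaches a Spearman correlation of 0.905 with human judgments.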
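
Note on the reported metric: the Spearman correlation in the abstract measures how closely the ranking produced by the benchmark agrees with the ranking produced by human judges. The following is a minimal sketch of that calculation, not the authors' evaluation code. The function names `average_ranks` and `spearman` are hypothetical, and the score lists are invented purely for illustration; they are not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values for illustration only (NOT results from the paper):
# automatic metric scores and mean human ratings for five edited videos.
metric_scores = [0.62, 0.71, 0.55, 0.80, 0.47]
human_ratings = [3.1, 3.4, 2.9, 4.2, 3.0]

print(round(spearman(metric_scores, human_ratings), 3))
```

Because only ranks enter the calculation, the metric and the human ratings can be on different scales. A value near 1 means the benchmark orders the outputs almost the same way the human judges do.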


Related papers