Fugu-MT Paper Translation (Abstract): Q-Save: Towards Scoring and Attribution for Generated Video Evaluation

Paper overview: Q-Save: Towards Scoring and Attribution for Generated Video Evaluation

arxiv url: http://arxiv.org/abs/2511.18825v1
Date: Mon, 24 Nov 2025 07:00:21 GMT
Status: Translation complete
Last updated in system: 2025-11-25 18:34:25.063916
Title: Q-Save: Towards Scoring and Attribution for Generated Video Evaluation
Title (reference translation): Q-Save: Scoring and Attribution for Generated Video Evaluation
Authors: Xiele Wu, Zicheng Zhang, Mingtao Chen, Yixian Liu, Yiming Liu, Shushi Wang, Zhichao Hu, Yuhong Liu, Guangtao Zhai, Xiaohong Liu
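Abstract summary: This paper introduces Q-Save, a benchmark dataset and model for holistic evaluation of AIGV quality. The dataset contains nearly 10,000 videos, each annotated with a scalar mean opinion score (MOS) and fine-grained attribution labels. The authors propose a unified evaluation model that jointly performs quality scoring and attribution-based explanation.
Reference score (attention metric computed independently by this site): 65.83319736145869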
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We present Q-Save, a new benchmark dataset and model for holistic and explainable evaluation of AI-generated video (AIGV) quality. The dataset contains nearly 10,000 videos, each annotated with a scalar mean opinion score (MOS) and fine-grained attribution labels along three core dimensions: visual quality, dynamic quality, and text-video alignment. These multi-aspect annotations enable both accurate quality assessment and interpretable reasoning behind the scores. To leverage this data, we propose a unified evaluation model that jointly performs quality scoring and attribution-based explanation. The model adopts the SlowFast framework to distinguish between fast frames and slow frames: slow frames are processed at high resolution while fast frames use low resolution, balancing evaluation accuracy and computational efficiency. For training, we use data formatted in Chain-of-Thought (CoT) style and employ a multi-stage strategy: we first conduct Supervised Fine-Tuning (SFT), then further enhance the model with Group Relative Policy Optimization (GRPO), and finally perform SFT again to improve model stability. Experimental results demonstrate that our model achieves state-of-the-art performance in video quality prediction while also providing human-aligned, interpretable justifications. Our dataset and model establish a strong foundation for explainable evaluation in generative video research, contributing to the development of multimodal generation and trustworthy AI. Code and dataset will be released upon publication.
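Abstract (reference translation): This paper introduces Q-Save, a new benchmark dataset and model for holistic and explainable evaluation of AIGV quality. The dataset contains nearly 10,000 videos, each annotated with a scalar mean opinion score (MOS) and with fine-grained attribution labels along three core dimensions: visual quality, dynamic quality, and text-video alignment. These multi-aspect annotations enable both accurate quality assessment and interpretable reasoning behind the scores. To make use of this data, the authors propose a unified evaluation model that jointly performs quality scoring and attribution-based explanation. The model adopts the SlowFast framework to distinguish fast frames from slow frames: slow frames are processed at high resolution while fast frames use low resolution, balancing evaluation accuracy and computational efficiency. Training uses data formatted in Chain-of-Thought (CoT) style and proceeds in stages: first Supervised Fine-Tuning (SFT), then further enhancement with Group Relative Policy Optimization (GRPO), and finally a second round of SFT to improve model stability. Experimental results show that the model achieves state-of-the-art performance in video quality prediction while providing human-aligned, interpretable justifications. The dataset and model establish a strong foundation for explainable evaluation in generative video research and contribute to the development of multimodal generation and trustworthy AI. Code and dataset will be released upon publication.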
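
Note: The abstract describes splitting frames into a slow pathway (high resolution) and a fast pathway (low resolution). The following is a minimal sketch of that idea only, not the authors' implementation: the function name `plan_frames`, the strides, and the resolutions are all hypothetical values chosen for illustration, since the abstract does not report them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FramePlan:
    index: int       # frame index in the source video
    pathway: str     # "slow" or "fast"
    short_side: int  # target short-side resolution in pixels


def plan_frames(num_frames: int,
                slow_stride: int = 16,
                fast_stride: int = 4,
                slow_res: int = 448,
                fast_res: int = 112) -> List[FramePlan]:
    """Assign sampled frames to a slow (high-res) or fast (low-res) pathway.

    Frames are sampled every `fast_stride` frames. A sampled frame whose
    index is a multiple of `slow_stride` goes to the slow pathway; every
    other sampled frame goes to the fast pathway.
    All default values are assumptions, not figures reported for Q-Save.
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if slow_stride % fast_stride != 0:
        raise ValueError("slow_stride must be a multiple of fast_stride")

    plan: List[FramePlan] = []
    for i in range(0, num_frames, fast_stride):
        if i % slow_stride == 0:
            plan.append(FramePlan(i, "slow", slow_res))
        else:
            plan.append(FramePlan(i, "fast", fast_res))
    return plan


if __name__ == "__main__":
    for frame in plan_frames(32):
        print(frame)
```

With these assumed defaults, only a small share of the sampled frames is processed at high resolution, which is the accuracy/efficiency trade-off the abstract refers to.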
