Fugu-MT Paper Translation (Summary): MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment

Paper summary: MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment

arxiv url: http://arxiv.org/abs/2509.11589v1
Date: Mon, 15 Sep 2025 05:16:54 GMT
Status: Translation complete
Last updated (in system): 2025-09-16 17:26:23.155591
Title: MVQA-68K: A Multi-dimensional and Causally-annotated Dataset with Quality Interpretability for Video Assessment
Authors: Yanyun Pu, Kehan Li, Zeyi Huang, Zhijie Zhong, Kaixiang Yang,
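Abstract summary: Video quality assessment (VQA) is becoming increasingly important for selecting high-quality videos from the large-scale datasets used in pre-training. MVQA-68K is a new multi-dimensional VQA dataset comprising over 68,000 carefully annotated videos. Experiments show that MVQA-68K significantly improves the performance of various multimodal large language models (MLLMs) on VQA tasks.
Reference score (attention score computed in-house): 14.705190484805962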
License: http://creativecommons.org/licenses/by/4.0/
Abstract: With the rapid advancement of video generation models such as Sora, video quality assessment (VQA) is becoming increasingly crucial for selecting high-quality videos from the large-scale datasets used in pre-training. Traditional VQA methods typically produce a single numerical score and often lack comprehensiveness and interpretability. To address these challenges, we introduce MVQA-68K, a novel multi-dimensional VQA dataset comprising over 68,000 carefully annotated videos and covering seven essential quality dimensions: overall aesthetics, camera movement, dynamic degree, texture detail, composition, visual quality, and factual consistency. Each annotation includes detailed chain-of-thought reasoning to facilitate interpretability and comprehensive understanding. Extensive experiments demonstrate that MVQA-68K significantly enhances the performance of various multimodal large language models (MLLMs) on the VQA task, achieving state-of-the-art results not only on our internal test set (Fig. 1) but also on public benchmarks including LSVQ-test, LSVQ-1080p, and LIVE-VQC. Meanwhile, incorporating an explicit reasoning process during VQA training substantially boosts zero-shot generalization. The code and dataset will be available on GitHub: https://github.com/Controller01-ai/MVQA-68K
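The abstract describes each MVQA-68K annotation as spanning seven quality dimensions, each accompanied by chain-of-thought reasoning. The sketch below shows one way such a record might be represented, assuming a per-dimension numeric score plus a free-text reasoning trace. The class names, snake_case keys, score scale, and the reasoning-then-score text format are all hypothetical illustrations, not the released dataset's schema.

```python
from dataclasses import dataclass, field

# The seven quality dimensions named in the MVQA-68K abstract.
# These snake_case keys are hypothetical naming, not official field names.
QUALITY_DIMENSIONS = (
    "overall_aesthetics",
    "camera_movement",
    "dynamic_degree",
    "texture_detail",
    "composition",
    "visual_quality",
    "factual_consistency",
)


@dataclass
class DimensionAnnotation:
    """A single dimension's judgment: a reasoning trace plus a score."""

    reasoning: str  # chain-of-thought text explaining the score
    score: float  # hypothetical numeric scale; the abstract does not state one


@dataclass
class VideoAnnotation:
    """Hypothetical per-video record holding one annotation per dimension."""

    video_id: str
    dimensions: dict[str, DimensionAnnotation] = field(default_factory=dict)

    def missing_dimensions(self) -> list[str]:
        """Dimensions with no annotation yet, in the canonical order above."""
        return [d for d in QUALITY_DIMENSIONS if d not in self.dimensions]

    def is_complete(self) -> bool:
        """True if every canonical dimension is annotated and no trace is empty."""
        if self.missing_dimensions():
            return False
        return all(a.reasoning.strip() for a in self.dimensions.values())


def to_training_target(record: VideoAnnotation) -> str:
    """Render a record as reasoning-then-score lines, one pair per dimension.

    The line format here is an assumption made for illustration only.
    """
    lines = []
    for dim in QUALITY_DIMENSIONS:
        ann = record.dimensions.get(dim)
        if ann is None:
            continue  # skip dimensions that have not been annotated
        lines.append(f"[{dim}] Reasoning: {ann.reasoning.strip()}")
        lines.append(f"[{dim}] Score: {ann.score}")
    return "\n".join(lines)


if __name__ == "__main__":
    record = VideoAnnotation("clip_0001")  # hypothetical video id
    record.dimensions["camera_movement"] = DimensionAnnotation(
        "Slow, steady pan with no visible jitter.", 4.0
    )
    print(record.missing_dimensions())  # the six dimensions not yet annotated
    print(to_training_target(record))
```

Emitting the reasoning before the score is one way to reflect the abstract's point about training with an explicit reasoning process; the abstract does not say how MVQA-68K actually orders or formats its targets.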


Related papers