Fugu-MT Paper Translation (Abstract): Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization

Paper overview: Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization

arxiv url: http://arxiv.org/abs/2510.08789v2
Date: Mon, 13 Oct 2025 16:16:11 GMT
Status: Translation complete
Last updated in system: 2025-10-14 15:48:09.836162
Title: Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization
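Title (reference translation): Q-Router: Agentic Video Quality Assessment with Expert Model Routing and Artifact Localization
Authors: Shuo Xing, Soumik Dey, Mingyang Wu, Ashirbad Mishra, Naveen Ravipati, Binbin Li, Hansi Wu, Zhengzhong Tu
Abstract summary: Video quality assessment (VQA) aims to predict the perceptual quality of a video in alignment with human judgments. The paper proposes Q-Router, an agentic framework for universal VQA with a multi-tier model routing system. Q-Router matches or surpasses state-of-the-art VQA models on a variety of benchmarks while substantially improving generalization and interpretability.
Reference score (independently computed attention score): 14.141157176094737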
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Video quality assessment (VQA) is a fundamental computer vision task that aims to predict the perceptual quality of a given video in alignment with human judgments. Existing performant VQA models trained with direct score supervision suffer from (1) poor generalization across diverse content and tasks, ranging from user-generated content (UGC), short-form videos, to AI-generated content (AIGC), (2) limited interpretability, and (3) lack of extensibility to novel use cases or content types. We propose Q-Router, an agentic framework for universal VQA with a multi-tier model routing system. Q-Router integrates a diverse set of expert models and employs vision-language models (VLMs) as real-time routers that dynamically reason and then ensemble the most appropriate experts conditioned on the input video semantics. We build a multi-tiered routing system based on the computing budget, with the heaviest tier involving a specific spatiotemporal artifacts localization for interpretability. This agentic design enables Q-Router to combine the complementary strengths of specialized experts, achieving both flexibility and robustness in delivering consistent performance across heterogeneous video sources and tasks. Extensive experiments demonstrate that Q-Router matches or surpasses state-of-the-art VQA models on a variety of benchmarks, while substantially improving generalization and interpretability. Moreover, Q-Router excels on the quality-based question answering benchmark, Q-Bench-Video, highlighting its promise as a foundation for next-generation VQA systems. Finally, we show that Q-Router capably localizes spatiotemporal artifacts, showing potential as a reward function for post-training video generation models.
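Abstract (reference translation): Video quality assessment (VQA) is a fundamental computer vision task that aims to predict the perceptual quality of a video in line with human judgments. Existing high-performing VQA models trained with direct score supervision struggle with (1) generalization across diverse content and tasks, from user-generated content (UGC) and short-form videos to AI-generated content (AIGC), (2) limited interpretability, and (3) a lack of extensibility to new use cases or content types. The paper proposes Q-Router, an agentic framework for universal VQA with a multi-tier model routing system. Q-Router integrates a variety of expert models and uses vision-language models (VLMs) as real-time routers that reason dynamically and then ensemble the most appropriate experts based on the semantics of the input video. The routing system is multi-tiered according to the computing budget, and its heaviest tier adds spatiotemporal artifact localization for interpretability. This agentic design lets Q-Router combine the complementary strengths of specialized experts, giving it both the flexibility and the robustness needed for consistent performance across heterogeneous video sources and tasks. Extensive experiments show that Q-Router matches or surpasses state-of-the-art VQA models on a variety of benchmarks while substantially improving generalization and interpretability. Q-Router also excels on Q-Bench-Video, a quality-based question answering benchmark, which highlights its promise as a foundation for next-generation VQA systems. Finally, Q-Router capably localizes spatiotemporal artifacts, showing potential as a reward function for post-training video generation models.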
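The routing idea in the abstract (a router selects experts for a given video, their scores are ensembled, and the heaviest tier adds artifact localization) can be illustrated with a minimal Python sketch. Everything here is a hypothetical stand-in, not the paper's implementation: the expert names, the `content_type` tag used in place of a real VLM router, the two-tier budget split, and the `artifact_hints` field are all assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical: an "expert" maps a video record to a quality score in [0, 1].
Expert = Callable[[dict], float]


def route(video: dict) -> Dict[str, float]:
    """Stand-in for the VLM router (assumption): choose experts and weights
    from a content-type tag instead of real vision-language reasoning."""
    if video.get("content_type") == "aigc":
        return {"aigc_expert": 0.7, "general_expert": 0.3}
    return {"ugc_expert": 0.5, "general_expert": 0.5}


def ensemble_score(video: dict, experts: Dict[str, Expert],
                   weights: Dict[str, float]) -> float:
    """Weighted average of the selected experts' scores."""
    total = sum(weights.values())
    return sum(w * experts[name](video) for name, w in weights.items()) / total


def localize_artifacts(video: dict) -> List[dict]:
    """Placeholder for spatiotemporal artifact localization (assumption):
    simply echoes precomputed hints attached to the video record."""
    return list(video.get("artifact_hints", []))


def assess(video: dict, experts: Dict[str, Expert],
           tier: str = "light") -> Dict[str, Optional[object]]:
    """Two hypothetical tiers: 'light' returns the routed ensemble score;
    'heavy' additionally returns localized artifacts."""
    weights = route(video)
    result = {"score": ensemble_score(video, experts, weights),
              "artifacts": None}
    if tier == "heavy":
        result["artifacts"] = localize_artifacts(video)
    return result


# Toy experts with fixed outputs, for demonstration only.
EXPERTS: Dict[str, Expert] = {
    "general_expert": lambda v: 0.6,
    "ugc_expert": lambda v: 0.4,
    "aigc_expert": lambda v: 0.2,
}
```

Calling `assess({"content_type": "aigc"}, EXPERTS, tier="heavy")` would route to the AIGC and general experts, average their scores with the chosen weights, and attach whatever artifact hints the record carries. A real system would replace `route` with a VLM call and the lambdas with trained quality models.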

Related papers