Fugu-MT Paper Translation (Abstract): Pioneering Perceptual Video Fluency Assessment: A Novel Task with Benchmark Dataset and Baseline

Paper overview: Pioneering Perceptual Video Fluency Assessment: A Novel Task with Benchmark Dataset and Baseline

arxiv url: http://arxiv.org/abs/2603.26055v1
Date: Fri, 27 Mar 2026 03:47:41 GMT
Status: Translation complete
Last updated in system: 2026-03-30 21:49:48.348611
Title: Pioneering Perceptual Video Fluency Assessment: A Novel Task with Benchmark Dataset and Baseline
Title (reference translation): Pioneering Perceptual Video Fluency Assessment: A New Task with a Benchmark Dataset and Baseline
Authors: Qizhi Xie, Kun Yuan, Yunpeng Qu, Ming Sun, Chao Zhou, Jihong Zhu,
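Abstract summary: The authors develop FluVid, a fluency-oriented dataset featuring the first-ever scoring criteria and human study for Video Fluency Assessment (VFA). They propose a baseline model called FluNet, which deploys temporal permuted self-attention (T-PSA) to enrich input fluency information and enhance long-range inter-frame interactions. The work achieves state-of-the-art performance and offers the community a roadmap for exploring solutions to VFA.
Reference score (independently computed attention score): 12.41142742925495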
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurately estimating humans' subjective feedback on video fluency, e.g., motion consistency and frame continuity, is crucial for various applications like streaming and gaming. Yet, it has long been overlooked, as prior arts have focused on solving it in the video quality assessment (VQA) task, merely as a sub-dimension of overall quality. In this work, we conduct pilot experiments and reveal that current VQA predictions largely underrepresent fluency, thereby limiting their applicability. To this end, we pioneer Video Fluency Assessment (VFA) as a standalone perceptual task focused on the temporal dimension. To advance VFA research, 1) we construct a fluency-oriented dataset, FluVid, comprising 4,606 in-the-wild videos with balanced fluency distribution, featuring the first-ever scoring criteria and human study for VFA. 2) We develop a large-scale benchmark of 23 methods, the most comprehensive one thus far on FluVid, gathering insights for VFA-tailored model designs. 3) We propose a baseline model called FluNet, which deploys temporal permuted self-attention (T-PSA) to enrich input fluency information and enhance long-range inter-frame interactions. Our work not only achieves state-of-the-art performance but, more importantly, offers the community a roadmap to explore solutions for VFA.
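Abstract (reference translation): Accurately estimating people's subjective feedback on video fluency, such as motion consistency and frame continuity, is important for applications like streaming and gaming. It has nevertheless long been overlooked, because prior work has addressed it within the video quality assessment (VQA) task, treating it only as a sub-dimension of overall quality. In this work, pilot experiments show that current VQA predictions largely underrepresent fluency, which limits their applicability. The authors therefore pioneer Video Fluency Assessment (VFA) as a standalone perceptual task focused on the temporal dimension. To advance VFA research: 1) they construct FluVid, a fluency-oriented dataset of 4,606 in-the-wild videos with a balanced fluency distribution, featuring the first scoring criteria and human study for VFA. 2) They build a large-scale benchmark of 23 methods, the most comprehensive on FluVid so far, and gather insights for VFA-tailored model design. 3) They propose a baseline model called FluNet, which deploys temporal permuted self-attention (T-PSA) to enrich input fluency information and enhance long-range inter-frame interactions. The work not only achieves state-of-the-art performance but, more importantly, offers the community a roadmap for exploring solutions to VFA.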

Related papers