Fugu-MT Paper Translation (Summary): Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors

Paper summary: Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors

arxiv url: http://arxiv.org/abs/2511.13897v1
Date: Mon, 17 Nov 2025 20:47:06 GMT
Status: Translation complete
Last updated in system: 2025-11-19 16:23:52.789485
Title: Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors
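Title (reference translation): Temporal Realism Evaluation of Generated Videos Using Compressed-Domain Motion Vectors
Authors: Mert Onur Cakiroglu, Idil Bilge Altun, Zhihe Lu, Mehmet Dalkilic, Hasan Kurban
Abstract summary: The paper proposes a scalable, model-agnostic framework that assesses temporal behavior using motion vectors (MVs) extracted directly from compressed video streams. Realism is quantified by computing Kullback-Leibler, Jensen-Shannon, and Wasserstein divergences between MV statistics of real and generated videos.
Reference score (attention score computed independently by this site): 8.077437139445603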
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Temporal realism remains a central weakness of current generative video models, as most evaluation metrics prioritize spatial appearance and offer limited sensitivity to motion. We introduce a scalable, model-agnostic framework that assesses temporal behavior using motion vectors (MVs) extracted directly from compressed video streams. Codec-generated MVs from standards such as H.264 and HEVC provide lightweight, resolution-consistent descriptors of motion dynamics. We quantify realism by computing Kullback-Leibler, Jensen-Shannon, and Wasserstein divergences between MV statistics of real and generated videos. Experiments on the GenVidBench dataset containing videos from eight state-of-the-art generators reveal systematic discrepancies from real motion: entropy-based divergences rank Pika and SVD as closest to real videos, MV-sum statistics favor VC2 and Text2Video-Zero, and CogVideo shows the largest deviations across both measures. Visualizations of MV fields and class-conditional motion heatmaps further reveal center bias, sparse and piecewise constant flows, and grid-like artifacts that frame-level metrics do not capture. Beyond evaluation, we investigate MV-RGB fusion through channel concatenation, cross-attention, joint embedding, and a motion-aware fusion module. Incorporating MVs improves downstream classification across ResNet, I3D, and TSN backbones, with ResNet-18 and ResNet-34 reaching up to 97.4% accuracy and I3D achieving 99.0% accuracy on real-versus-generated discrimination. These findings demonstrate that compressed-domain MVs provide an effective temporal signal for diagnosing motion defects in generative videos and for strengthening temporal reasoning in discriminative models. The implementation is available at: https://github.com/KurbanIntelligenceLab/Motion-Vector-Learning
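Abstract (reference translation): Temporal realism remains a central weakness of current generative video models, because most evaluation metrics prioritize spatial appearance and have limited sensitivity to motion. This paper proposes a scalable, model-agnostic framework that assesses temporal behavior using motion vectors (MVs) extracted directly from compressed video streams. Codec-generated MVs from standards such as H.264 and HEVC provide lightweight, resolution-consistent descriptors of motion dynamics. Realism is quantified by computing Kullback-Leibler, Jensen-Shannon, and Wasserstein divergences between MV statistics of real and generated videos. Experiments on the GenVidBench dataset, which contains videos from eight state-of-the-art generators, show that entropy-based divergences rank Pika and SVD as closest to real videos, that MV-sum statistics favor VC2 and Text2Video-Zero, and that CogVideo shows the largest deviations under both measures. Visualizations of MV fields and class-conditional motion heatmaps further reveal center bias, sparse and piecewise constant flows, and grid-like artifacts that frame-level metrics do not capture. The paper also investigates MV-RGB fusion through channel concatenation, cross-attention, joint embedding, and a motion-aware fusion module. Incorporating MVs improves downstream classification, with ResNet-18 and ResNet-34 reaching up to 97.4% accuracy and I3D reaching 99.0%. These results indicate that compressed-domain MVs provide an effective temporal signal for diagnosing motion defects in generated videos and for strengthening temporal reasoning in discriminative models. The implementation is available at: https://github.com/KurbanIntelligenceLab/Motion-Vector-Learning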
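
The abstract names three divergences computed between MV statistics of real and generated videos. The following is a minimal sketch of how such a comparison could look, not the authors' implementation. It assumes the statistic is a histogram of per-block MV magnitudes. The helper names, bin settings, and toy values are all hypothetical.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], bins: int, lo: float, hi: float,
              eps: float = 1e-9) -> List[float]:
    """Normalized histogram over [lo, hi]; eps keeps every bin strictly positive."""
    counts = [0.0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp out-of-range values to edge bins
        counts[idx] += 1.0
    total = sum(counts) + eps * bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence KL(p || q) for strictly positive histograms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric, bounded above by log(2)."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def wasserstein_1d(p: Sequence[float], q: Sequence[float], bin_width: float) -> float:
    """1-Wasserstein distance on shared, equally spaced bins (sum of |CDF_p - CDF_q|)."""
    cdf_p = cdf_q = dist = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q) * bin_width
    return dist


if __name__ == "__main__":
    # Toy MV magnitudes in pixels; made-up values for illustration only.
    real_mags = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    gen_mags = [0.1, 0.2, 0.2, 0.3, 3.9, 4.0]
    bins, lo, hi = 8, 0.0, 4.0
    p = histogram(real_mags, bins, lo, hi)
    q = histogram(gen_mags, bins, lo, hi)
    print("KL(real || gen):", kl_divergence(p, q))
    print("JS:", js_divergence(p, q))
    print("Wasserstein-1:", wasserstein_1d(p, q, (hi - lo) / bins))
```

KL and JS compare bin probabilities only, so they ignore how far apart two bins are, while the Wasserstein distance grows with the distance that probability mass must move. That difference is one plausible reason the measures can rank generators differently.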

Related papers

Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection [73.51855469884195]
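This paper proposes an AI-generated video detection paradigm based on the principle of probability flow conservation. The authors develop NSG-VD, which computes the Maximum Mean Discrepancy (MMD) between the NSG features of test videos and real videos.
Paper reference translation (metadata) (2025-10-09T11:00:35Z)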
Trajectory-aware Shifted State Space Models for Online Video Super-Resolution [57.87099307245989]
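This paper proposes a new online VSR method based on Trajectory-aware Shifted SSMs (TS-Mamba). TS-Mamba first constructs trajectories within a video and selects the most similar tokens from previous frames. TS-Mamba achieves state-of-the-art performance in most cases while reducing complexity (MACs) by over 22.7%.
Paper reference translation (metadata) (2025-08-14T08:42:15Z)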
VideoMolmo: Spatio-Temporal Grounding Meets Pointing [66.19964563104385]
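VideoMolmo is a model tailored for fine-grained pointing in video sequences. A new mask fusion step uses SAM2 for bidirectional point propagation. To evaluate the generalization of VideoMolmo, the authors introduce VPoMolS-temporal, a challenging out-of-distribution benchmark across two real-world scenarios.
Paper reference translation (metadata) (2025-06-05T17:59:29Z)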
Motion-Aware Concept Alignment for Consistent Video Editing [57.08108545219043]
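MoCA-Video (Motion-Aware Concept Alignment in Video) is a training-free framework that bridges the gap between image-domain semantic mixing and video. Given a generated video and a user-provided reference image, MoCA-Video injects the semantic features of the reference image into a specific object within the video. The authors evaluate MoCA-Video using standard SSIM, image-level LPIPS, and temporal LPIPS, and introduce a new metric, CASS (Conceptual Alignment Shift Score), to assess the consistency and effectiveness of the visual shift between the source prompt and the modified video frames.
Paper reference translation (metadata) (2025-06-01T13:28:04Z)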
Temporal-Consistent Video Restoration with Pre-trained Diffusion Models [51.47188802535954]
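Video restoration (VR) aims to recover high-quality videos from degraded ones. Recent zero-shot VR methods that use pre-trained diffusion models (DMs) suffer from approximation errors during reverse diffusion and from a lack of temporal consistency. This paper proposes a new Maximum a Posteriori (MAP) framework that directly parameterizes video frames in the seed space of DMs, eliminating the approximation errors.
Paper reference translation (metadata) (2025-03-19T03:41:56Z)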
Uniformly Accelerated Motion Model for Inter Prediction [38.34487653360328]
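Natural videos usually contain multiple moving objects with variable velocity, resulting in complex motion fields that are difficult to represent compactly. In Versatile Video Coding (VVC), existing inter prediction methods assume uniform-speed motion between consecutive frames. This work proposes a uniformly accelerated motion model (UAMM) that exploits motion-related elements (velocity, acceleration) of moving objects across video frames.
Paper reference translation (metadata) (2024-07-16T09:46:29Z)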
Multiscale Motion-Aware and Spatial-Temporal-Channel Contextual Coding Network for Learned Video Compression [24.228981098990726]
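The authors propose a video compression network (MASTC-VC). The proposed MASTC-VC outperforms previous state-of-the-art (SOTA) methods on three public benchmark datasets. It saves an average of 10.15% BD-rate against H.265/HEVC (HM-16.20) in terms of PSNR and an average of 23.93% BD-rate against H.266/VVC (VTM-13.2) in terms of MS-SSIM.
Paper reference translation (metadata) (2023-10-19T13:32:38Z)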
Neural Video Coding using Multiscale Motion Compensation and Spatiotemporal Context Model [45.46660511313426]
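The authors propose an end-to-end deep neural video coding framework (NVC). It uses variational autoencoders (VAEs) with joint spatial and temporal prior aggregation (PA) to exploit the correlations in intra-frame pixels, inter-frame motion, and inter-frame compensation residuals. NVC is evaluated under low-delay causal settings and compared with H.265/HEVC, H.264/AVC, and other learned video compression methods.
Paper reference translation (metadata) (2020-07-09T06:15:17Z)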

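The related papers list is generated automatically from the titles and abstracts of papers on this site.

This page shows information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.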