Fugu-MT Paper Translation (Abstract): HMPDM: A Diffusion Model for Driving Video Prediction with Historical Motion Priors

Paper summary: HMPDM: A Diffusion Model for Driving Video Prediction with Historical Motion Priors

arxiv url: http://arxiv.org/abs/2603.27371v1
Date: Sat, 28 Mar 2026 18:37:08 GMT
Status: Translation complete
System update date: 2026-03-31 23:18:44.926185
Title: HMPDM: A Diffusion Model for Driving Video Prediction with Historical Motion Priors
Title (reference translation): HMPDM: A Diffusion Model for Driving Video Prediction with Historical Motion Priors
Authors: Ke Li, Tianjia Yang, Kaidi Liang, Xianbiao Hu, Ruwen Qin
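Abstract summary: This paper introduces HMPDM, a video prediction model that uses historical motion priors to improve motion understanding and temporal coherence. Extensive experiments on the Cityscapes and KITTI benchmarks show that HMPDM outperforms state-of-the-art video prediction methods while remaining efficient.
Reference score (independently computed attention score): 8.987844576502054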
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Video prediction is a useful function for autonomous driving, enabling intelligent vehicles to reliably anticipate how driving scenes will evolve and thereby supporting reasoning and safer planning. However, existing models are constrained by multi-stage training pipelines and remain insufficient in modeling the diverse motion patterns in real driving scenes, leading to degraded temporal consistency and visual quality. To address these challenges, this paper introduces the historical motion priors-informed diffusion model (HMPDM), a video prediction model that leverages historical motion priors to enhance motion understanding and temporal coherence. The proposed deep learning system introduces three key designs: (i) a Temporal-aware Latent Conditioning (TaLC) module for implicit historical motion injection; (ii) a Motion-aware Pyramid Encoder (MaPE) for multi-scale motion representation; (iii) a Self-Conditioning (SC) strategy for stable iterative denoising. Extensive experiments on the Cityscapes and KITTI benchmarks demonstrate that HMPDM outperforms state-of-the-art video prediction methods with efficiency, achieving a 28.2% improvement in FVD on Cityscapes under the same monocular RGB input configuration setting. The implementation codes are publicly available at https://github.com/KELISBU/HMPDM.
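Abstract (reference translation): Video prediction is a useful capability for autonomous driving: it lets intelligent vehicles reliably anticipate how driving scenes will evolve, which supports reasoning and safer planning. However, existing models are constrained by multi-stage training pipelines and do not adequately model the diverse motion patterns of real driving scenes, which degrades temporal consistency and visual quality. To address these challenges, the paper introduces the historical motion priors-informed diffusion model (HMPDM), a video prediction model that uses historical motion priors to improve motion understanding and temporal coherence. The proposed deep learning system has three key designs: (i) a Temporal-aware Latent Conditioning (TaLC) module for implicit historical motion injection; (ii) a Motion-aware Pyramid Encoder (MaPE) for multi-scale motion representation; (iii) a Self-Conditioning (SC) strategy for stable iterative denoising. Extensive experiments on the Cityscapes and KITTI benchmarks show that HMPDM outperforms state-of-the-art video prediction methods while remaining efficient, improving FVD on Cityscapes by 28.2% under the same monocular RGB input setting. The implementation code is publicly available at https://github.com/KELISBU/HMPDM.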
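
The abstract names a Self-Conditioning (SC) strategy for stable iterative denoising but does not describe how it works. The sketch below illustrates only the general idea behind self-conditioning in diffusion sampling: at each step the denoiser also receives its own clean-signal estimate from the previous step. It is not taken from the HMPDM code. The names `toy_denoiser` and `sample_with_self_conditioning`, the averaging rule, and the interpolation update are all assumptions chosen to keep the example small and runnable.

```python
import random


def toy_denoiser(x_t, x0_prev, t):
    """Hypothetical stand-in for a trained network (assumption, not HMPDM).

    Returns an estimate of the clean signal given the noisy input x_t,
    the previous clean estimate x0_prev, and the noise level t (unused here).
    """
    # Average the noisy input with the previous estimate, element by element.
    return [0.5 * a + 0.5 * b for a, b in zip(x_t, x0_prev)]


def sample_with_self_conditioning(denoiser, dim, steps, seed=0):
    """Run a toy reverse-diffusion loop that feeds each estimate back in."""
    rng = random.Random(seed)
    x_t = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # start from pure noise
    x0_est = [0.0] * dim  # no estimate exists yet, so condition on zeros

    for step in range(steps, 0, -1):
        t = step / steps
        t_next = (step - 1) / steps
        # Self-conditioning: pass the previous clean estimate as an extra input.
        x0_est = denoiser(x_t, x0_est, t)
        # Simplified deterministic update (assumption): move x_t toward the
        # estimate so that x_t equals the estimate when t_next reaches 0.
        keep = t_next / t
        x_t = [keep * a + (1.0 - keep) * b for a, b in zip(x_t, x0_est)]

    return x_t, x0_est


if __name__ == "__main__":
    sample, estimate = sample_with_self_conditioning(toy_denoiser, dim=4, steps=10)
    print(sample)
```

In a real model the denoiser would be a neural network and the update would follow the chosen sampler's noise schedule; only the feedback of `x0_est` into the next call is the point of the sketch.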
