Fugu-MT Paper Translation (Summary): Causal Motion Diffusion Models for Autoregressive Motion Generation

Paper summary: Causal Motion Diffusion Models for Autoregressive Motion Generation

arxiv url: http://arxiv.org/abs/2602.22594v1
Date: Thu, 26 Feb 2026 03:58:25 GMT
Status: Translation complete
System update date: 2026-02-27 18:41:22.517635
Title: Causal Motion Diffusion Models for Autoregressive Motion Generation
Authors: Qing Yu, Akihisa Watanabe, Kent Fujiwara
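Abstract summary: Causal Motion Diffusion Models (CMDM) is a unified framework for autoregressive motion generation. CMDM builds upon a Motion-Language-Aligned Causal VAE (MAC-VAE), which encodes motion sequences into temporally causal latent representations. Experiments on HumanML3D and SnapMoGen show that CMDM outperforms existing diffusion and autoregressive models in both semantic fidelity and temporal smoothness.
Reference score (attention score computed by this site): 19.61051102039212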
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in motion diffusion models have substantially improved the realism of human motion synthesis. However, existing approaches either rely on full-sequence diffusion models with bidirectional generation, which limits temporal causality and real-time applicability, or autoregressive models that suffer from instability and cumulative errors. In this work, we present Causal Motion Diffusion Models (CMDM), a unified framework for autoregressive motion generation based on a causal diffusion transformer that operates in a semantically aligned latent space. CMDM builds upon a Motion-Language-Aligned Causal VAE (MAC-VAE), which encodes motion sequences into temporally causal latent representations. On top of this latent representation, an autoregressive diffusion transformer is trained using causal diffusion forcing to perform temporally ordered denoising across motion frames. To achieve fast inference, we introduce a frame-wise sampling schedule with causal uncertainty, where each subsequent frame is predicted from partially denoised previous frames. The resulting framework supports high-quality text-to-motion generation, streaming synthesis, and long-horizon motion generation at interactive rates. Experiments on HumanML3D and SnapMoGen demonstrate that CMDM outperforms existing diffusion and autoregressive models in both semantic fidelity and temporal smoothness, while substantially reducing inference latency.
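The frame-wise sampling idea in the abstract, where each subsequent frame is predicted from partially denoised previous frames, can be illustrated with a small scheduling sketch. This is not the authors' schedule, which the abstract does not specify. It assumes linearly decreasing noise levels and a hypothetical `lag` parameter that delays each frame's denoising relative to the previous frame. The function name `staggered_noise_levels` is likewise made up for illustration.

```python
def staggered_noise_levels(num_frames, num_steps, lag):
    """Build a table schedule[s][f] of noise levels in [0, 1].

    1.0 means pure noise and 0.0 means fully denoised. Frame f begins
    denoising `lag` global steps after frame f - 1 (an assumed rule, not
    taken from the paper), so at every step earlier frames are at least
    as clean as later ones.
    """
    if num_frames < 1 or num_steps < 1 or lag < 0:
        raise ValueError("need num_frames >= 1, num_steps >= 1, lag >= 0")
    # Global steps needed until the last frame has finished denoising.
    total_steps = num_steps + lag * (num_frames - 1)
    schedule = []
    for s in range(total_steps + 1):
        row = []
        for f in range(num_frames):
            # Number of denoising steps frame f has received so far.
            local_step = min(max(s - lag * f, 0), num_steps)
            row.append(1.0 - local_step / num_steps)
        schedule.append(row)
    return schedule


if __name__ == "__main__":
    for step, levels in enumerate(staggered_noise_levels(3, 4, 2)):
        print(step, levels)
```

Under these assumptions, three frames with four denoising steps each finish after 8 global steps when `lag=2`, compared with 12 steps if each frame had to be fully denoised before the next one started. The rows of the table show that later frames are always conditioned on earlier frames that are cleaner but not necessarily finished.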

Related papers

FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation [51.110607281391154]
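FlowMo is a training-free guidance method for improving motion coherence in text-to-video models. It estimates motion coherence by measuring patch-wise variance along the temporal dimension and guides the model to dynamically reduce this variance during sampling.
Paper reference translation (metadata) (2025-06-01T19:55:33Z)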
Generative Pre-trained Autoregressive Diffusion Transformer [74.25668109048418]
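GPDiT (Generative Pre-trained Autoregressive Diffusion Transformer) is an autoregressive diffusion transformer that unifies the strengths of diffusion and autoregressive modeling for long-range video synthesis. It autoregressively predicts future latent frames using a diffusion loss, which enables natural modeling of motion dynamics.
Paper reference translation (metadata) (2025-05-12T08:32:39Z)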
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space [40.60429652169086]
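Text-conditioned streaming motion generation requires predicting the next-step human pose from variable-length historical motion and incoming text. Existing methods struggle to achieve streaming motion generation; diffusion models, for example, are constrained by predefined motion lengths. This work proposes MotionStreamer, a novel framework that incorporates a continuous causal latent space into a probabilistic autoregressive model.
Paper reference translation (metadata) (2025-03-19T17:32:24Z)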
FRMD: Fast Robot Motion Diffusion with Consistency-Distilled Movement Primitives for Smooth Action Generation [3.7351623987275873]
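This work proposes Fast Robot Motion Diffusion (FRMD), a method for generating smooth and temporally consistent robot motions. The method integrates movement primitives (MPs) with consistency models to achieve efficient single-step trajectory generation. The results show that FRMD generates trajectories faster and more smoothly while achieving high success rates.
Paper reference translation (metadata) (2025-03-03T20:56:39Z)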
Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models [71.63194926457119]
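Dynamical Diffusion (DyDiff) is a theoretically sound framework that incorporates temporally aware forward and reverse processes. Experiments on scientific temporal forecasting, video prediction, and time-series forecasting show that Dynamical Diffusion consistently improves performance on temporal prediction tasks.
Paper reference translation (metadata) (2025-03-02T16:10:32Z)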
RecMoDiffuse: Recurrent Flow Diffusion for Human Motion Generation [5.535590461577558]
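RecMoDiffuse is a new recurrent diffusion formulation for temporal modeling. The paper demonstrates the effectiveness of RecMoDiffuse in the temporal modeling of human motion.
Paper reference translation (metadata) (2024-06-11T11:25:37Z)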
Motion-aware Latent Diffusion Models for Video Frame Interpolation [51.78737270917301]
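Motion estimation between adjacent frames plays an important role in avoiding motion ambiguity. The authors propose a new diffusion framework, the Motion-aware Latent Diffusion Model (MADiff). The proposed method achieves state-of-the-art performance, significantly outperforming existing methods.
Paper reference translation (metadata) (2024-04-21T05:09:56Z)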
EMDM: Efficient Motion Diffusion Model for Fast and High-Quality Motion Generation [57.539634387672656]
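Current state-of-the-art generative diffusion models produce excellent results but struggle to generate quickly without sacrificing quality. The authors propose the Efficient Motion Diffusion Model (EMDM) for fast, high-quality human motion generation.
Paper reference translation (metadata) (2023-12-04T18:58:38Z)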
Modiff: Action-Conditioned 3D Motion Generation with Denoising Diffusion Probabilistic Models [58.357180353368896]
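This paper proposes a conditional paradigm that leverages the advantages of denoising diffusion probabilistic models (DDPM) to address the problem of realistic and diverse 3D skeleton-based motion generation. It is a pioneering attempt to use DDPM to synthesize a variable number of motion sequences conditioned on a categorical action.
Paper reference translation (metadata) (2023-01-10T13:15:42Z)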

The related papers list is generated automatically from the titles and abstracts of papers on this site.

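This page shows information on the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and accepts no responsibility for any consequences arising from its use.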