Fugu-MT Paper Translation (Abstract): HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning

Paper overview: HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning

arxiv url: http://arxiv.org/abs/2605.05951v1
Date: Thu, 07 May 2026 09:58:18 GMT
Status: Translation complete
Last updated in system: 2026-05-08 22:27:11.679908
Title: HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning
Title (reference translation): HaM-World: Soft-Hamiltonian World Models with Selective Memory for Planning
Authors: Haoyun Tang, Haodong Cui, Keyao Xu, Kun Wang, Zhandong Mei
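Abstract summary: HaM-World is a structured world model that decomposes the latent state into a canonical (q, p) subspace and a context subspace c. On four DeepMind Control Suite tasks, HaM-World reaches the highest Avg. AUC (117.9, +9.5%), reduces long-horizon rollout error to 45% of a strong baseline model, and wins 11/12 k in {3,5,7} MSE cells.
Reference score (independently computed attention score): 2.28917458387011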
License: http://creativecommons.org/licenses/by/4.0/
Abstract: World models enable model-based planning through learned latent dynamics, but imagined rollouts become unstable as the planning horizon grows or the dynamics distribution shifts. We argue that this instability reflects two missing structures in planner-facing latents: history-conditioned memory for approximate Markov completeness, and geometric organization that separates configuration, momentum, and task semantics. We propose HaM-World (HMW), a structured world model that decomposes the latent state into a canonical (q, p) subspace and a context subspace c, while using Mamba selective state-space memory as the history-conditioned input to the same latent dynamics. Within this interface, (q, p) evolves through an energy-derived Hamiltonian vector field plus learnable residual/control dynamics, while c captures semantic, dissipative, and non-conservative factors. This gives the planner a single latent state shared by dynamics prediction, reward/value estimation, imagined rollouts, and CEM action search. On four DeepMind Control Suite tasks, HaM-World reaches the highest Avg. AUC (117.9, +9.5%), reduces long-horizon rollout error to 45% of a strong baseline model, and wins 11/12 k in {3,5,7} MSE cells. Under 12 OOD perturbations spanning dynamics shifts, action delay, and observation masking, HaM-World achieves the highest return in every condition, with average OOD-return gains of 10.2% on Finger Spin and 13.6% on Reacher Easy. Mechanism diagnostics further show bounded action-free Hamiltonian-energy drift, structured energy variation under policy rollouts, and coherent control-induced energy transfer, supporting the intended Soft-Hamiltonian dynamics design.
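Abstract (reference translation): World models make model-based planning possible through learned latent dynamics, but imagined rollouts become unstable as the planning horizon grows or the dynamics distribution shifts. We argue that this instability reflects two structures missing from planner-facing latents: history-conditioned memory that approximates Markov completeness, and a geometric organization that separates configuration, momentum, and task semantics. We propose HaM-World (HMW), a structured world model that decomposes the latent state into a canonical (q, p) subspace and a context subspace c, while using Mamba selective state-space memory as the history-conditioned input to the same latent dynamics. Within this interface, (q, p) evolves through an energy-derived Hamiltonian vector field plus learnable residual/control dynamics, while c captures semantic, dissipative, and non-conservative factors. This gives the planner a single latent state shared by dynamics prediction, reward/value estimation, imagined rollouts, and CEM action search. On four DeepMind Control Suite tasks, HaM-World reaches the highest Avg. AUC (117.9, +9.5%), reduces long-horizon rollout error to 45% of a strong baseline model, and wins 11/12 k in {3,5,7} MSE cells. Under 12 OOD perturbations spanning dynamics shifts, action delay, and observation masking, HaM-World achieves the highest return in every condition, with average OOD-return gains of 10.2% on Finger Spin and 13.6% on Reacher Easy. Mechanism diagnostics further show bounded action-free Hamiltonian-energy drift, structured energy variation under policy rollouts, and coherent control-induced energy transfer, supporting the intended Soft-Hamiltonian dynamics design.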
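
The abstract describes (q, p) evolving through an energy-derived Hamiltonian vector field plus residual/control dynamics, with c carrying dissipative factors. The following is a minimal sketch of that idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the hand-written quadratic energy (the paper learns its energy), the `residual` form with `c[0]` acting as a damping coefficient, the semi-implicit Euler integrator, and all function names.

```python
import numpy as np

def energy(q, p, stiffness=1.0):
    """Toy energy H(q, p): kinetic plus quadratic potential (stand-in for a learned energy)."""
    return 0.5 * np.dot(p, p) + 0.5 * stiffness * np.dot(q, q)

def grad_energy(q, p, stiffness=1.0):
    """Analytic gradients (dH/dq, dH/dp) of the toy energy above."""
    return stiffness * q, p

def residual(q, p, c, action):
    """Hypothetical non-conservative term: context-scaled damping plus a control force."""
    damping = c[0]
    return -damping * p + action

def soft_hamiltonian_step(q, p, c, action, dt=0.01):
    """One semi-implicit Euler step of dq/dt = dH/dp, dp/dt = -dH/dq + residual."""
    dH_dq, _ = grad_energy(q, p)
    p_next = p + dt * (-dH_dq + residual(q, p, c, action))
    _, dH_dp = grad_energy(q, p_next)
    q_next = q + dt * dH_dp
    return q_next, p_next

def rollout_energies(q, p, c, actions, dt=0.01):
    """Roll the toy dynamics forward and record H(q, p) after every step."""
    energies = [energy(q, p)]
    for a in actions:
        q, p = soft_hamiltonian_step(q, p, c, a, dt)
        energies.append(energy(q, p))
    return np.array(energies)
```

In this toy setting, an action-free rollout with zero damping keeps the energy close to its initial value, while a nonzero `c[0]` drains it. That loosely mirrors the kind of energy-drift diagnostic the abstract mentions, without reproducing the paper's measurements.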

Related papers