Fugu-MT Paper Translation (Summary): Mind Dreamer: Untethering Imagination via Active Latent Intervention on Latent Manifolds

Paper summary: Mind Dreamer: Untethering Imagination via Active Latent Intervention on Latent Manifolds

arxiv url: http://arxiv.org/abs/2605.16030v1
Date: Fri, 15 May 2026 15:05:58 GMT
Status: Translation complete
Last updated in system: 2026-05-18 17:44:16.343503
Title: Mind Dreamer: Untethering Imagination via Active Latent Intervention on Latent Manifolds
Title (reference translation): Mind Dreamer: Unleashing Imagination through Active Intervention on Latent Manifolds
Authors: Shaojun Xu, Xiaoling Zhou, Yihan Lin, Yapeng Meng, Xinglong Ji, Luping Shi, Rong Zhao
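Abstract summary: We propose Mind Dreamer (MD), a framework that operationalizes Active Latent Intervention (ALI) to transcend Markovian continuity. MD reformulates discovery as the minimization of a global Relay Manifold Expected Free Energy (R-EFE). MD achieves an average 1.67$\times$ speedup over DreamerV3 on the DeepMind Control Suite, reaching 8.8$\times$ on sparse-reward tasks.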
Reference score (independently computed attention score): 13.96435440318736
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Model-Based Reinforcement Learning (MBRL) leverages latent imagination for sample efficiency, yet remains constrained by Historical Tethering: imagination is typically initialized from observed states. This creates a learning asymmetry, where the world model's manifold discovery outpaces the policy's sparse-reward optimization. We propose Mind Dreamer (MD), a framework that operationalizes Active Latent Intervention (ALI) to transcend Markovian continuity. MD reformulates discovery as the minimization of a global Relay Manifold Expected Free Energy (R-EFE); by sampling initial states from a learned generator $s_0 \sim p_{gen}(\cdot)$ rather than the historical buffer, MD utilizes an adversarial generator to synthesize non-continuous latent jumps to epistemic blind spots that are physically plausible yet cognitively challenging. To resolve the credit assignment paradox across these spatial ruptures, we derive the Relay Value Function (RVF) and Relay Uncertainty Function (RUF). These potentials treat synthesized anchors as counterfactual intermediary states, propagating pragmatic and epistemic value through a principled Bellman-style formulation. Notably, we prove that uncertainty propagation across discontinuities necessitates a quadratic discount $γ^2$, establishing a formal epistemic horizon. Theoretically, MD approximates a variance-minimizing importance sampler that expands the manifold's spectral gap, reducing the hitting time to critical bottleneck states. Empirically, MD achieves a 1.67$\times$ average speedup over DreamerV3 on DeepMind Control Suite, reaching 8.8$\times$ in sparse-reward tasks.
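Abstract (reference translation): Model-based reinforcement learning (MBRL) uses latent imagination for sample efficiency, but it remains constrained by Historical Tethering: imagination is typically initialized from observed states. This produces a learning asymmetry in which the world model's manifold discovery outpaces the policy's sparse-reward optimization. We propose Mind Dreamer (MD), a framework that operationalizes Active Latent Intervention (ALI) to transcend Markovian continuity. MD recasts discovery as the minimization of a global Relay Manifold Expected Free Energy (R-EFE). By sampling initial states from a learned generator, $s_0 \sim p_{gen}(\cdot)$, instead of the historical buffer, MD uses an adversarial generator to synthesize non-continuous latent jumps to epistemic blind spots that are physically plausible yet cognitively challenging. To resolve the credit assignment paradox across these spatial ruptures, we derive the Relay Value Function (RVF) and the Relay Uncertainty Function (RUF). These potentials treat synthesized anchors as counterfactual intermediary states and propagate pragmatic and epistemic value through a principled Bellman-style formulation. In particular, we prove that uncertainty propagation across discontinuities requires a quadratic discount $γ^2$, which establishes a formal epistemic horizon. Theoretically, MD approximates a variance-minimizing importance sampler that expands the manifold's spectral gap and reduces the hitting time to critical bottleneck states. Empirically, MD achieves an average 1.67$\times$ speedup over DreamerV3 on the DeepMind Control Suite, reaching 8.8$\times$ on sparse-reward tasks.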
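
The abstract states that uncertainty propagation needs a quadratic discount $γ^2$ but does not give the derivation. The sketch below is one possible intuition, not the paper's proof: if uncertainty is measured as a variance (an assumption; the abstract does not say how the RUF is defined), then scaling a random quantity by γ scales its variance by γ squared, so a Bellman-style backup would discount value by γ and variance by γ squared. The names `relay_backup`, `empirical_variance_ratio`, and `local_uncertainty` are hypothetical and do not come from the paper.

```python
import random
import statistics


def relay_backup(reward, next_value, next_uncertainty,
                 gamma=0.99, local_uncertainty=0.0):
    """One Bellman-style backup for a value and a variance-like uncertainty.

    Hypothetical helper, not the paper's RVF/RUF definition.
    The value term is discounted by gamma; the variance-like term is
    discounted by gamma**2 because Var(gamma * X) = gamma**2 * Var(X).
    """
    value = reward + gamma * next_value
    uncertainty = local_uncertainty + gamma ** 2 * next_uncertainty
    return value, uncertainty


def empirical_variance_ratio(gamma, n=10_000, seed=0):
    """Return Var(gamma * X) / Var(X) for Gaussian samples X."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    scaled = [gamma * x for x in xs]
    return statistics.pvariance(scaled) / statistics.pvariance(xs)


if __name__ == "__main__":
    gamma = 0.9
    # The ratio matches gamma**2 up to floating-point rounding.
    print(empirical_variance_ratio(gamma), gamma ** 2)
    print(relay_backup(reward=1.0, next_value=2.0,
                       next_uncertainty=4.0, gamma=0.5))
```

Under this assumption the uncertainty signal decays faster than the value signal as it is backed up across steps, which is one reading of the "epistemic horizon" the abstract mentions.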
