Fugu-MT paper translation (abstract): Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics

Paper overview: Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics

arxiv url: http://arxiv.org/abs/2605.22164v1
Date: Thu, 21 May 2026 08:34:57 GMT
Status: translation complete
Last updated in system: 2026-05-22 16:35:42.165483
Title: Beyond Euclidean Proximity: Repairing Latent World Models with Horizon-Matched Trajectory Reachability Metrics
Authors: Liangyu Li, Shengzhi Wang, Qingwen Liu,
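Abstract summary: In common latent MPC, candidate sequences are ranked by the Euclidean distance between the predicted terminal latent state and the goal latent state. We propose trajectory reachability metrics (TRM), a post-hoc terminal-ranking method for fixed latent world models.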
Reference score (independently computed attention score): 5.384648499307027
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Latent world models can contain the state needed for control, yet their terminal-cost interface can expose the planner to the wrong decision-relevant information. In common latent MPC, candidate sequences are ranked by Euclidean distance between predicted terminal and goal latent states; this assumes that raw latent distance weights reachability-relevant variables correctly. We propose trajectory reachability metrics (TRM), a post-hoc terminal-ranking method for fixed latent world models. TRM trains a small pairwise head from logged trajectory structure and uses it as a replacement or hybrid cost; the encoder, dynamics, sampler, optimizer, and evaluation manifests remain fixed. The key design choice is horizon-aware supervision: the metric is trained on broad, balanced temporal separations to match the long-horizon terminal candidate ranking problem. On a hard TwoRoom benchmark, raw latent planning with LeWorldModel (LeWM) reaches 7.0% success, while full-horizon TRM reaches 97.0%; shuffled temporal-label controls stay at 0.0%. The same recipe improves a PLDM baseline from 32.7% to 84.0% across three seeds, and a short-horizon TRM variant reaches only 35.0% with the 100,000 pair budget. In TwoRoom, we provide mechanistic evidence for why TRM works: XY position is linearly decodable (R^2=0.998), yet raw latent MSE misranks candidates; the XY-probe rowspace accounts for less than 1% of terminal-goal latent MSE but carries most candidate-quality signal; and SCSA audits show that TRM improves the ordering and selected endpoint seen by the planner. On PushT go50/go75, TRM-style task-state metrics improve SCSA ranking and selected final distance more cleanly than closed-loop success, motivating auxiliary hybrid costs in continuous manipulation. TRM is the planner-facing repair, and audits explain when terminal reachability metrics should replace or augment raw latent proximity.
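The contrast the abstract draws between raw latent proximity and a learned terminal metric can be illustrated with a small synthetic sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy `encode` function (XY position scaled into a low-magnitude subspace, plus high-variance nuisance dimensions), the straight-line constant-speed logs, and the diagonal quadratic head fit by least squares, which stands in for the unspecified "small pairwise head". Temporal separations are drawn uniformly from 1 to `HORIZON` as one possible reading of "broad, balanced temporal separations".

```python
import numpy as np

rng = np.random.default_rng(0)
XY_SCALE = 0.1       # hypothetical: position lives in a low-magnitude latent subspace
NUISANCE_DIM = 6     # hypothetical: dims unrelated to reachability
NUISANCE_STD = 5.0
SPEED = 0.2          # distance covered per logged step (toy assumption)
HORIZON = 30         # largest temporal separation used for supervision


def encode(xy, nuisance):
    """Toy stand-in for a frozen encoder: scaled XY followed by nuisance dims."""
    return np.concatenate([XY_SCALE * xy, nuisance], axis=-1)


def sample_pairs(n_pairs):
    """Logged pairs (z_t, z_{t+k}) with k uniform over 1..HORIZON."""
    k = rng.integers(1, HORIZON + 1, size=n_pairs)
    start = rng.uniform(-5, 5, size=(n_pairs, 2))
    angle = rng.uniform(0, 2 * np.pi, size=n_pairs)
    heading = np.stack([np.cos(angle), np.sin(angle)], axis=1)
    end = start + SPEED * k[:, None] * heading
    z_a = encode(start, rng.normal(0, NUISANCE_STD, (n_pairs, NUISANCE_DIM)))
    z_b = encode(end, rng.normal(0, NUISANCE_STD, (n_pairs, NUISANCE_DIM)))
    return z_a, z_b, k


def fit_pairwise_head(z_a, z_b, k):
    """Fit per-dimension weights so that sum_i w_i * dz_i^2 approximates k^2."""
    feats = (z_a - z_b) ** 2
    w, *_ = np.linalg.lstsq(feats, k.astype(float) ** 2, rcond=None)
    return w


def euclidean_cost(z_cand, z_goal):
    """Raw latent terminal cost: squared Euclidean distance to the goal latent."""
    return np.sum((z_cand - z_goal) ** 2, axis=-1)


def head_cost(z_cand, z_goal, w):
    """Learned terminal cost: weighted squared differences (about steps^2 here)."""
    return ((z_cand - z_goal) ** 2) @ w


z_a, z_b, k = sample_pairs(2000)
w = fit_pairwise_head(z_a, z_b, k)

# Three candidate terminal states; candidate 0 is closest to the goal in XY
# but far in nuisance dims, candidate 2 is farthest in XY but matches nuisance.
goal = encode(np.zeros(2), np.zeros(NUISANCE_DIM))
cand_xy = np.array([[0.5, 0.0], [2.0, 0.0], [4.0, 0.0]])
cand_nuis = np.array([[3.0] * NUISANCE_DIM, [1.0] * NUISANCE_DIM, [0.0] * NUISANCE_DIM])
cands = encode(cand_xy, cand_nuis)

euclid_pick = int(np.argmin(euclidean_cost(cands, goal)))
head_pick = int(np.argmin(head_cost(cands, goal, w)))
print("Euclidean picks candidate", euclid_pick, "| learned head picks candidate", head_pick)
```

In this toy setup the Euclidean cost is dominated by the nuisance dimensions and selects the candidate that is farthest from the goal in XY, while the fitted head puts nearly all its weight on the two position dimensions and selects the closest one. The sketch only mirrors the qualitative claim that position can be decodable yet underweighted by raw latent MSE; it does not reproduce the paper's benchmarks, head architecture, or training procedure.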

