Fugu-MT Paper Translation (Abstract): Token Is All You Need: Cognitive Planning through Belief-Intent Co-Evolution

Paper overview: Token Is All You Need: Cognitive Planning through Belief-Intent Co-Evolution

arxiv url: http://arxiv.org/abs/2511.05540v2
Date: Tue, 11 Nov 2025 18:17:53 GMT
Status: Translation complete
Last updated in system: 2025-11-16 06:38:31.034072
Title: Token Is All You Need: Cognitive Planning through Belief-Intent Co-Evolution
Authors: Shiyao Sang
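Abstract summary: We show that effective planning arises from the co-evolution of belief and intent within a minimal set of semantically rich tokens. Intelligence lies not in pixel fidelity, but in the tokenized duality of belief and intent.
Reference score (independently computed attention score): 0.0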
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We challenge the long-standing assumption that exhaustive scene modeling is required for high-performance end-to-end autonomous driving (E2EAD). Inspired by cognitive science, we propose that effective planning arises not from reconstructing the world, but from the co-evolution of belief and intent within a minimal set of semantically rich tokens. Experiments on the nuPlan benchmark (720 scenarios, 11k+ samples) reveal three principles: (1) sparse intent tokens alone achieve 0.487 m ADE, demonstrating strong performance without future prediction; (2) conditioning trajectory decoding on predicted future tokens reduces ADE to 0.382 m, a 21.6% improvement, showing that performance emerges from cognitive planning; and (3) explicit reconstruction loss degrades performance, confirming that task-driven belief-intent co-evolution suffices under reliable perception inputs. Crucially, we observe the emergence of cognitive consistency: through prolonged training, the model spontaneously develops stable token dynamics that balance current perception (belief) and future goals (intent). This process, accompanied by "temporal fuzziness," enables robustness under uncertainty and continuous self-optimization. Our work establishes a new paradigm: intelligence lies not in pixel fidelity, but in the tokenized duality of belief and intent. By reframing planning as understanding rather than reaction, TIWM bridges the gap between world models and VLA systems, paving the way for foresightful agents that plan through imagination. Note: Numerical comparisons with methods reporting results on nuScenes are indicative only, as nuPlan presents a more challenging planning-focused evaluation.
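The 21.6% figure in the abstract follows directly from the two reported ADE values. The sketch below checks that arithmetic and illustrates ADE on a toy trajectory. It assumes the conventional definition of average displacement error (mean Euclidean distance between predicted and ground-truth waypoints), which the abstract does not spell out. The function names and toy waypoints are hypothetical and are not taken from the paper.

```python
import math


def average_displacement_error(pred, truth):
    """Mean Euclidean distance between matching (x, y) waypoints.

    Assumption: this is the conventional ADE definition; the abstract
    itself does not define the metric.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and of equal length")
    return sum(math.dist(p, t) for p, t in zip(pred, truth)) / len(pred)


def relative_improvement(baseline, improved):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - improved) / baseline


# Toy waypoints in metres (illustrative only, not data from the paper).
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
pred = [(0.0, 0.3), (1.0, 0.4), (2.0, 0.5)]
toy_ade = average_displacement_error(pred, truth)

# ADE values quoted in the abstract: intent tokens alone vs. decoding
# conditioned on predicted future tokens.
gain = relative_improvement(0.487, 0.382)
print(f"toy ADE: {toy_ade:.3f} m, reported improvement: {gain:.1%}")
```

The improvement is measured relative to the 0.487 m baseline, so (0.487 - 0.382) / 0.487 rounds to the 21.6% quoted above.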

Related papers