Fugu-MT Paper Translation (Abstract): DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

Paper overview: DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?

arxiv url: http://arxiv.org/abs/2606.12402v1
Date: Wed, 10 Jun 2026 17:58:49 GMT
Status: Translation complete
Last updated in system: 2026-06-11 16:42:38.614888
Title: DIRECT: When and Where Should You Allocate Test-Time Compute in Embodied Planners?
Title (reference translation): DIRECT: When and Where Should Test-Time Compute Be Performed?
Authors: Jadelynn Dao, Milan Ganai, Yasmina Abukhadra, Ajay Sridhar, Mozhgan Nasr Azadani, Katie Luo, Clark Barrett, Jiajun Wu, Chelsea Finn, Marco Pavone
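Abstract summary: Vision-Language Models (VLMs) are increasingly deployed as high-level planners for embodied agents. We argue that choosing when and where to spend test-time compute is central to bringing frontier performance to the real world. We introduce DIRECT, a routing framework that uses multimodal scene context to allocate compute per prompt.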
Reference score (independently computed attention score): 57.585275546688116
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Vision-Language Models (VLMs) are increasingly deployed as high-level planners for embodied agents, with an emerging strategy of scaling test-time compute to improve capability. However, we observe that doing so increases latency, token usage, and FLOPs while yielding uneven, often diminishing gains in downstream success, limiting where embodied agents can be deployed. We argue that choosing when and where to spend test-time compute is central to bringing frontier performance to the real world. We introduce DIRECT, a routing framework that uses multimodal scene context to allocate compute per prompt, improving the success-cost Pareto frontier over fixed model selection. Across three dominant scaling axes, namely chain-of-thought depth, model size, and memory history, our experiments on VLABench and RoboMME show that test-time compute is not a uniform lever: different axes yield qualitatively distinct capability gains. We validate these insights on a physical Franka arm in a DROID setup spanning zero-shot manipulation and long-horizon chaining, where our router matches or exceeds a stronger model's success rate at up to 65% lower average latency. Ultimately, our results show that naively scaling test-time compute is wasteful, and that DIRECT can provide frontier-level embodied planning in robotic systems at a fraction of the cost. The project page can be found at jadee-dao.github.io/direct/.
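Abstract (reference translation): Vision-Language Models (VLMs) are increasingly deployed as high-level planners for embodied agents, with scaling test-time compute emerging as a strategy for improving capability. However, doing so increases latency, token usage, and FLOPs while yielding uneven, often diminishing gains in downstream success, which limits where embodied agents can be deployed. We argue that choosing when and where to spend test-time compute is central to bringing frontier performance to the real world. DIRECT is a routing framework that uses multimodal scene context to allocate compute per prompt, improving the success-cost Pareto frontier over fixed model selection. Experiments on VLABench and RoboMME show that, across three dominant scaling axes, namely chain-of-thought depth, model size, and memory history, test-time compute is not a uniform lever. We validate these insights on a physical Franka arm in a DROID setup spanning zero-shot manipulation and long-horizon chaining. Ultimately, our results show that naively scaling test-time compute is wasteful, and that DIRECT can provide frontier-level embodied planning in robotic systems at a fraction of the cost. The project page is at jadee-dao.github.io/direct/.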
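
Note on the "success-cost Pareto frontier" mentioned in the abstract: the sketch below shows one common way to compute such a frontier from (cost, success) pairs. It is not taken from the paper. The `Config` type, the `pareto_frontier` helper, the configuration names, and all numbers are hypothetical placeholders chosen only for illustration, and they do not reflect any reported result.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """One planner configuration (hypothetical, for illustration only)."""
    name: str
    cost: float     # e.g. average latency in seconds (made-up values)
    success: float  # task success rate in [0, 1] (made-up values)


def pareto_frontier(configs: list[Config]) -> list[Config]:
    """Keep configs that no other config beats on both cost and success.

    A config stays on the frontier if every cheaper-or-equal config has
    strictly lower success. Exact duplicates are kept only once.
    """
    # Sort by ascending cost; on ties, put the higher success first.
    ordered = sorted(configs, key=lambda c: (c.cost, -c.success))
    frontier: list[Config] = []
    best_success = float("-inf")
    for c in ordered:
        # A more expensive config survives only if it improves success.
        if c.success > best_success:
            frontier.append(c)
            best_success = c.success
    return frontier


# Made-up example values, NOT results from the paper.
candidates = [
    Config("fixed-small", cost=1.0, success=0.55),
    Config("fixed-large", cost=4.0, success=0.80),
    Config("fixed-large-long-cot", cost=9.0, success=0.78),
    Config("routed", cost=2.0, success=0.81),
]

for c in pareto_frontier(candidates):
    print(f"{c.name}: cost={c.cost}, success={c.success}")
```

In this toy data, a per-prompt routing policy that reaches a higher success rate at lower cost than a fixed configuration pushes that fixed configuration off the frontier, which is the sense in which a router can be said to improve the frontier over fixed model selection.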
