Fugu-MT Paper Translation (Abstract): CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation

Paper overview: CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation

arxiv url: http://arxiv.org/abs/2603.22435v1
Date: Mon, 23 Mar 2026 18:08:10 GMT
Status: translation complete
Last updated in system: 2026-03-25 19:53:37.131061
Title: CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation
Title (reference translation): CaP-X: A Framework for Benchmarking and Improving Coding Agents for Robot Manipulation
Authors: Max Fu, Justin Yu, Karim El-Refai, Ethan Kou, Haoru Xue, Huang Huang, Wenli Xiao, Guanzhi Wang, Fei-Fei Li, Guanya Shi, Jiajun Wu, Shankar Sastry, Yuke Zhu, Ken Goldberg, Linxi "Jim" Fan
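Abstract summary: "Code-as-Policy" considers how executable code can complement data-intensive Vision-Language-Action methods. We present CaP-X, an open-access framework for Code-as-Policy agents in robot manipulation.
Reference score (independently computed attention score): 48.85772216740915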
License: http://creativecommons.org/licenses/by/4.0/
Abstract: "Code-as-Policy" considers how executable code can complement data-intensive Vision-Language-Action (VLA) methods, yet the effectiveness of Code-as-Policy agents as autonomous controllers for embodied manipulation remains underexplored. We present CaP-X, an open-access framework for systematically studying Code-as-Policy agents in robot manipulation. At its core is CaP-Gym, an interactive environment in which agents control robots by synthesizing and executing programs that compose perception and control primitives. Building on this foundation, CaP-Bench evaluates frontier language and vision-language models across varying levels of abstraction, interaction, and perceptual grounding. Across 12 models, CaP-Bench reveals a consistent trend: performance improves with human-crafted abstractions but degrades as these priors are removed, exposing a dependence on designer scaffolding. At the same time, we observe that this gap can be mitigated by scaling agentic test-time computation: multi-turn interaction, structured execution feedback, visual differencing, automatic skill synthesis, and ensembled reasoning substantially improve robustness even when agents operate over low-level primitives. These findings allow us to derive CaP-Agent0, a training-free framework that recovers human-level reliability on several manipulation tasks in simulation and on real embodiments. We further introduce CaP-RL, showing that reinforcement learning with verifiable rewards improves success rates and transfers from simulation to the real world with minimal gap. Together, CaP-X provides a principled, open-access platform for advancing embodied coding agents.
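Abstract (reference translation): Code-as-Policy considers how executable code can complement data-intensive Vision-Language-Action (VLA) methods, but its effectiveness as an autonomous controller for embodied manipulation remains underexplored. We present CaP-X, an open-access framework for systematically studying Code-as-Policy agents in robot manipulation. At its core is CaP-Gym, an interactive environment in which agents control robots by synthesizing and executing programs that compose perception and control primitives. Built on this foundation, CaP-Bench evaluates frontier language and vision-language models across varying levels of abstraction, interaction, and perceptual grounding. Across 12 models, CaP-Bench reveals a consistent trend: performance improves with human-crafted abstractions and degrades as they are removed. At the same time, even when agents operate over low-level primitives, this gap can be mitigated through multi-turn interaction, structured execution feedback, visual differencing, automatic skill synthesis, and ensembled reasoning. From these results we derive CaP-Agent0, a training-free framework that recovers human-level reliability on manipulation tasks in simulation and on real embodiments. We further introduce CaP-RL, showing that reinforcement learning with verifiable rewards improves success rates and transfers from simulation to the real world with a minimal gap. CaP-X provides a principled, open-access platform for advancing embodied coding agents.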
