Fugu-MT Paper Translation (Abstract): ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

Paper overview: ENPIRE: Agentic Robot Policy Self-Improvement in the Real World

arxiv url: http://arxiv.org/abs/2606.19980v1
Date: Thu, 18 Jun 2026 09:21:27 GMT
Status: Translation complete
Last updated in system: 2026-06-19 18:23:39.760305
Title: ENPIRE: Agentic Robot Policy Self-Improvement in the Real World
Authors: Wenli Xiao, Jia Xie, Tonghe Zhang, Haotian Lin, Letian "Max" Fu, Haoru Xue, Jalen Lu, Yi Yang, Cunxi Dai, Zi Wang, Jimmy Wu, Guanzhi Wang, S. Shankar Sastry, Ken Goldberg, Linxi "Jim" Fan, Yuke Zhu, Guanya Shi
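Abstract summary: ENPIRE is a harness framework for coding agents that instantiates a physical feedback routine with four core modules. Powered by ENPIRE, frontier coding agents can autonomously train policies that achieve a 99% success rate on challenging dexterous manipulation tasks.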
Reference score (independently computed attention metric): 40.75426390954145
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Achieving dexterous robotic manipulation in the real world heavily relies on human supervision and algorithm engineering, which becomes a central bottleneck in the pursuit of general physical intelligence. Although emerging coding agents can generate code to automate algorithm search, their successes remain largely confined to digital environments. We conjecture that the missing abstraction to automate robotics research is a repeatable feedback loop for real-world policy improvement: reset the scene, execute a policy, verify the outcome, and refine the next iteration. To bridge this gap, we introduce ENPIRE, a harness framework for coding agents that instantiates this physical feedback routine with four core modules: an Environment module (EN) for automatic reset and verification, a Policy Improvement module (PI) that launches policy refinement, a Rollout module (R) to evaluate policies with one or multiple physical robots operating in parallel, and an Evolution module (E) in which coding agents analyze logs, consult literature, and improve training infrastructure and algorithm code to address failure modes. This closed-loop system transforms real-world manipulation learning into a controllable optimization procedure, minimizing human effort while allowing fair ablations across training recipes and agent variants. Powered by ENPIRE, frontier coding agents can autonomously train a policy to achieve a 99% success rate on challenging, dexterous manipulation tasks, such as organizing a pin box, fastening a zip tie, and tool use, a process that further accelerates when we dispatch an agent team on a robot fleet. Our results suggest a practical and scalable path toward deploying coding agents to autonomously advance robotics in the physical world.
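The reset, execute, verify, refine loop described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not ENPIRE's actual code or API: every name here (`ToyPolicy`, `Environment`, `rollout`, `improve`, `evolve`, `self_improve`) is hypothetical, the "robot" is a random number generator, and the update rules are toy stand-ins chosen to show how the four modules could hand results to one another.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyPolicy:
    """Hypothetical stand-in for a learned policy.

    `skill` is the probability that a single episode succeeds.
    """
    skill: float = 0.2


class Environment:
    """EN (assumed shape): automatic reset plus outcome verification, simulated."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def reset(self) -> None:
        # A real system would physically restore the scene; the toy has no state.
        pass

    def run_and_verify(self, policy: ToyPolicy) -> bool:
        # Execute one simulated episode and report whether it succeeded.
        return self.rng.random() < policy.skill


def rollout(env: Environment, policy: ToyPolicy, episodes: int) -> float:
    """R (assumed shape): evaluate a policy and return its measured success rate."""
    successes = 0
    for _ in range(episodes):
        env.reset()
        successes += int(env.run_and_verify(policy))
    return successes / episodes


def improve(policy: ToyPolicy, step: float) -> ToyPolicy:
    """PI (toy rule): return a refined policy with skill raised by `step`."""
    return ToyPolicy(skill=min(1.0, policy.skill + step))


def evolve(history: list[float], step: float) -> float:
    """E (toy rule): read the success-rate log and adjust the training recipe.

    If the latest measurement did not beat the previous one, double the step
    size (capped at 0.5); otherwise keep the current step.
    """
    if len(history) >= 2 and history[-1] <= history[-2]:
        return min(0.5, step * 2)
    return step


def self_improve(target: float = 0.99, episodes: int = 200,
                 max_iters: int = 50, seed: int = 0):
    """Run the closed loop until the measured success rate reaches `target`."""
    env = Environment(seed)
    policy = ToyPolicy()
    step = 0.05
    history: list[float] = []
    for _ in range(max_iters):
        rate = rollout(env, policy, episodes)   # reset, execute, verify
        history.append(rate)
        if rate >= target:
            break
        step = evolve(history, step)            # analyze logs, adjust recipe
        policy = improve(policy, step)          # refine the next iteration
    return policy, history


if __name__ == "__main__":
    final_policy, rates = self_improve()
    print(f"iterations: {len(rates)}, final measured success rate: {rates[-1]:.3f}")
```

Keeping evaluation (`rollout`) separate from refinement (`improve`) and from recipe changes (`evolve`) mirrors the abstract's point that a fixed loop makes different training recipes comparable: in this sketch, swapping the `evolve` rule changes the recipe while the measurement procedure stays the same.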


Related papers