Fugu-MT Paper Translation (Abstract): ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

Paper overview: ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors

arxiv url: http://arxiv.org/abs/2603.15956v1
Date: Mon, 16 Mar 2026 22:12:48 GMT
Status: Translation complete
Last updated in system: 2026-03-18 17:42:07.013464
Title: ExpertGen: Scalable Sim-to-Real Expert Policy Learning from Imperfect Behavior Priors
Title (reference translation): ExpertGen: Scalable Sim-to-Real Expert Policies Learned from Imperfect Behaviors
Authors: Zifan Xu, Ran Gong, Maria Vittoria Minniti, Ahmet Salih Gundogdu, Eric Rosen, Kausik Sivakumar, Riedana Yan, Zixing Wang, Di Deng, Peter Stone, Xiaohan Zhang, Karl Schmeckpeper
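Abstract summary: ExpertGen is a framework that automates expert policy learning in simulation to enable scalable sim-to-real transfer. On industrial assembly tasks, ExpertGen achieves a 90.5% overall success rate, and on long-horizon manipulation tasks it attains 85% overall success.
Reference score (independently computed attention score): 23.712657768774818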
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Learning generalizable and robust behavior cloning policies requires large volumes of high-quality robotics data. While human demonstrations (e.g., through teleoperation) serve as the standard source for expert behaviors, acquiring such data at scale in the real world is prohibitively expensive. This paper introduces ExpertGen, a framework that automates expert policy learning in simulation to enable scalable sim-to-real transfer. ExpertGen first initializes a behavior prior using a diffusion policy trained on imperfect demonstrations, which may be synthesized by large language models or provided by humans. Reinforcement learning is then used to steer this prior toward high task success by optimizing the diffusion model's initial noise while keeping the original policy frozen. By keeping the pretrained diffusion policy frozen, ExpertGen regularizes exploration to remain within safe, human-like behavior manifolds, while also enabling effective learning with only sparse rewards. Empirical evaluations on challenging manipulation benchmarks demonstrate that ExpertGen reliably produces high-quality expert policies with no reward engineering. On industrial assembly tasks, ExpertGen achieves a 90.5% overall success rate, while on long-horizon manipulation tasks it attains 85% overall success, outperforming all baseline methods. The resulting policies exhibit dexterous control and remain robust across diverse initial configurations and failure states. To validate sim-to-real transfer, the learned state-based expert policies are further distilled into visuomotor policies via DAgger and successfully deployed on real robotic hardware.
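The noise-steering idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration: `frozen_policy` is a fixed one-dimensional function standing in for a pretrained diffusion policy, the reward is a hypothetical sparse success signal, and the update is a simple cross-entropy-style search substituted for the reinforcement learning algorithm, which the abstract does not specify. The point it shows is only that the policy itself is never modified; learning changes the distribution of the initial noise.

```python
import math
import random


def frozen_policy(noise: float) -> float:
    """Toy stand-in for a frozen diffusion policy: a fixed map from noise to action."""
    return math.tanh(noise)


def sparse_reward(action: float, target: float = 0.8, tol: float = 0.05) -> float:
    """Hypothetical sparse task signal: 1 on success, 0 otherwise."""
    return 1.0 if abs(action - target) <= tol else 0.0


def success_rate(mu: float, sigma: float, rng: random.Random, n: int = 2000) -> float:
    """Estimate task success when the initial noise is drawn from N(mu, sigma^2)."""
    hits = sum(sparse_reward(frozen_policy(rng.gauss(mu, sigma))) for _ in range(n))
    return hits / n


def steer_noise(iterations: int = 30, batch: int = 256, seed: int = 0):
    """Shift the noise distribution toward samples that succeed.

    Only mu and sigma change; frozen_policy is never updated.
    """
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # start from the standard normal noise prior
    for _ in range(iterations):
        noises = [rng.gauss(mu, sigma) for _ in range(batch)]
        winners = [z for z in noises if sparse_reward(frozen_policy(z)) > 0.0]
        if not winners:
            continue  # no successful sample in this batch, keep the current distribution
        mu = sum(winners) / len(winners)
        var = sum((z - mu) ** 2 for z in winners) / len(winners)
        sigma = max(math.sqrt(var), 0.05)  # floor keeps some exploration
    return mu, sigma


if __name__ == "__main__":
    eval_rng = random.Random(1)
    print("success before steering:", success_rate(0.0, 1.0, eval_rng))
    mu, sigma = steer_noise()
    print("success after steering: ", success_rate(mu, sigma, eval_rng))
```

Because every action still comes out of the unchanged `frozen_policy`, the search cannot leave the set of behaviors that the prior can produce, which mirrors the regularization argument in the abstract.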
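Abstract (reference translation): Learning generalizable and robust behavior cloning policies requires large amounts of high-quality robotics data. Human demonstrations (for example, via teleoperation) are the standard source of expert behavior, but acquiring such data at scale in the real world is prohibitively expensive. This paper introduces ExpertGen, a framework that automates expert policy learning in simulation to achieve scalable sim-to-real transfer. ExpertGen first initializes a behavior prior using a diffusion policy trained on imperfect demonstrations, which may be synthesized by large language models or provided by humans. Reinforcement learning is then used to steer this prior toward high task success by optimizing the diffusion model's initial noise while keeping the original policy frozen. Keeping the pretrained diffusion policy frozen lets ExpertGen regularize exploration so that it stays within safe, human-like behavior manifolds, and also enables effective learning from sparse rewards alone. Empirical evaluations on challenging manipulation benchmarks show that ExpertGen reliably produces high-quality expert policies without reward engineering. On industrial assembly tasks, ExpertGen achieves a 90.5% overall success rate, and on long-horizon manipulation tasks it attains 85% overall success, outperforming all baseline methods. The resulting policies exhibit dexterous control and are robust across diverse initial configurations and failure states. To validate sim-to-real transfer, the learned state-based expert policies are distilled into visuomotor policies via DAgger and successfully deployed on real robot hardware.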
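The distillation step named at the end of the abstract, DAgger, can be sketched on a toy problem. All names here are hypothetical and the setting is deliberately simplified: the "expert" is a hand-written state-based controller, `observe` is a made-up linear observation model standing in for camera input, and the student is a one-dimensional linear regressor rather than a visuomotor network. The sketch shows the generic DAgger loop (roll out the current policy, label the visited states with the expert, aggregate, refit), not the paper's implementation.

```python
import random


def expert_action(state: float) -> float:
    """Hypothetical state-based expert: push the state toward zero."""
    return -0.5 * state


def observe(state: float) -> float:
    """Hypothetical observation model; the student sees this instead of the state."""
    return 2.0 * state + 1.0


def fit_linear(xs, ys):
    """Ordinary least squares for y = w * x + b in one dimension."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0.0:
        return 0.0, mean_y
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return w, mean_y - w * mean_x


def rollout(act, rng: random.Random, steps: int = 20):
    """Run one episode with the given state-to-action function; return visited states."""
    state = rng.uniform(-1.0, 1.0)
    visited = []
    for _ in range(steps):
        visited.append(state)
        state = state + act(state) + rng.gauss(0.0, 0.01)  # toy dynamics with small noise
    return visited


def dagger(rounds: int = 4, episodes: int = 10, seed: int = 0):
    """Aggregate expert labels on states visited by the current policy, then refit."""
    rng = random.Random(seed)
    observations, labels = [], []
    act = expert_action  # round 0 collects data by rolling out the expert
    w = b = 0.0
    for _ in range(rounds):
        for _ in range(episodes):
            for state in rollout(act, rng):
                observations.append(observe(state))
                labels.append(expert_action(state))  # the expert relabels every visited state
        w, b = fit_linear(observations, labels)
        # Later rounds roll out the student, which acts on observations only.
        act = lambda s, w=w, b=b: w * observe(s) + b
    return w, b


if __name__ == "__main__":
    w, b = dagger()
    print("student weights:", w, b)
```

In this toy setting the expert is exactly linear in the observation, so the student can match it; with a real visuomotor student the aggregated dataset matters because it covers states the student itself reaches, including its own mistakes.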
