Fugu-MT Paper Translation (Abstract): Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning

Paper abstract: Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning

arxiv url: http://arxiv.org/abs/2603.15789v1
Date: Mon, 16 Mar 2026 18:14:36 GMT
Status: Translation complete
Last updated in system: 2026-03-21 18:33:56.881625
Title: Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning
Title (reference translation): Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning
Authors: Patrick Yin, Tyler Westenbroek, Zhengyu Zhang, Joshua Tran, Ignacio Dagnino, Eeshani Shilamkar, Numfor Mbiziwo-Tiapo, Simran Bagaria, Xinlei Liu, Galen Mullins, Andrey Kolobov, Abhishek Gupta
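Abstract summary: \Method is a simple and scalable framework that enables on-policy reinforcement learning to robustly solve a broad class of dexterous manipulation tasks. \Method generates resets with minimal human input, converting additional compute directly into broader behavioral coverage. The authors show that the method scales gracefully to long-horizon dexterous manipulation tasks beyond the capabilities of existing approaches.
Reference score (independently computed attention score): 14.911497503823123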
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Reinforcement learning in massively parallel physics simulations has driven major progress in sim-to-real robot learning. However, current approaches remain brittle and task-specific, relying on extensive per-task engineering to design rewards, curricula, and demonstrations. Even with this engineering, they often fail on long-horizon, contact-rich manipulation tasks and do not meaningfully scale with compute, as performance quickly saturates when training revisits the same narrow regions of state space. We introduce \Method, a simple and scalable framework that enables on-policy reinforcement learning to robustly solve a broad class of dexterous manipulation tasks using a single reward function, fixed algorithm hyperparameters, no curricula, and no human demonstrations. Our key insight is that long-horizon exploration can be dramatically simplified by using simulator resets to systematically expose the RL algorithm to the diverse set of robot-object interactions which underlie dexterous manipulation. \Method programmatically generates such resets with minimal human input, converting additional compute directly into broader behavioral coverage and continued performance gains. We show that \Method gracefully scales to long-horizon dexterous manipulation tasks beyond the capabilities of existing approaches and is able to learn robust policies over significantly wider ranges of initial conditions than baselines. Finally, we distill \Method into visuomotor policies which display robust retrying behavior and substantially higher success rates than baselines when transferred to the real world zero-shot. Project webpage: https://omnireset.github.io
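Abstract (reference translation): Reinforcement learning in massively parallel physics simulations has brought major progress in sim-to-real robot learning. However, current approaches remain brittle and task-specific, depending on extensive per-task engineering to design rewards, curricula, and demonstrations. Even with this engineering, they often fail on long-horizon, contact-rich manipulation tasks and do not scale meaningfully with compute, because performance saturates quickly when training revisits the same narrow regions of state space. The authors introduce \Method, a simple and scalable framework that robustly solves a broad class of dexterous manipulation tasks with on-policy reinforcement learning, using a single reward function, fixed algorithm hyperparameters, no curricula, and no human demonstrations. Their key insight is that long-horizon exploration can be dramatically simplified by using simulator resets to systematically expose the RL algorithm to the diverse robot-object interactions that underlie dexterous manipulation. \Method generates such resets programmatically with minimal human input, converting additional compute into broader behavioral coverage and continued performance gains. The authors show that \Method scales gracefully to long-horizon dexterous manipulation tasks beyond the capabilities of existing approaches and learns robust policies over significantly wider ranges of initial conditions than baselines. Finally, they distill \Method into visuomotor policies that display robust retrying behavior and substantially higher success rates than baselines when transferred zero-shot to the real world. Project webpage: https://omnireset.github.io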

Related papers