Fugu-MT paper translation (abstract): Real-world Reinforcement Learning from Suboptimal Interventions

Paper abstract: Real-world Reinforcement Learning from Suboptimal Interventions

arxiv url: http://arxiv.org/abs/2512.24288v1
Date: Tue, 30 Dec 2025 15:26:42 GMT
Status: translation complete
Last updated in system: 2026-03-23 08:17:40.574033
Title: Real-world Reinforcement Learning from Suboptimal Interventions
Title (reference translation): Real-world reinforcement learning from suboptimal interventions
Authors: Yinuo Zhao, Huiqian Jin, Lechun Jiang, Xinyi Zhang, Kun Wu, Pei Ren, Zhiyuan Xu, Zhengping Che, Lei Sun, Dapeng Wu, Chi Harold Liu, Jian Tang
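Abstract summary: SiLRI is a state-wise Lagrangian reinforcement learning algorithm for real-world robot manipulation tasks. Built upon a human-as-copilot teleoperation system, the algorithm is evaluated through real-world experiments on diverse manipulation tasks.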
Reference score (independently computed attention score): 39.23110010675281
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Real-world reinforcement learning (RL) offers a promising approach to training precise and dexterous robotic manipulation policies in an online manner, enabling robots to learn from their own experience while gradually reducing human labor. However, prior real-world RL methods often assume that human interventions are optimal across the entire state space, overlooking the fact that even expert operators cannot consistently provide optimal actions in all states or completely avoid mistakes. Indiscriminately mixing intervention data with robot-collected data inherits the sample inefficiency of RL, while purely imitating intervention data can ultimately degrade the final performance achievable by RL. The question of how to leverage potentially suboptimal and noisy human interventions to accelerate learning without being constrained by them thus remains open. To address this challenge, we propose SiLRI, a state-wise Lagrangian reinforcement learning algorithm for real-world robot manipulation tasks. Specifically, we formulate the online manipulation problem as a constrained RL optimization, where the constraint bound at each state is determined by the uncertainty of human interventions. We then introduce a state-wise Lagrange multiplier and solve the problem via a min-max optimization, jointly optimizing the policy and the Lagrange multiplier to reach a saddle point. Built upon a human-as-copilot teleoperation system, our algorithm is evaluated through real-world experiments on diverse manipulation tasks. Experimental results show that SiLRI effectively exploits human suboptimal interventions, reducing the time required to reach a 90% success rate by at least 50% compared with the state-of-the-art RL method HIL-SERL, and achieving a 100% success rate on long-horizon manipulation tasks where other RL methods struggle to succeed. Project website: https://silri-rl.github.io/.
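Abstract (reference translation): Real-world reinforcement learning (RL) offers a promising approach to training precise and dexterous robot manipulation policies online. However, prior real-world RL methods often assume that human interventions are optimal across the entire state space, overlooking the fact that even expert operators cannot always provide optimal actions in every state or avoid mistakes entirely. Indiscriminately mixing intervention data with robot-collected data inherits the sample inefficiency of RL, while purely imitating intervention data ultimately degrades the final performance RL can achieve. How to leverage potentially suboptimal and noisy human interventions to accelerate learning remains an open question. To address this challenge, we propose SiLRI, a state-wise Lagrangian reinforcement learning algorithm for real-world robot manipulation tasks. Specifically, we formulate the online manipulation problem as a constrained RL optimization in which the constraint bound at each state is determined by the uncertainty of human interventions. We then introduce a state-wise Lagrange multiplier and solve the problem via min-max optimization, jointly optimizing the policy and the multiplier to reach a saddle point. Built upon a human-as-copilot teleoperation system, the algorithm is evaluated through real-world experiments on diverse manipulation tasks. Experimental results show that SiLRI effectively exploits suboptimal human interventions, reducing the time required to reach a 90% success rate by at least 50% compared with the state-of-the-art RL method HIL-SERL, and achieving a 100% success rate on long-horizon manipulation tasks where other RL methods struggle to succeed. Project website: https://silri-rl.github.io/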
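
Illustrative sketch (not from the paper): the abstract describes a constrained RL problem with one Lagrange multiplier per state, solved as a min-max problem. The toy below shows only that general primal-dual pattern. Everything specific in it is an assumption made for illustration: the quadratic value function, the squared-distance constraint to the human action, the fixed bounds `eps` (standing in for an uncertainty-derived bound), and the hypothetical function name `solve_statewise_lagrangian`. The abstract does not give SiLRI's actual objective, constraint, or uncertainty estimate.

```python
def solve_statewise_lagrangian(a_star, a_human, eps, lr=0.05, steps=5000):
    """Toy primal-dual solver: one scalar action and one multiplier per state.

    Assumed per-state problem (illustrative only):
        maximize   Q(s, a) = -(a - a_star[s])**2
        subject to (a - a_human[s])**2 <= eps[s]
    """
    actions = list(a_human)        # start the policy at the human action
    lambdas = [0.0] * len(a_star)  # one Lagrange multiplier per state
    for _ in range(steps):
        for s in range(len(a_star)):
            a, lam = actions[s], lambdas[s]
            violation = (a - a_human[s]) ** 2 - eps[s]
            # Policy step: gradient ascent on Q(s, a) - lam * violation.
            grad_a = -2.0 * (a - a_star[s]) - 2.0 * lam * (a - a_human[s])
            actions[s] = a + lr * grad_a
            # Multiplier step: grows while the constraint is violated,
            # shrinks otherwise, and is clipped to stay non-negative.
            lambdas[s] = max(0.0, lam + lr * violation)
    return actions, lambdas


# State 0 has a tight bound (the human action is trusted), state 1 a loose one.
actions, lambdas = solve_statewise_lagrangian(
    a_star=[1.0, 1.0], a_human=[0.0, 0.0], eps=[0.25, 4.0]
)
print(actions, lambdas)
```

In this toy, the tightly bounded state ends with its action held near the human action and a positive multiplier, while the loosely bounded state reaches the unconstrained optimum with a multiplier of zero. That is the intuition behind letting the bound vary by state: interventions constrain the policy only where they are trusted.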

Related papers