Fugu-MT Paper Translation (Abstract): UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

Paper overview: UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

arxiv url: http://arxiv.org/abs/2606.12372v1
Date: Wed, 10 Jun 2026 17:38:24 GMT
Status: Translation complete
Last updated in system: 2026-06-11 16:42:38.60056
Title: UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning
Authors: Haoyuan Deng, Yitong Gao, Yudong Lin, Haichao Liu, Zhenyu Wu, Ziwei Wang
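Abstract summary: The paper proposes UniIntervene, an agentic intervention model that detects unproductive exploration and autonomously recovers the policy toward high-value states. In experiments on diverse real-world manipulation tasks, UniIntervene improves the average success rate by 8.6% while reducing human interventions by 57% relative to state-of-the-art HiL-RL baselines.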
Reference score (independently computed attention score): 10.315300563393782
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Human-in-the-loop reinforcement learning (HiL-RL) has emerged as an effective paradigm for real-world robotic manipulation, enabling online policy improvement with human guidance. However, current HiL-RL frameworks remain intervention-intensive, relying on frequent human corrections to redirect the policy out of unproductive exploration, which incurs high labor cost and limits real-world scalability. To address this, we propose UniIntervene, an agentic intervention model that detects unproductive exploration and autonomously recovers the policy toward high-value states, taking over the bulk of interventions from human operators. Specifically, UniIntervene first performs future-conditioned action-value estimation, predicting the latent consequence of the current action and evaluating its induced value, which provides a more stable progress signal. Building on this, a temporal value-risk critic aggregates recent value dynamics and triggers intervention when the estimated value exhibits sustained stagnation or degradation. When intervention is required, UniIntervene retrieves a high-value recovery target from a memory of past intervention episodes and produces executable corrective actions through a goal-conditioned recovery policy. In this way, UniIntervene turns intervention from passive human correction into a value-aware recovery process for efficient real-world RL. Extensive experiments on diverse real-world manipulation tasks demonstrate that UniIntervene improves the average success rate by 8.6% while reducing human interventions by 57% relative to state-of-the-art HiL-RL baselines.
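
The trigger-and-retrieve loop described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract describes a learned temporal value-risk critic and a goal-conditioned recovery policy, whereas this sketch swaps in a hand-written sliding-window rule and a nearest-neighbor lookup. The names `StagnationTrigger`, `retrieve_recovery_target`, and the parameters `window`, `min_gain`, `patience`, and `k` are all hypothetical.

```python
from collections import deque
from typing import Deque, List, Sequence, Tuple

# One memory entry: (state vector, estimated value of that state).
MemoryEntry = Tuple[Sequence[float], float]


class StagnationTrigger:
    """Rule-based stand-in (assumption) for a temporal value-risk critic.

    Fires when the value gain across a sliding window stays below
    `min_gain` for `patience` consecutive steps, which covers both
    stagnation (gain near zero) and degradation (negative gain).
    """

    def __init__(self, window: int = 5, min_gain: float = 0.01,
                 patience: int = 3) -> None:
        self.window = window
        self.min_gain = min_gain
        self.patience = patience
        self.values: Deque[float] = deque(maxlen=window)
        self.flat_steps = 0

    def update(self, value: float) -> bool:
        """Record one value estimate; return True if intervention fires."""
        self.values.append(value)
        if len(self.values) < self.window:
            return False  # not enough history yet
        gain = self.values[-1] - self.values[0]
        if gain < self.min_gain:
            self.flat_steps += 1
        else:
            self.flat_steps = 0  # progress resumed, reset the counter
        return self.flat_steps >= self.patience


def retrieve_recovery_target(memory: List[MemoryEntry],
                             state: Sequence[float],
                             k: int = 3) -> Sequence[float]:
    """Among the k stored states nearest to `state`, return the one
    with the highest value (a hypothetical retrieval rule)."""

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(memory, key=lambda entry: sq_dist(entry[0], state))[:k]
    return max(nearest, key=lambda entry: entry[1])[0]
```

In use, each control step would feed the current value estimate to `update`; once it returns `True`, the retrieved target would be handed to whatever recovery policy is available. The `patience` counter is what makes the rule respond to sustained rather than momentary value drops.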

Related papers