Fugu-MT Paper Translation (Abstract): Formulating Reinforcement Learning for Human-Robot Collaboration through Off-Policy Evaluation

Paper overview: Formulating Reinforcement Learning for Human-Robot Collaboration through Off-Policy Evaluation

arxiv url: http://arxiv.org/abs/2602.02530v1
Date: Tue, 27 Jan 2026 21:35:13 GMT
Status: Translation complete
System update date: 2026-02-04 18:37:14.906281
Title: Formulating Reinforcement Learning for Human-Robot Collaboration through Off-Policy Evaluation
Title (reference translation): Formulating Reinforcement Learning for Human-Robot Collaboration through Off-Policy Evaluation
Authors: Saurav Singh, Rodney Sanchez, Alexander Ororbia, Jamison Heard
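Abstract summary: Reinforcement learning (RL) has the potential to transform real-world decision-making systems. Traditional RL approaches often rely on domain expertise and trial-and-error. This work proposes a novel RL framework that leverages off-policy evaluation (OPE) for state space and reward function selection.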
Reference score (independently computed attention score): 42.19772341787033
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning (RL) has the potential to transform real-world decision-making systems by enabling autonomous agents to learn from experience. Deploying RL in real-world settings, especially in the context of human-robot interaction, requires defining state representations and reward functions, which are critical for learning efficiency and policy performance. Traditional RL approaches often rely on domain expertise and trial-and-error, necessitating extensive human involvement as well as direct interaction with the environment, which can be costly and impractical, especially in complex and safety-critical applications. This work proposes a novel RL framework that leverages off-policy evaluation (OPE) for state space and reward function selection, using only logged interaction data. This approach eliminates the need for real-time access to the environment or human-in-the-loop feedback, greatly reducing the dependency on costly real-time interactions. The proposed approach systematically evaluates multiple candidate state representations and reward functions by training offline RL agents and applying OPE to estimate policy performance. The optimal state space and reward function are selected based on their ability to produce high-performing policies under OPE metrics. Our method is validated on two environments: the Lunar Lander environment by OpenAI Gym, which provides a controlled setting for assessing state space and reward function selection, and a NASA-MATB-II human subjects study environment, which evaluates the approach's real-world applicability to human-robot teaming scenarios. This work enhances the feasibility and scalability of offline RL for real-world environments by automating critical RL design decisions through a data-driven OPE-based evaluation, enabling more reliable, effective, and sustainable RL formulation for complex human-robot interaction settings.
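Abstract (reference translation): Reinforcement learning (RL) has the potential to transform real-world decision-making systems by enabling autonomous agents to learn from experience. Deploying RL in real-world settings, especially in the context of human-robot interaction, requires defining state representations and reward functions. Traditional RL approaches rely on domain expertise and trial-and-error, requiring extensive human involvement and direct interaction with the environment. This work proposes a novel RL framework that leverages off-policy evaluation (OPE) for state space and reward function selection, using only logged interaction data. This approach eliminates the need for real-time access to the environment or human-in-the-loop feedback, greatly reducing the dependency on costly real-time interactions. The proposed approach systematically evaluates multiple candidate state representations and reward functions by training offline RL agents and applying OPE to estimate policy performance. The optimal state space and reward function are selected based on their ability to produce high-performing policies under OPE metrics. The method is validated on two environments: the Lunar Lander environment by OpenAI Gym, which provides a controlled setting for assessing state space and reward function selection, and a NASA-MATB-II human subjects study environment, which evaluates the approach's real-world applicability to human-robot teaming scenarios. This work enhances the feasibility and scalability of offline RL for real-world environments by automating critical RL design decisions through a data-driven, OPE-based evaluation, enabling more reliable, effective, and sustainable RL formulation for complex human-robot interaction settings.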
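
The selection procedure described in the abstract (train an offline agent per candidate state representation and reward function, score each resulting policy with OPE on logged data, keep the best-scoring formulation) can be illustrated with a toy sketch. This is not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic one-step logs, the uniform behavior policy with known action probability 0.5, the greedy tabular learner standing in for an offline RL agent, weighted importance sampling as the OPE estimator, and a logged task-level `outcome` used as the common scoring signal so that candidates with different reward functions stay comparable. All function and variable names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import product


def make_logs(n_episodes=2000, seed=0):
    """Synthetic one-step logs from a uniform-random behavior policy (toy stand-in)."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n_episodes):
        obs = (rng.randint(0, 1), rng.randint(0, 1))  # (useful bit, distractor bit)
        action = rng.randint(0, 1)
        outcome = 1.0 if action == obs[0] else 0.0    # assumed task-level success signal
        logs.append({"obs": obs, "action": action, "prob": 0.5, "outcome": outcome})
    return logs


def train_offline_policy(logs, state_fn, reward_fn):
    """Greedy tabular policy built from per-(state, action) mean rewards in the log."""
    totals = defaultdict(lambda: [0.0, 0])            # (state, action) -> [sum, count]
    for rec in logs:
        cell = totals[(state_fn(rec["obs"]), rec["action"])]
        cell[0] += reward_fn(rec)
        cell[1] += 1

    def policy(obs):
        state = state_fn(obs)

        def mean_reward(action):
            total, count = totals.get((state, action), (0.0, 0))
            return total / count if count else 0.0

        return max((0, 1), key=mean_reward)

    return policy


def wis_estimate(logs, policy):
    """Weighted importance sampling estimate for a deterministic target policy."""
    num = den = 0.0
    for rec in logs:
        match = 1.0 if policy(rec["obs"]) == rec["action"] else 0.0
        weight = match / rec["prob"]                  # target prob / behavior prob
        num += weight * rec["outcome"]
        den += weight
    return num / den if den else 0.0


# Hypothetical candidate state representations and reward functions.
STATE_FNS = {
    "useful": lambda obs: obs[0],
    "distractor": lambda obs: obs[1],
    "full": lambda obs: obs,
}
REWARD_FNS = {
    "success": lambda rec: rec["outcome"],
    "prefer_action_1": lambda rec: float(rec["action"] == 1),
}


def select_formulation(logs):
    """Train on one half of the log, score every candidate pair on the other half."""
    split = len(logs) // 2
    train, held_out = logs[:split], logs[split:]
    scores = {}
    for (s_name, s_fn), (r_name, r_fn) in product(STATE_FNS.items(), REWARD_FNS.items()):
        policy = train_offline_policy(train, s_fn, r_fn)
        scores[(s_name, r_name)] = wis_estimate(held_out, policy)
    best = max(scores, key=scores.get)
    return best, scores


best, scores = select_formulation(make_logs())
print("selected formulation:", best, "OPE score:", round(scores[best], 3))
```

In this toy setting, candidates that drop the informative observation or optimize a poorly chosen reward receive lower OPE scores, which is the signal the selection step relies on. No environment interaction occurs after the logs are collected.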

Related papers