Fugu-MT Paper Translation (Abstract): Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning

Paper overview: Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning

arxiv url: http://arxiv.org/abs/2510.19495v2
Date: Sat, 25 Oct 2025 01:18:43 GMT
Status: Translation complete
Last updated in system: 2025-10-28 13:14:10.611904
Title: Using Non-Expert Data to Robustify Imitation Learning via Offline Reinforcement Learning
Title (reference translation): Robustifying Imitation Learning with Non-Expert Data via Offline Reinforcement Learning
Authors: Kevin Huang, Rosario Scalise, Cleah Winston, Ayush Agrawal, Yunchu Zhang, Rohan Baijal, Markus Grotz, Byron Boots, Benjamin Burchfiel, Masha Itkina, Paarth Shah, Abhishek Gupta
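Abstract summary: We show that offline reinforcement learning can leverage non-expert data to improve the performance of imitation learning policies. The proposed approach shows that imitation algorithms augmented by offline RL can solve tasks robustly.
Reference score (independently computed attention score): 21.705096559151286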
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Imitation learning has proven effective for training robots to perform complex tasks from expert human demonstrations. However, it remains limited by its reliance on high-quality, task-specific data, restricting adaptability to the diverse range of real-world object configurations and scenarios. In contrast, non-expert data -- such as play data, suboptimal demonstrations, partial task completions, or rollouts from suboptimal policies -- can offer broader coverage and lower collection costs. However, conventional imitation learning approaches fail to utilize this data effectively. To address these challenges, we posit that with the right design decisions, offline reinforcement learning can be used as a tool to harness non-expert data to enhance the performance of imitation learning policies. We show that while standard offline RL approaches can be ineffective at actually leveraging non-expert data under the sparse data coverage settings typically encountered in the real world, simple algorithmic modifications can allow for the utilization of this data, without significant additional assumptions. Our approach shows that broadening the support of the policy distribution can allow imitation algorithms augmented by offline RL to solve tasks robustly, showing considerably enhanced recovery and generalization behavior. In manipulation tasks, these innovations significantly increase the range of initial conditions where learned policies are successful when non-expert data is incorporated. Moreover, we show that these methods are able to leverage all collected data, including partial or suboptimal demonstrations, to bolster task-directed policy performance. This underscores the importance of algorithmic techniques for using non-expert data for robust policy learning in robotics. Website: https://uwrobotlearning.github.io/RISE-offline/
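Abstract (reference translation): Imitation learning has proven effective for training robots to perform complex tasks from expert human demonstrations. However, because it relies on high-quality, task-specific data, its adaptability to diverse real-world object configurations and scenarios is limited. In contrast, non-expert data, such as play data, suboptimal demonstrations, partial task completions, and rollouts from suboptimal policies, offers broader coverage and lower collection costs. Conventional imitation learning methods, however, cannot use this data effectively. To address these challenges, we posit that, with appropriate design decisions, offline reinforcement learning can serve as a tool for leveraging non-expert data to improve the performance of imitation learning policies. Standard offline RL approaches can be ineffective at actually leveraging non-expert data under the sparse data coverage settings typically encountered in the real world, but simple algorithmic modifications make it possible to use this data without significant additional assumptions. The proposed method shows that imitation algorithms augmented by offline RL can solve tasks robustly, with markedly improved recovery and generalization behavior. In manipulation tasks, these innovations substantially widen the range of initial conditions under which learned policies succeed when non-expert data is incorporated. Furthermore, we show that these methods can leverage all collected data, including partial or suboptimal demonstrations, to improve task-directed policy performance. This underscores the importance of algorithmic techniques that use non-expert data for robust policy learning in robotics. Website: https://uwrobotlearning.github.io/RISE-offline/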

Related papers