Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker
- URL: http://arxiv.org/abs/2412.20193v1
- Date: Sat, 28 Dec 2024 16:06:44 GMT
- Title: Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker
- Authors: Jiangdong Fan, Hongcai He, Paul Weng, Hui Xu, Jie Shao,
- Abstract summary: A major bottleneck in imitation learning is the requirement of a large number of expert demonstrations.
We propose a novel approach named imitation learning via meta-learning an action ranker (ILMAR)
ILMAR implements weighted behavior cloning (weighted BC) on a limited set of expert demonstrations along with supplementary demonstrations.
- Score: 9.6508237676589
- License:
- Abstract: A major bottleneck in imitation learning is the requirement of a large number of expert demonstrations, which can be expensive or inaccessible. Learning from supplementary demonstrations without strict quality requirements has emerged as a powerful paradigm to address this challenge. However, previous methods often fail to fully utilize their potential by discarding non-expert data. Our key insight is that even demonstrations that fall outside the expert distribution but outperform the learned policy can enhance policy performance. To utilize this potential, we propose a novel approach named imitation learning via meta-learning an action ranker (ILMAR). ILMAR implements weighted behavior cloning (weighted BC) on a limited set of expert demonstrations along with supplementary demonstrations. It utilizes the functional of the advantage function to selectively integrate knowledge from the supplementary demonstrations. To make more effective use of supplementary demonstrations, we introduce meta-goal in ILMAR to optimize the functional of the advantage function by explicitly minimizing the distance between the current policy and the expert policy. Comprehensive experiments using extensive tasks demonstrate that ILMAR significantly outperforms previous methods in handling suboptimal demonstrations. Code is available at https://github.com/F-GOD6/ILMAR.
Related papers
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
How to select the best set of human demonstrations that is most beneficial for learning becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z) - Show, Don't Tell: Aligning Language Models with Demonstrated Feedback [54.10302745921713]
Demonstration ITerated Task Optimization (DITTO) directly aligns language model outputs to a user's demonstrated behaviors.
We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts.
arXiv Detail & Related papers (2024-06-02T23:13:56Z) - Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning [62.05713042908654]
We introduce Alignment from Demonstrations (AfD), a novel approach leveraging high-quality demonstration data to overcome these challenges.
We formalize AfD within a sequential decision-making framework, highlighting its unique challenge of missing reward signals.
Practically, we propose a computationally efficient algorithm that extrapolates over a tailored reward model for AfD.
arXiv Detail & Related papers (2024-05-24T15:13:53Z) - AdaDemo: Data-Efficient Demonstration Expansion for Generalist Robotic Agent [75.91274222142079]
In this study, we aim to scale up demonstrations in a data-efficient way to facilitate the learning of generalist robotic agents.
AdaDemo is a framework designed to improve multi-task policy learning by actively and continually expanding the demonstration dataset.
arXiv Detail & Related papers (2024-04-11T01:59:29Z) - Beyond Joint Demonstrations: Personalized Expert Guidance for Efficient Multi-Agent Reinforcement Learning [54.40927310957792]
We introduce a novel concept of personalized expert demonstrations, tailored for each individual agent or, more broadly, each individual type of agent within a heterogeneous team.
These demonstrations solely pertain to single-agent behaviors and how each agent can achieve personal goals without encompassing any cooperative elements.
We propose an approach that selectively utilizes personalized expert demonstrations as guidance and allows agents to learn to cooperate.
arXiv Detail & Related papers (2024-03-13T20:11:20Z) - Imitator Learning: Achieve Out-of-the-Box Imitation Ability in Variable
Environments [45.213059639254475]
We propose a new topic called imitator learning (ItorL)
It aims to derive an imitator module that can reconstruct the imitation policies based on very limited expert demonstrations.
For autonomous imitation policy building, we design a demonstration-based attention architecture for imitator policy.
arXiv Detail & Related papers (2023-10-09T13:35:28Z) - Skill Disentanglement for Imitation Learning from Suboptimal
Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose method by evaluating and imitating at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z) - Learning from Imperfect Demonstrations from Agents with Varying Dynamics [29.94164262533282]
We develop a metric composed of a feasibility score and an optimality score to measure how useful a demonstration is for imitation learning.
Our experiments on four environments in simulation and on a real robot show improved learned policies with higher expected return.
arXiv Detail & Related papers (2021-03-10T07:39:38Z) - Learn to Exceed: Stereo Inverse Reinforcement Learning with Concurrent
Policy Optimization [1.0965065178451106]
We study the problem of obtaining a control policy that can mimic and then outperform expert demonstrations in Markov decision processes.
One main relevant approach is the inverse reinforcement learning (IRL), which mainly focuses on inferring a reward function from expert demonstrations.
We propose a novel method that enables the learning agent to outperform the demonstrator via a new concurrent reward and action policy learning approach.
arXiv Detail & Related papers (2020-09-21T02:16:21Z) - Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z) - Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning learns effectively in sparse-rewarded tasks by leveraging the existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Learning (SAIL) that can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.