Co-Imitation Learning without Expert Demonstration
- URL: http://arxiv.org/abs/2103.14823v2
- Date: Sun, 23 Jul 2023 06:43:15 GMT
- Title: Co-Imitation Learning without Expert Demonstration
- Authors: Kun-Peng Ning, Hu Xu, Kun Zhu, Sheng-Jun Huang
- Abstract summary: We propose a novel learning framework called Co-Imitation Learning (CoIL) to exploit the past good experiences of the agents without expert demonstration.
Since the experiences can be either valuable or misleading, we propose to estimate the potential utility of each piece of experience by the expected gain of the value function.
Experimental results on various tasks show significant superiority of the proposed Co-Imitation Learning framework.
- Score: 39.988945772085465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Imitation learning is a primary approach to improving the efficiency of
reinforcement learning by exploiting expert demonstrations. However, in
many real scenarios, obtaining expert demonstrations can be extremely
expensive or even impossible. To overcome this challenge, in this paper we
propose a novel learning framework called Co-Imitation Learning (CoIL) that
exploits the agents' own past good experiences without expert
demonstrations. Specifically, we train two different agents by letting each of
them alternately explore the environment and exploit the peer agent's
experience. Since these experiences can be either valuable or misleading, we propose
to estimate the potential utility of each piece of experience by the expected
gain of the value function. The agents can thus selectively imitate each
other, emphasizing the more useful experiences while filtering out noisy
ones. Experimental results on various tasks show the significant superiority of the
proposed Co-Imitation Learning framework, validating that the agents can
benefit from each other without external supervision.
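
To make the selective-imitation step concrete, here is a minimal Python sketch under assumptions of ours: two agents alternately explore, and each scores its peer's experiences by the expected gain of the value function, keeping only those above a threshold. The class and function names (Transition, Agent.utility, imitate_from), the placeholder value function, and the zero threshold are illustrative, not the authors' implementation.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Transition:
    state: float
    action: int
    reward: float
    next_state: float


@dataclass
class Agent:
    name: str
    replay: list = field(default_factory=list)

    def value(self, state: float) -> float:
        # Placeholder value function; in the paper this would be a learned critic.
        return state

    def utility(self, t: Transition, gamma: float = 0.99) -> float:
        # Expected gain of the value function, used to score a peer experience:
        # how much the transition is expected to improve this agent's value estimate.
        return t.reward + gamma * self.value(t.next_state) - self.value(t.state)

    def imitate_from(self, peer: "Agent", threshold: float = 0.0) -> list:
        # Keep only peer experiences whose estimated utility exceeds the threshold;
        # misleading or noisy experiences are filtered out.
        return [t for t in peer.replay if self.utility(t) > threshold]


def explore(agent: Agent, n: int = 5) -> None:
    # Stand-in for environment interaction: collect random transitions.
    for _ in range(n):
        s = random.random()
        agent.replay.append(
            Transition(state=s, action=random.randint(0, 1),
                       reward=random.random(), next_state=s + random.random())
        )


a, b = Agent("A"), Agent("B")
for _ in range(3):  # alternate exploration and selective peer imitation
    explore(a)
    explore(b)
    print(len(a.imitate_from(b)), len(b.imitate_from(a)))
```

In the paper the value estimates come from learned critics and the retained experiences would drive an imitation update of the policy; the sketch only shows the filtering logic.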
Related papers
- A Bayesian Solution To The Imitation Gap [34.16107600758348]
An agent must learn to act in environments where no reward signal can be specified.
In some cases, differences in observability between the expert and the agent can give rise to an imitation gap.
arXiv Detail & Related papers (2024-06-29T17:13:37Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a novel trainer-student system that learns a dynamic reward function based on the student's performance and alignment with expert demonstrations.
RILe achieves better performance in complex settings where traditional methods falter, outperforming existing methods by 2x in simulated robot-locomotion tasks.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
Selecting the set of human demonstrations that is most beneficial for learning thus becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z)
- Iterative Experience Refinement of Software-Developing Agents [81.09737243969758]
Large language models (LLMs) can leverage past experiences to reduce errors and enhance efficiency.
This paper introduces the Iterative Experience Refinement framework, enabling LLM agents to refine experiences iteratively during task execution.
arXiv Detail & Related papers (2024-05-07T11:33:49Z)
- Inverse Reinforcement Learning with Sub-optimal Experts [56.553106680769474]
We study the theoretical properties of the class of reward functions that are compatible with a given set of experts.
Our results show that the presence of multiple sub-optimal experts can significantly shrink the set of compatible rewards.
We analyze a uniform sampling algorithm that is minimax optimal whenever the sub-optimal experts' performance level is sufficiently close to that of the optimal agent.
arXiv Detail & Related papers (2024-01-08T12:39:25Z)
- Skill Disentanglement for Imitation Learning from Suboptimal Demonstrations [60.241144377865716]
We consider the imitation of sub-optimal demonstrations, with both a small clean demonstration set and a large noisy set.
We propose a method that evaluates and imitates at the sub-demonstration level, encoding action primitives of varying quality into different skills.
arXiv Detail & Related papers (2023-06-13T17:24:37Z)
- Imitation Learning by Estimating Expertise of Demonstrators [92.20185160311036]
We show that unsupervised learning over demonstrator expertise can lead to a consistent boost in the performance of imitation learning algorithms.
We develop and optimize a joint model over a learned policy and expertise levels of the demonstrators.
We illustrate our findings on real-robotic continuous control tasks from Robomimic and discrete environments such as MiniGrid and chess.
arXiv Detail & Related papers (2022-02-02T21:23:19Z)
- Reinforcement Learning with Supervision from Noisy Demonstrations [38.00968774243178]
We propose a novel framework to adaptively learn the policy by jointly interacting with the environment and exploiting the expert demonstrations.
Experimental results in various environments with multiple popular reinforcement learning algorithms show that the proposed approach can learn robustly with noisy demonstrations.
arXiv Detail & Related papers (2020-06-14T06:03:06Z)
- Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning [11.292086312664383]
Our proposed algorithm, called Shared Experience Actor-Critic (SEAC), applies experience sharing in an actor-critic framework (see the sketch after this entry).
We evaluate SEAC in a collection of sparse-reward multi-agent environments and find that it consistently outperforms two baselines and two state-of-the-art algorithms.
arXiv Detail & Related papers (2020-06-12T13:24:50Z)
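
The SEAC entry above describes experience sharing in an actor-critic framework. Below is a minimal hedged sketch of that idea, again under assumptions of ours: each agent computes a policy-gradient loss on its own data and adds a loss computed on its peer's data, corrected by an importance weight between the two policies. The tabular softmax-style policies, the lambda coefficient, and all function names are illustrative and not taken from the listed abstract.

```python
import math
import random


def log_prob(policy, state, action):
    # policy: dict mapping state -> list of action probabilities
    return math.log(policy[state][action])


def actor_loss(agent_policy, agent_values, transitions, behaviour_policy,
               lam=1.0, gamma=0.99):
    """Policy-gradient loss over transitions collected by `behaviour_policy`,
    importance-weighted so one agent can reuse another agent's experience."""
    loss = 0.0
    for (s, a, r, s_next) in transitions:
        advantage = r + gamma * agent_values[s_next] - agent_values[s]
        # The importance weight corrects for the data coming from another
        # agent's policy (the weight is 1 when an agent reuses its own data).
        weight = agent_policy[s][a] / behaviour_policy[s][a]
        loss += -lam * weight * log_prob(agent_policy, s, a) * advantage
    return loss / max(len(transitions), 1)


# Toy setup: two agents, two states (plus a terminal state 2), two actions.
states = [0, 1]
policies = [{s: [0.5, 0.5] for s in states} for _ in range(2)]
values = [{s: random.random() for s in states + [2]} for _ in range(2)]

# Each agent's experience: (state, action, reward, next_state) tuples.
data = [[(0, 1, 1.0, 1), (1, 0, 0.0, 2)],
        [(0, 0, 0.5, 1), (1, 1, 1.0, 2)]]

for i in range(2):
    own = actor_loss(policies[i], values[i], data[i], policies[i])
    shared = actor_loss(policies[i], values[i], data[1 - i], policies[1 - i])
    print(f"agent {i}: own loss {own:.3f}, shared loss {shared:.3f}")
```

The importance weight keeps the shared term usable even though the peer's data is off-policy for the imitating agent; setting lam to 0 recovers fully independent learning.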
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.