Accelerating Self-Imitation Learning from Demonstrations via Policy
Constraints and Q-Ensemble
- URL: http://arxiv.org/abs/2212.03562v1
- Date: Wed, 7 Dec 2022 10:29:13 GMT
- Authors: Chao Li
- Abstract summary: We propose a learning from demonstrations method named A-SILfD.
A-SILfD treats expert demonstrations as the agent's successful experiences and uses experiences to constrain policy improvement.
In four MuJoCo continuous control tasks, A-SILfD can significantly outperform baseline methods after 150,000 steps of online training.
- Score: 6.861783783234304
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep reinforcement learning (DRL) provides a new way to generate robot
control policies. However, training a control policy requires lengthy
exploration, which makes reinforcement learning (RL) sample-inefficient
in real-world tasks. Both imitation learning (IL) and learning
from demonstrations (LfD) improve the training process by using expert
demonstrations, but imperfect expert demonstrations can mislead policy
improvement. Offline-to-online reinforcement learning requires a large amount of offline
data to initialize the policy, and distribution shift can easily lead to
performance degradation during online fine-tuning. To solve the above problems,
we propose a learning from demonstrations method named A-SILfD, which treats
expert demonstrations as the agent's successful experiences and uses
experiences to constrain policy improvement. Furthermore, we use an ensemble
of Q-functions to prevent performance degradation caused by large estimation
errors in the Q-function. Our experiments show that A-SILfD can significantly
improve sample efficiency using a small number of expert demonstrations of
varying quality. In four MuJoCo continuous control tasks, A-SILfD can
significantly outperform baseline methods after 150,000 steps of online
training and is not misled by imperfect expert demonstrations during training.
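The two ideas named in the abstract, treating demonstrations as successful experiences and guarding against Q-function estimation errors with an ensemble, can be illustrated with a minimal sketch. It is not the paper's implementation, since the abstract gives no equations. It assumes the clipped-advantage weight from standard self-imitation learning and a pessimistic minimum over a random subset of ensemble members, and every function name, parameter, and number below is hypothetical.

```python
import random


def clipped_advantage(observed_return, value_estimate):
    """SIL-style weight: positive only when the stored return beats the estimate."""
    return max(observed_return - value_estimate, 0.0)


def ensemble_td_target(next_q_values, reward, gamma=0.99, subset_size=2, rng=None):
    """Pessimistic TD target: reward + gamma * min over a random ensemble subset."""
    rng = rng or random.Random(0)
    subset = rng.sample(next_q_values, subset_size)
    return reward + gamma * min(subset)


# Hypothetical numbers, for illustration only.
# A demonstration whose return still beats the critic gets a positive weight.
demo_weight = clipped_advantage(observed_return=10.0, value_estimate=7.0)
# Once the critic exceeds the stored return, the sample is no longer imitated.
stale_weight = clipped_advantage(observed_return=5.0, value_estimate=7.0)
# Using all three ensemble members, the target is built from the smallest estimate.
target = ensemble_td_target([1.0, 2.0, 3.0], reward=0.5, gamma=0.9, subset_size=3)
```

Under these assumptions, a zero weight is what would keep an imperfect demonstration from dragging the policy down once the agent has surpassed it, and taking the minimum over ensemble members trades some optimism for robustness to overestimated Q-values.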
Related papers
- Equivariant Offline Reinforcement Learning [7.822389399560674]
We investigate the use of $SO(2)$-equivariant neural networks for offline RL with a limited number of demonstrations.
Our experimental results show that equivariant versions of Conservative Q-Learning (CQL) and Implicit Q-Learning (IQL) outperform their non-equivariant counterparts.
arXiv Detail & Related papers (2024-06-20T03:02:49Z)
- Unlearning with Control: Assessing Real-world Utility for Large Language Model Unlearning [97.2995389188179]
Recent research has begun to approach large language model (LLM) unlearning via gradient ascent (GA).
Despite their simplicity and efficiency, we suggest that GA-based methods are prone to excessive unlearning.
We propose several controlling methods that can regulate the extent of excessive unlearning.
arXiv Detail & Related papers (2024-06-13T14:41:00Z)
- "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
How to select the best set of human demonstrations that is most beneficial for learning becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z)
- Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning [17.092640837991883]
Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction.
One direction is to augment RL with offline data demonstrating the desired tasks, but past work often requires a lot of high-quality demonstration data.
We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency.
arXiv Detail & Related papers (2024-05-06T11:33:12Z)
- Bootstrapping Reinforcement Learning with Imitation for Vision-Based Agile Flight [20.92646531472541]
We propose a novel approach that combines the performance of Reinforcement Learning (RL) and the sample efficiency of Imitation Learning (IL).
Our framework contains three phases: training a teacher policy using RL with privileged state information, distilling it into a student policy via IL, and adaptive fine-tuning via RL.
Tests show our approach can not only learn in scenarios where RL from scratch fails but also outperforms existing IL methods in both robustness and performance.
arXiv Detail & Related papers (2024-03-18T19:25:57Z)
- Leveraging Demonstrations to Improve Online Learning: Quality Matters [54.98983862640944]
We show that the degree of improvement must depend on the quality of the demonstration data.
We propose an informed Thompson sampling (TS) algorithm that utilizes the demonstration data in a coherent way through Bayes' rule.
arXiv Detail & Related papers (2023-02-07T08:49:12Z)
- On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations [79.49929463310588]
We show that KL-regularized reinforcement learning with behavioral reference policies can suffer from pathological training dynamics.
We show that the pathology can be remedied by non-parametric behavioral reference policies.
arXiv Detail & Related papers (2022-12-28T16:29:09Z)
- Demonstration-Guided Reinforcement Learning with Learned Skills [23.376115889936628]
Demonstration-guided reinforcement learning (RL) is a promising approach for learning complex behaviors.
In this work, we aim to exploit this shared subtask structure to increase the efficiency of demonstration-guided RL.
We propose Skill-based Learning with Demonstrations (SkiLD), an algorithm for demonstration-guided RL that efficiently leverages the provided demonstrations.
arXiv Detail & Related papers (2021-07-21T17:59:34Z)
- Simplifying Deep Reinforcement Learning via Self-Supervision [51.2400839966489]
Self-Supervised Reinforcement Learning (SSRL) is a simple algorithm that optimizes policies with purely supervised losses.
We show that SSRL is surprisingly competitive with contemporary algorithms, with more stable performance and less running time.
arXiv Detail & Related papers (2021-06-10T06:29:59Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
- Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations [78.94386823185724]
Imitation learning learns effectively in sparse-rewarded tasks by leveraging the existing expert demonstrations.
In practice, collecting a sufficient amount of expert demonstrations can be prohibitively expensive.
We propose Self-Adaptive Imitation Learning (SAIL) that can achieve (near) optimal performance given only a limited number of sub-optimal demonstrations.
arXiv Detail & Related papers (2020-04-01T15:57:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.