Abstract Demonstrations and Adaptive Exploration for Efficient and
Stable Multi-step Sparse Reward Reinforcement Learning
- URL: http://arxiv.org/abs/2207.09243v1
- Date: Tue, 19 Jul 2022 12:56:41 GMT
- Title: Abstract Demonstrations and Adaptive Exploration for Efficient and
Stable Multi-step Sparse Reward Reinforcement Learning
- Authors: Xintong Yang, Ze Ji, Jing Wu, Yu-kun Lai
- Abstract summary: This paper proposes a DRL exploration technique, termed A^2, which integrates two components inspired by human experiences: Abstract demonstrations and Adaptive exploration.
A^2 starts by decomposing a complex task into subtasks, and then provides the correct orders of subtasks to learn.
We demonstrate that A^2 can aid popular DRL algorithms to learn more efficiently and stably in these environments.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Although Deep Reinforcement Learning (DRL) has been popular in many
disciplines including robotics, state-of-the-art DRL algorithms still struggle
to learn long-horizon, multi-step and sparse reward tasks, such as stacking
several blocks given only a task-completion reward signal. To improve learning
efficiency for such tasks, this paper proposes a DRL exploration technique,
termed A^2, which integrates two components inspired by human experiences:
Abstract demonstrations and Adaptive exploration. A^2 starts by decomposing a
complex task into subtasks, and then provides the correct orders of subtasks to
learn. During training, the agent explores the environment adaptively, acting
more deterministically for well-mastered subtasks and more stochastically for
ill-learnt subtasks. Ablation and comparative experiments are conducted on
several grid-world tasks and three robotic manipulation tasks. We demonstrate
that A^2 can aid popular DRL algorithms (DQN, DDPG, and SAC) to learn more
efficiently and stably in these environments.
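The adaptive-exploration idea (act deterministically on well-mastered subtasks, stochastically on ill-learnt ones) can be illustrated with a small sketch. The abstract does not give the exact rule A^2 uses, so everything below is an assumption for illustration only. The class name `AdaptiveExploration`, the sliding-window success rate, the linear mapping from success rate to an epsilon-greedy rate, and the subtask names are all hypothetical. This is not the authors' implementation. The sketch covers a discrete-action agent such as DQN. For DDPG or SAC the same per-subtask success rate would instead have to scale action noise or policy entropy.

```python
import random
from collections import deque


class AdaptiveExploration:
    """Hypothetical per-subtask epsilon-greedy exploration (illustrative sketch only)."""

    def __init__(self, subtasks, eps_min=0.05, eps_max=1.0, window=50):
        # Ordered subtask list, standing in for the "abstract demonstration"
        # (the correct order in which subtasks are learnt).
        self.subtasks = list(subtasks)
        self.eps_min = eps_min
        self.eps_max = eps_max
        # Sliding window of recent outcomes per subtask: 1 = solved, 0 = failed.
        self.history = {s: deque(maxlen=window) for s in self.subtasks}

    def success_rate(self, subtask):
        # Fraction of recent attempts that solved the subtask (0.0 if never tried).
        h = self.history[subtask]
        return sum(h) / len(h) if h else 0.0

    def epsilon(self, subtask):
        # Assumed linear rule: mastered subtask -> eps_min, ill-learnt -> eps_max.
        rate = self.success_rate(subtask)
        return self.eps_min + (self.eps_max - self.eps_min) * (1.0 - rate)

    def record(self, subtask, solved):
        # Log the outcome of one attempt at `subtask`.
        self.history[subtask].append(1 if solved else 0)

    def act(self, subtask, q_values, rng=random):
        # Epsilon-greedy over discrete Q-values, using the subtask's own epsilon.
        if rng.random() < self.epsilon(subtask):
            return rng.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    # Hypothetical block-stacking subtasks, listed in their demonstrated order.
    explorer = AdaptiveExploration(["pick_block_1", "place_block_1", "pick_block_2"])
    for _ in range(50):
        explorer.record("pick_block_1", solved=True)
    print(round(explorer.epsilon("pick_block_1"), 3))  # 0.05: mastered, near-greedy
    print(round(explorer.epsilon("pick_block_2"), 3))  # 1.0: untried, fully random
```

Tying epsilon to each subtask rather than to a global schedule lets a late, harder subtask keep exploring while earlier, already-solved subtasks are executed reliably. This is the behaviour the abstract describes. The choice of window length and of a linear mapping is arbitrary here.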
Related papers
- Reinforcement Learning with Action Sequence for Data-Efficient Robot Learning (2024-11-19)
We introduce a novel RL algorithm that learns a critic network that outputs Q-values over a sequence of actions.
By explicitly training the value functions to learn the consequence of executing a series of current and future actions, our algorithm allows for learning useful value functions from noisy trajectories.
- Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks (2024-03-03)
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
- Primitive Skill-based Robot Learning from Human Evaluative Feedback (2023-07-28)
Reinforcement learning algorithms face challenges when dealing with long-horizon robot manipulation tasks in real-world environments.
We propose a novel framework, SEED, which leverages two approaches: reinforcement learning from human feedback (RLHF) and primitive skill-based reinforcement learning.
Our results show that SEED significantly outperforms state-of-the-art RL algorithms in sample efficiency and safety.
- LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning (2022-05-05)
We propose a novel framework for learning dynamic subtask assignment (LDSA) in cooperative MARL.
To reasonably assign agents to different subtasks, we propose an ability-based subtask selection strategy.
We show that LDSA learns reasonable and effective subtask assignment for better collaboration.
- Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives (2021-10-28)
Reinforcement learning can be used to build general-purpose robotic systems.
However, training RL agents to solve robotics tasks still remains challenging.
In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.
We find that our simple change to the action interface substantially improves both the learning efficiency and task performance.
- PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning (2020-04-24)
"Plan, Backplay, Chain Skills" combines motion planning and reinforcement learning to solve hard exploration environments.
We show that this method outperforms state-of-the-art RL algorithms in 2D maze environments of various sizes.
- Trying AGAIN instead of Trying Longer: Prior Learning for Automatic Curriculum Learning (2020-04-07)
A major challenge in the Deep RL (DRL) community is to train agents able to generalize over unseen situations.
We propose a two-stage ACL approach where 1) a teacher algorithm first learns to train a DRL agent with a high-exploration curriculum, and then 2) distills learned priors from the first run to generate an "expert curriculum".
Besides demonstrating 50% improvements on average over the current state of the art, the objective of this work is to give a first example of a new research direction oriented towards refining ACL techniques over multiple learners.
- Meta Reinforcement Learning with Autonomous Inference of Subtask Dependencies (2020-01-01)
We propose and address a novel few-shot RL problem, where a task is characterized by a subtask graph.
Instead of directly learning a meta-policy, we develop a Meta-learner with Subtask Graph Inference.
Our experiment results on two grid-world domains and StarCraft II environments show that the proposed method is able to accurately infer the latent task parameter.