ACDER: Augmented Curiosity-Driven Experience Replay
- URL: http://arxiv.org/abs/2011.08027v1
- Date: Mon, 16 Nov 2020 15:27:15 GMT
- Title: ACDER: Augmented Curiosity-Driven Experience Replay
- Authors: Boyao Li, Tao Lu, Jiayi Li, Ning Lu, Yinghao Cai, Shuo Wang
- Abstract summary: We propose a novel method called Augmented Curiosity-Driven Experience Replay (ACDER)
ACDER uses a new goal-oriented curiosity-driven exploration to encourage the agent to pursue novel and task-relevant states more purposefully.
Experiments were conducted on four challenging robotic manipulation tasks with binary rewards: Reach, Push, Pick&Place and Multi-step Push.
- Score: 16.755555854030412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Exploration in environments with sparse feedback remains a challenging
research problem in reinforcement learning (RL). When the RL agent explores the
environment randomly, it results in low exploration efficiency, especially in
robotic manipulation tasks with high dimensional continuous state and action
space. In this paper, we propose a novel method, called Augmented
Curiosity-Driven Experience Replay (ACDER), which leverages (i) a new
goal-oriented curiosity-driven exploration to encourage the agent to pursue
novel and task-relevant states more purposefully and (ii) the dynamic initial
states selection as an automatic exploratory curriculum to further improve the
sample-efficiency. Our approach complements Hindsight Experience Replay (HER)
by introducing a new way to pursue valuable states. Experiments were
conducted on four challenging robotic manipulation tasks with binary rewards:
Reach, Push, Pick&Place and Multi-step Push. The empirical results show that
our proposed method significantly outperforms existing methods in the first
three basic tasks and also achieves satisfactory performance in multi-step
robotic task learning.
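The two ingredients described in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the forward model is a linear stand-in for a learned dynamics network, and the goal-progress weighting is an assumption about how "task-relevant" curiosity might be computed.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward_model(state, action, W):
    # Linear stand-in for a learned dynamics model f(s, a) -> s'.
    return W @ np.concatenate([state, action])

def intrinsic_reward(state, action, next_state, goal, W, beta=1.0):
    # Curiosity bonus: prediction error of the forward model...
    pred_err = np.sum((forward_model(state, action, W) - next_state) ** 2)
    # ...weighted by goal progress, so that only transitions moving the
    # agent toward the goal earn a bonus (illustrative assumption).
    progress = np.linalg.norm(state - goal) - np.linalg.norm(next_state - goal)
    return beta * pred_err * max(progress, 0.0)

def sparse_reward(next_state, goal, eps=0.05):
    # Binary task reward, as in the Reach/Push/Pick&Place setups.
    return 1.0 if np.linalg.norm(next_state - goal) <= eps else 0.0

state = rng.normal(size=3)
action = rng.normal(size=3)
next_state = state + 0.1 * action
goal = np.zeros(3)
W = rng.normal(scale=0.1, size=(3, 6))

r = sparse_reward(next_state, goal) + intrinsic_reward(
    state, action, next_state, goal, W)
```

In a full training loop this shaped reward would be stored alongside HER-relabeled goals in the replay buffer.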
Related papers
- MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization [91.80034860399677]
Reinforcement learning algorithms aim to balance exploiting the current best strategy with exploring new options that could lead to higher rewards.
We introduce a framework, MaxInfoRL, for balancing intrinsic and extrinsic exploration.
We show that our approach achieves sublinear regret in the simplified setting of multi-armed bandits.
arXiv Detail & Related papers (2024-12-16T18:59:53Z)
- Dense Dynamics-Aware Reward Synthesis: Integrating Prior Experience with Demonstrations [24.041217922654738]
Continuous control problems can be formulated as sparse-reward reinforcement learning (RL) tasks.
Online RL methods can automatically explore the state space to solve each new task.
However, discovering sequences of actions that lead to a non-zero reward becomes exponentially more difficult as the task horizon increases.
We introduce a systematic reward-shaping framework that distills the information contained in 1) a task-agnostic prior data set and 2) a small number of task-specific expert demonstrations.
arXiv Detail & Related papers (2024-12-02T04:37:12Z)
- Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
This paper introduces a new exploration technique called Random Latent Exploration (RLE).
RLE combines the strengths of bonus-based and noise-based exploration strategies, two popular approaches for effective exploration in deep RL.
We evaluate it on the challenging Atari and IsaacGym benchmarks and show that RLE exhibits higher overall scores across all the tasks than other approaches.
arXiv Detail & Related papers (2024-07-18T17:55:22Z)
- RILe: Reinforced Imitation Learning [60.63173816209543]
RILe is a framework that combines the strengths of imitation learning and inverse reinforcement learning to learn a dense reward function efficiently.
Our framework produces high-performing policies in high-dimensional tasks where direct imitation fails to replicate complex behaviors.
arXiv Detail & Related papers (2024-06-12T17:56:31Z)
- Contrastive Initial State Buffer for Reinforcement Learning [25.849626996870526]
In reinforcement learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples.
We introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment.
We validate our approach on two complex robotic tasks without relying on any prior information about the environment.
arXiv Detail & Related papers (2023-09-18T13:26:40Z)
- BeBold: Exploration Beyond the Boundary of Explored Regions [66.88415950549556]
In this paper, we propose the regulated difference of inverse visitation counts as a simple but effective criterion for intrinsic reward (IR).
The criterion helps the agent explore Beyond the Boundary of explored regions and mitigates common issues in count-based methods, such as short-sightedness and detachment.
The resulting method, BeBold, solves the 12 most challenging procedurally-generated tasks in MiniGrid with just 120M environment steps, without any curriculum learning.
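The criterion in this summary can be written down directly: clip the difference of inverse visitation counts at zero. The episodic first-visit gate below is a detail beyond the abstract and should be read as an assumption about the full method.

```python
from collections import defaultdict

visit_counts = defaultdict(int)   # lifelong state visitation counts
episode_visited = set()           # states seen in the current episode

def bebold_intrinsic_reward(state, next_state):
    visit_counts[next_state] += 1
    n_s = max(visit_counts[state], 1)
    n_next = visit_counts[next_state]
    # Regulated difference of inverse counts: positive only when moving
    # from a better-explored state into a less-explored one.
    ir = max(1.0 / n_next - 1.0 / n_s, 0.0)
    # Episodic gate (assumption): pay the bonus once per episode.
    if next_state in episode_visited:
        return 0.0
    episode_visited.add(next_state)
    return ir

visit_counts["A"] = 10                         # "A" visited often already
r_first = bebold_intrinsic_reward("A", "B")    # novel state: 1/1 - 1/10
r_again = bebold_intrinsic_reward("A", "B")    # same episode: gated to 0
```

Rewarding only the outward direction of the count difference is what pushes the agent past the boundary of already-explored regions instead of oscillating inside them.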
arXiv Detail & Related papers (2020-12-15T21:26:54Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
We present a new approach to self-supervised exploration and fast adaptation to new tasks: Plan2Explore, a self-supervised reinforcement learning agent.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- RIDE: Rewarding Impact-Driven Exploration for Procedurally-Generated Environments [15.736899098702972]
We propose a novel type of intrinsic reward which encourages the agent to take actions that lead to significant changes in its learned state representation.
We evaluate our method on multiple challenging procedurally-generated tasks in MiniGrid.
arXiv Detail & Related papers (2020-02-27T18:03:16Z)
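The RIDE summary above describes its mechanism concretely enough to sketch: reward the magnitude of change in a learned state embedding. In this illustration the embedding is a fixed random projection standing in for a learned representation, and the episodic-count normalization is an assumption about the full method.

```python
import numpy as np

rng = np.random.default_rng(0)
phi_matrix = rng.normal(size=(8, 16))   # stand-in for a learned encoder

def phi(state):
    # Embed the raw state into the learned representation space.
    return phi_matrix @ state

def ride_intrinsic_reward(state, next_state, episodic_count):
    # Impact = how much the action changed the embedded state.
    impact = np.linalg.norm(phi(next_state) - phi(state))
    # Discount revisits so the agent cannot farm reward by looping
    # between two states (normalization is an assumption).
    return impact / np.sqrt(episodic_count)

r1 = ride_intrinsic_reward(np.zeros(16), np.ones(16), episodic_count=1)
r2 = ride_intrinsic_reward(np.zeros(16), np.ones(16), episodic_count=4)
```

Because the bonus is defined through the representation rather than raw pixels, it transfers across the procedurally-generated layouts RIDE targets.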
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.