Contrastive Initial State Buffer for Reinforcement Learning
- URL: http://arxiv.org/abs/2309.09752v3
- Date: Mon, 26 Feb 2024 10:22:04 GMT
- Title: Contrastive Initial State Buffer for Reinforcement Learning
- Authors: Nico Messikommer, Yunlong Song, Davide Scaramuzza
- Abstract summary: In Reinforcement Learning, the trade-off between exploration and exploitation poses a complex challenge for achieving efficient learning from limited samples.
We introduce the concept of a Contrastive Initial State Buffer, which strategically selects states from past experiences and uses them to initialize the agent in the environment.
We validate our approach on two complex robotic tasks without relying on any prior information about the environment.
- Score: 25.849626996870526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Reinforcement Learning, the trade-off between exploration and exploitation
poses a complex challenge for achieving efficient learning from limited
samples. While recent works have been effective in leveraging past experiences
for policy updates, they often overlook the potential of reusing past
experiences for data collection. Independent of the underlying RL algorithm, we
introduce the concept of a Contrastive Initial State Buffer, which
strategically selects states from past experiences and uses them to initialize
the agent in the environment in order to guide it toward more informative
states. We validate our approach on two complex robotic tasks without relying
on any prior information about the environment: (i) locomotion of a quadruped
robot traversing challenging terrains and (ii) a quadcopter drone racing
through a track. The experimental results show that our initial state buffer
achieves higher task performance than the nominal baseline while also speeding
up training convergence.
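At a high level, the reset mechanism described in the abstract lends itself to a small sketch. The following is a minimal, hypothetical Python illustration of an initial state buffer that stores states visited in past episodes together with a selection score and samples reset states from them; the class, the score, and the environment's get_state()/set_state() interface are assumptions for illustration only, not the paper's implementation, whose selection is contrastive and independent of the underlying RL algorithm.

```python
import random
from collections import deque


class InitialStateBuffer:
    """Minimal sketch of an initial state buffer (hypothetical, not the paper's code).

    Stores simulator states visited in past episodes together with a scalar
    selection score and samples reset states with probability proportional to
    that score; the paper instead selects states contrastively.
    """

    def __init__(self, capacity=10_000, nominal_reset_prob=0.2):
        self.states = deque(maxlen=capacity)      # (state, score) pairs
        self.nominal_reset_prob = nominal_reset_prob

    def add(self, state, score):
        """Store a visited state together with its (assumed) selection score."""
        self.states.append((state, max(float(score), 1e-6)))

    def sample_initial_state(self, nominal_state):
        """Return a reset state: usually a buffered state, sometimes the nominal one."""
        if not self.states or random.random() < self.nominal_reset_prob:
            return nominal_state
        states, scores = zip(*self.states)
        return random.choices(states, weights=scores, k=1)[0]


# Hypothetical usage with an environment exposing get_state()/set_state():
#   buffer = InitialStateBuffer()
#   env.set_state(buffer.sample_initial_state(nominal_state))
#   ... collect a rollout, then buffer.add(env.get_state(), novelty_score)
```

Mixing in the nominal initial state with some probability keeps the agent anchored to the original task distribution; the score-weighted sampling above is a naive stand-in for the contrastive selection the paper proposes.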
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning [17.092640837991883]
Reinforcement learning (RL) presents a promising framework to learn policies through environment interaction.
One direction includes augmenting RL with offline data demonstrating desired tasks, but past work often requires large amounts of high-quality demonstration data.
We show how the combination of a reverse curriculum and forward curriculum in our method, RFCL, enables significant improvements in demonstration and sample efficiency.
arXiv Detail & Related papers (2024-05-06T11:33:12Z)
- Rethinking Closed-loop Training for Autonomous Driving [82.61418945804544]
We present the first empirical study which analyzes the effects of different training benchmark designs on the success of learning agents.
We propose trajectory value learning (TRAVL), an RL-based driving agent that performs planning with multistep look-ahead.
Our experiments show that TRAVL can learn much faster and produce safer maneuvers compared to all the baselines.
arXiv Detail & Related papers (2023-06-27T17:58:39Z)
- Demonstration-free Autonomous Reinforcement Learning via Implicit and Bidirectional Curriculum [22.32327908453603]
We propose a demonstration-free reinforcement learning algorithm via Implicit and Bi-directional Curriculum (IBC).
With an auxiliary agent that is conditionally activated upon learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods.
arXiv Detail & Related papers (2023-05-17T04:31:36Z)
- Don't Start From Scratch: Leveraging Prior Data to Automate Robotic Reinforcement Learning [70.70104870417784]
Reinforcement learning (RL) algorithms hold the promise of enabling autonomous skill acquisition for robotic systems.
In practice, real-world robotic RL typically requires time-consuming data collection and frequent human intervention to reset the environment.
In this work, we study how these challenges can be tackled by effective utilization of diverse offline datasets collected from previously seen tasks.
arXiv Detail & Related papers (2022-07-11T08:31:22Z)
- Asynchronous Curriculum Experience Replay: A Deep Reinforcement Learning Approach for UAV Autonomous Motion Control in Unknown Dynamic Environments [2.635402406262781]
Unmanned aerial vehicles (UAVs) have been widely used in military warfare.
We formulate the autonomous motion control (AMC) problem as a Markov decision process (MDP).
We propose an advanced deep reinforcement learning (DRL) method that allows UAVs to execute complex tasks in large-scale dynamic three-dimensional (3D) environments.
arXiv Detail & Related papers (2022-07-04T08:19:39Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Parrot: Data-Driven Behavioral Priors for Reinforcement Learning [79.32403825036792]
We propose a method for pre-training behavioral priors that can capture complex input-output relationships observed in successful trials.
We show how this learned prior can be used for rapidly learning new tasks without impeding the RL agent's ability to try out novel behaviors.
arXiv Detail & Related papers (2020-11-19T18:47:40Z)
- ACDER: Augmented Curiosity-Driven Experience Replay [16.755555854030412]
We propose a novel method called Augmented Curiosity-Driven Experience Replay (ACDER).
ACDER uses a new goal-oriented curiosity-driven exploration to encourage the agent to pursue novel and task-relevant states more purposefully.
Experiments are conducted on four challenging robotic manipulation tasks with binary rewards: Reach, Push, Pick&Place, and Multi-step Push.
arXiv Detail & Related papers (2020-11-16T15:27:15Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state space, guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- AAMDRL: Augmented Asset Management with Deep Reinforcement Learning [5.801876281373619]
We show how Deep Reinforcement Learning can tackle this challenge.
Our contributions are threefold: (i) the use of contextual information, also referred to as augmented state, in DRL, (ii) the impact of a one-period lag between observations and actions, and (iii) the implementation of a new repetitive train-test method called walk-forward analysis (sketched after this entry).
Although our experiment is on trading bots, it can easily be translated to other bot environments that operate in sequential settings with regime changes and noisy data.
arXiv Detail & Related papers (2020-09-30T03:55:47Z)
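Walk-forward analysis, mentioned in the last entry above, is a standard rolling train-test scheme for sequential data. Below is a minimal, hypothetical Python sketch of an expanding-window variant; the function name, parameters, and defaults are assumptions for illustration and are not taken from the AAMDRL paper.

```python
def walk_forward_splits(n_samples, initial_train_size, test_size, step=None):
    """Yield (train_indices, test_indices) for an expanding-window walk-forward evaluation.

    Hypothetical helper: train on all data up to the split point, test on the
    next `test_size` steps, then move the split point forward and repeat.
    """
    step = step or test_size
    split = initial_train_size
    while split + test_size <= n_samples:
        yield list(range(split)), list(range(split, split + test_size))
        split += step


# Example: 100 time steps, 60 used for the first fit, evaluated in blocks of 10.
for train_idx, test_idx in walk_forward_splits(100, initial_train_size=60, test_size=10):
    pass  # fit the agent on train_idx, evaluate it on test_idx
```

Because each test block lies strictly after its training window, this scheme avoids look-ahead leakage while still using all past data for each refit.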
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.