Guided Exploration with Proximal Policy Optimization using a Single
Demonstration
- URL: http://arxiv.org/abs/2007.03328v2
- Date: Wed, 16 Jun 2021 21:38:35 GMT
- Title: Guided Exploration with Proximal Policy Optimization using a Single
Demonstration
- Authors: Gabriele Libardi and Gianni De Fabritiis
- Abstract summary: We train an agent on a combination of demonstrations and its own experience to solve problems with variable initial conditions.
The agent is able to increase its performance and to tackle harder problems by replaying its own past trajectories.
To the best of our knowledge, learning a task of comparable difficulty in a three-dimensional environment using only one human demonstration has never been considered before.
- Score: 5.076419064097734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Solving sparse reward tasks through exploration is one of the major
challenges in deep reinforcement learning, especially in three-dimensional,
partially-observable environments. Critically, the algorithm proposed in this
article uses a single human demonstration to solve hard-exploration problems.
We train an agent on a combination of demonstrations and its own experience to
solve problems with variable initial conditions. We adapt this idea and
integrate it with proximal policy optimization (PPO). The agent is able to
increase its performance and to tackle harder problems by replaying its own
past trajectories, prioritizing them based on the obtained reward and the
maximum value of the trajectory. We compare different variations of this
algorithm to behavioral cloning on a set of hard-exploration tasks in the
Animal-AI Olympics environment. To the best of our knowledge, learning a task
of comparable difficulty in a three-dimensional environment using only one
human demonstration has never been considered before.
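As a rough illustration of the replay mechanism described above, the sketch
below keeps a small buffer of the agent's own best trajectories, ranked by the
obtained reward and the maximum value estimate, and mixes them with the single
human demonstration when choosing what to replay alongside PPO training. This
is a minimal sketch under assumed details: the class names, the priority
formula, the rank-based sampling, and the demonstration/self-replay split are
illustrative choices, not the authors' implementation.

```python
import heapq
import random
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Trajectory:
    """One episode of experience: observations, actions, and per-step rewards."""
    observations: List[Any]
    actions: List[Any]
    rewards: List[float]
    max_value: float = 0.0  # highest critic value estimate seen along the episode

    @property
    def priority(self) -> float:
        # Hypothetical ranking: total obtained reward, with the maximum value
        # estimate as a tie-breaker (the exact formula is an assumption).
        return sum(self.rewards) + 1e-3 * self.max_value


class SelfTrajectoryBuffer:
    """Keeps the agent's best past trajectories, ranked by priority."""

    def __init__(self, capacity: int = 32):
        self.capacity = capacity
        self._heap: List[Tuple[float, int, Trajectory]] = []  # min-heap on priority
        self._counter = 0  # tie-breaker so heapq never compares Trajectory objects

    def __len__(self) -> int:
        return len(self._heap)

    def add(self, traj: Trajectory) -> None:
        heapq.heappush(self._heap, (traj.priority, self._counter, traj))
        self._counter += 1
        if len(self._heap) > self.capacity:
            heapq.heappop(self._heap)  # discard the lowest-priority trajectory

    def sample(self) -> Trajectory:
        # Rank-based sampling: higher-priority trajectories are replayed more often.
        ordered = sorted(self._heap, key=lambda item: item[0])
        weights = [rank + 1 for rank in range(len(ordered))]
        _, _, traj = random.choices(ordered, weights=weights, k=1)[0]
        return traj


def pick_replay_trajectory(demo: Trajectory,
                           buffer: SelfTrajectoryBuffer,
                           demo_prob: float = 0.5) -> Trajectory:
    """Replay either the single human demonstration or one of the agent's own
    best past trajectories; the 50/50 split is an assumed value for illustration."""
    if len(buffer) == 0 or random.random() < demo_prob:
        return demo
    return buffer.sample()
```

In the paper's setting, a PPO learner would use the selected trajectory to
guide exploration (for instance, by resetting the environment to states along
it or by imitating its actions) before continuing with ordinary on-policy
rollouts; the sketch above only covers the trajectory bookkeeping.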
Related papers
- Deterministic Exploration via Stationary Bellman Error Maximization [6.474106100512158]
Exploration is a crucial and distinctive aspect of reinforcement learning (RL).
In this paper, we introduce three modifications to stabilize the latter and arrive at a deterministic exploration policy.
Our experimental results show that our approach can outperform $\varepsilon$-greedy in dense and sparse reward settings.
arXiv Detail & Related papers (2024-10-31T11:46:48Z) - "Give Me an Example Like This": Episodic Active Reinforcement Learning from Demonstrations [3.637365301757111]
Methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process.
How to select the best set of human demonstrations that is most beneficial for learning becomes a major concern.
This paper presents EARLY, an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space.
arXiv Detail & Related papers (2024-06-05T08:52:21Z) - Latent Exploration for Reinforcement Learning [87.42776741119653]
In Reinforcement Learning, agents learn policies by exploring and interacting with the environment.
We propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally-correlated noise into the latent state of the policy network.
arXiv Detail & Related papers (2023-05-31T17:40:43Z) - You Only Live Once: Single-Life Reinforcement Learning [124.1738675154651]
In many real-world situations, the goal might not be to learn a policy that can do the task repeatedly, but simply to perform a new task successfully once in a single trial.
We formalize this problem setting, where an agent must complete a task within a single episode without interventions.
We propose an algorithm, $Q$-weighted adversarial learning (QWALE), which employs a distribution matching strategy.
arXiv Detail & Related papers (2022-10-17T09:00:11Z) - A Memory-Related Multi-Task Method Based on Task-Agnostic Exploration [26.17597857264231]
In contrast to imitation learning, there is no expert data, only the data collected through environmental exploration.
Since the action sequence that solves a new task may combine trajectory segments from multiple training tasks, neither the test task nor its solution strategy appears directly in the training data.
We propose a Memory-related Multi-task Method (M3) to address this problem.
arXiv Detail & Related papers (2022-09-09T03:02:49Z) - Emergence of Novelty in Evolutionary Algorithms [0.0]
We introduce our approach to the maze problem and compare it to the previously proposed solution.
We find that our solution leads to an improved performance while being significantly simpler.
Building on that, we generalize the problem and apply our approach to a more advanced set of tasks, Atari Games.
arXiv Detail & Related papers (2022-06-27T13:49:41Z) - Guarantees for Epsilon-Greedy Reinforcement Learning with Function
Approximation [69.1524391595912]
Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to explore efficiently in some reinforcement learning tasks.
This paper presents a theoretical analysis of such policies and provides the first regret and sample-complexity bounds for reinforcement learning with myopic exploration.
arXiv Detail & Related papers (2022-06-19T14:44:40Z) - Learning to Shift Attention for Motion Generation [55.61994201686024]
One challenge of motion generation using robot learning from demonstration techniques is that human demonstrations follow a distribution with multiple modes for one task query.
Previous approaches fail to capture all modes or tend to average modes of the demonstrations and thus generate invalid trajectories.
We propose a motion generation model with extrapolation ability to overcome this problem.
arXiv Detail & Related papers (2021-02-24T09:07:52Z) - Batch Exploration with Examples for Scalable Robotic Reinforcement
Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human-provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z) - Soft Hindsight Experience Replay [77.99182201815763]
Soft Hindsight Experience Replay (SHER) is a novel approach based on HER and Maximum Entropy Reinforcement Learning (MERL).
We evaluate SHER on Open AI Robotic manipulation tasks with sparse rewards.
arXiv Detail & Related papers (2020-02-06T03:57:04Z)