Leveraging Sequentiality in Reinforcement Learning from a Single
Demonstration
- URL: http://arxiv.org/abs/2211.04786v2
- Date: Mon, 17 Apr 2023 09:18:28 GMT
- Title: Leveraging Sequentiality in Reinforcement Learning from a Single
Demonstration
- Authors: Alexandre Chenu, Olivier Serris, Olivier Sigaud and Nicolas
Perrin-Gilbert
- Abstract summary: We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve challenging simulated tasks such as humanoid locomotion and stand-up with unprecedented sample efficiency.
- Score: 68.94506047556412
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning has been successfully applied to learn robotic
control. However, the corresponding algorithms struggle when applied to
problems where the agent is only rewarded after achieving a complex task. In
this context, using demonstrations can significantly speed up the learning
process, but demonstrations can be costly to acquire. In this paper, we propose
to leverage a sequential bias to learn control policies for complex robotic
tasks using a single demonstration. To do so, our method learns a
goal-conditioned policy to control a system between successive low-dimensional
goals. This sequential goal-reaching approach raises a problem of compatibility
between successive goals: we need to ensure that the state resulting from
reaching a goal is compatible with the achievement of the following goals. To
tackle this problem, we present a new algorithm called DCIL-II. We show that
DCIL-II can solve challenging simulated tasks such as humanoid locomotion and
stand-up, as well as fast running with a simulated Cassie robot, with
unprecedented sample efficiency. By leveraging sequentiality, our method is a
step towards solving complex robotic tasks with minimal specification effort, a
key feature for the next generation of autonomous robots.
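To make the idea concrete, below is a minimal Python sketch of the sequential goal-reaching scheme described in the abstract: subgoals are sub-sampled from the single demonstration, and one goal-conditioned policy is trained to reach them in order, with each goal practiced from states produced by reaching the previous one. The names `env`, `agent` and `project_to_goal_space` are hypothetical placeholders, and the compatibility handling shown here is only a rough stand-in for the mechanism DCIL-II actually uses.

```python
import numpy as np

# Illustrative sketch only -- `env`, `agent` and `project_to_goal_space`
# are hypothetical placeholders, not the DCIL-II code base.

def extract_subgoals(demo_states, n_goals, project_to_goal_space):
    """Sub-sample a single demonstration into an ordered list of
    low-dimensional goals (e.g. pelvis or end-effector positions)."""
    idx = np.linspace(0, len(demo_states) - 1, n_goals + 1).astype(int)[1:]
    return [project_to_goal_space(demo_states[i]) for i in idx]

def train_sequential_goal_reaching(env, agent, demo_states, project_to_goal_space,
                                   n_goals=10, episodes=1000, max_steps=100,
                                   eps=0.05):
    """Train one goal-conditioned policy to chain the extracted goals.

    Each goal is practiced from states obtained by reaching the previous
    goal, a crude way of keeping successive goals compatible; DCIL-II
    addresses this compatibility issue with a more principled mechanism.
    """
    goals = extract_subgoals(demo_states, n_goals, project_to_goal_space)
    for _ in range(episodes):
        state = env.reset()
        for goal in goals:
            reached = False
            for _ in range(max_steps):
                action = agent.act(state, goal)
                next_state, _, done, _ = env.step(action)
                # Sparse reward: 1 when the goal-space projection of the
                # state is close enough to the current goal.
                reached = np.linalg.norm(
                    project_to_goal_space(next_state) - goal) < eps
                agent.store(state, action, float(reached), next_state, goal)
                agent.update()
                state = next_state
                if reached or done:
                    break
            if not reached:
                break  # chain broken: restart from a new episode
    return agent
```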
Related papers
- Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks [48.54757719504994]
This paper focuses on improving task success rates while reducing the amount of training data needed.
Our approach introduces a novel method that segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals.
We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms.
arXiv Detail & Related papers (2024-10-01T19:49:56Z) - Large Language Models for Orchestrating Bimanual Robots [19.60907949776435]
We present LAnguage-model-based Bimanual ORchestration (LABOR) to analyze task configurations and devise coordination control policies.
We evaluate our method through simulated experiments involving two classes of long-horizon tasks using the NICOL humanoid robot.
arXiv Detail & Related papers (2024-04-02T15:08:35Z) - Single-Reset Divide & Conquer Imitation Learning [49.87201678501027]
Demonstrations are commonly used to speed up the learning process of Deep Reinforcement Learning algorithms.
Some algorithms have been developed to learn from a single demonstration.
arXiv Detail & Related papers (2024-02-14T17:59:47Z) - Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from
Offline Data [101.43350024175157]
Self-supervised learning has the potential to decrease the amount of human annotation and engineering effort required to learn control strategies.
Our work builds on prior work showing that reinforcement learning (RL) itself can be cast as a self-supervised problem.
We demonstrate that a self-supervised RL algorithm based on contrastive learning can solve real-world, image-based robotic manipulation tasks.
arXiv Detail & Related papers (2023-06-06T01:36:56Z) - Automatic Goal Generation using Dynamical Distance Learning [5.797847756967884]
Reinforcement Learning (RL) agents can learn to solve complex sequential decision making tasks by interacting with the environment.
In the field of multi-goal RL, where agents are required to reach multiple goals to solve complex tasks, improving sample efficiency can be especially challenging.
We propose a method for automatic goal generation using a dynamical distance function (DDF) in a self-supervised fashion.
arXiv Detail & Related papers (2021-11-07T16:23:56Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z) - SQUIRL: Robust and Efficient Learning from Video Demonstration of
Long-Horizon Robotic Manipulation Tasks [8.756012472587601]
Deep reinforcement learning (RL) can be used to learn complex manipulation tasks.
However, RL requires the robot to collect a large amount of real-world experience.
SQUIRL performs a new but related long-horizon task robustly given only a single video demonstration.
arXiv Detail & Related papers (2020-03-10T20:26:26Z) - Scalable Multi-Task Imitation Learning with Autonomous Improvement [159.9406205002599]
We build an imitation learning system that can continuously improve through autonomous data collection.
We leverage the robot's own trials as demonstrations for tasks other than the one that the robot actually attempted.
In contrast to prior imitation learning approaches, our method can autonomously collect data with sparse supervision for continuous improvement.
arXiv Detail & Related papers (2020-02-25T18:56:42Z)