Thinking While Moving: Deep Reinforcement Learning with Concurrent
Control
- URL: http://arxiv.org/abs/2004.06089v4
- Date: Sat, 25 Apr 2020 21:19:45 GMT
- Title: Thinking While Moving: Deep Reinforcement Learning with Concurrent
Control
- Authors: Ted Xiao, Eric Jang, Dmitry Kalashnikov, Sergey Levine, Julian Ibarz,
Karol Hausman, Alexander Herzog
- Abstract summary: We study reinforcement learning in settings where sampling an action from the policy must be done concurrently with the time evolution of the controlled system.
Much like a person or an animal, the robot must think and move at the same time, deciding on its next action before the previous one has completed.
- Score: 122.49572467292293
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study reinforcement learning in settings where sampling an action from the
policy must be done concurrently with the time evolution of the controlled
system, such as when a robot must decide on the next action while still
performing the previous action. Much like a person or an animal, the robot must
think and move at the same time, deciding on its next action before the
previous one has completed. In order to develop an algorithmic framework for
such concurrent control problems, we start with a continuous-time formulation
of the Bellman equations, and then discretize them in a way that is aware of
system delays. We instantiate this new class of approximate dynamic programming
methods via a simple architectural extension to existing value-based deep
reinforcement learning algorithms. We evaluate our methods on simulated
benchmark tasks and a large-scale robotic grasping task where the robot must
"think while moving".
Related papers
- Simulation-Aided Policy Tuning for Black-Box Robot Learning [47.83474891747279]
We present a novel black-box policy search algorithm focused on data-efficient policy improvements.
The algorithm learns directly on the robot and treats simulation as an additional information source to speed up the learning process.
We show fast and successful task learning on a robot manipulator with the aid of an imperfect simulator.
arXiv Detail & Related papers (2024-11-21T15:52:23Z) - Single-Shot Learning of Stable Dynamical Systems for Long-Horizon Manipulation Tasks [48.54757719504994]
This paper focuses on improving task success rates while reducing the amount of training data needed.
Our approach introduces a novel method that segments long-horizon demonstrations into discrete steps defined by waypoints and subgoals.
We validate our approach through both simulation and real-world experiments, demonstrating effective transfer from simulation to physical robotic platforms.
arXiv Detail & Related papers (2024-10-01T19:49:56Z) - Unsupervised Learning of Effective Actions in Robotics [0.9374652839580183]
- Unsupervised Learning of Effective Actions in Robotics [0.9374652839580183]
Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's actions.
We propose an unsupervised algorithm to discretize a continuous motion space and generate "action prototypes".
We evaluate our method on a simulated stair-climbing reinforcement learning task.
arXiv Detail & Related papers (2024-04-03T13:28:52Z) - RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [68.70755196744533]
RoboGen is a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation.
Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer it to the field of robotics.
arXiv Detail & Related papers (2023-11-02T17:59:21Z) - Leveraging Sequentiality in Reinforcement Learning from a Single
Demonstration [68.94506047556412]
We propose to leverage a sequential bias to learn control policies for complex robotic tasks using a single demonstration.
We show that DCIL-II can solve challenging simulated tasks, such as humanoid locomotion and stand-up, with unprecedented sample efficiency.
arXiv Detail & Related papers (2022-11-09T10:28:40Z) - Memory-based gaze prediction in deep imitation learning for robot
manipulation [2.857551605623957]
The proposed algorithm uses a Transformer-based self-attention architecture to estimate gaze from sequential data, thereby implementing memory.
The proposed method was evaluated with a real robot multi-object manipulation task that requires memory of the previous states.
arXiv Detail & Related papers (2022-02-10T07:30:08Z) - Neural Dynamic Policies for End-to-End Sensorimotor Learning [51.24542903398335]
The current dominant paradigm in sensorimotor control, whether imitation or reinforcement learning, is to train policies directly in raw action spaces.
We propose Neural Dynamic Policies (NDPs) that make predictions in trajectory distribution space.
NDPs outperform the prior state-of-the-art in terms of either efficiency or performance across several robotic control tasks.
arXiv Detail & Related papers (2020-12-04T18:59:32Z) - DREAM Architecture: a Developmental Approach to Open-Ended Learning in
- DREAM Architecture: a Developmental Approach to Open-Ended Learning in Robotics [44.62475518267084]
We present a developmental cognitive architecture that bootstraps this redescription process stage by stage, builds new state representations with appropriate motivations, and transfers the acquired knowledge across domains, tasks, or even robots.
arXiv Detail & Related papers (2020-05-13T09:29:40Z) - On Simple Reactive Neural Networks for Behaviour-Based Reinforcement
Learning [5.482532589225552]
We present a behaviour-based reinforcement learning approach, inspired by Brooks' subsumption architecture.
Our working assumption is that a pick and place robotic task can be simplified by leveraging domain knowledge of a robotics developer.
Our approach learns the pick and place task in 8,000 episodes, a drastic reduction compared to the number of training episodes required by an end-to-end approach.
arXiv Detail & Related papers (2020-01-22T11:49:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (or of any information on this site) and is not responsible for any consequences of its use.