Learning Compositional Neural Programs for Continuous Control
- URL: http://arxiv.org/abs/2007.13363v2
- Date: Tue, 13 Apr 2021 12:08:39 GMT
- Title: Learning Compositional Neural Programs for Continuous Control
- Authors: Thomas Pierrot, Nicolas Perrin, Feryal Behbahani, Alexandre Laterre,
Olivier Sigaud, Karim Beguir, Nando de Freitas
- Abstract summary: We propose a novel solution to challenging sparse-reward, continuous control problems.
Our solution, dubbed AlphaNPI-X, involves three separate stages of learning.
We empirically show that AlphaNPI-X can effectively learn to tackle challenging sparse manipulation tasks.
- Score: 62.80551956557359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel solution to challenging sparse-reward, continuous control
problems that require hierarchical planning at multiple levels of abstraction.
Our solution, dubbed AlphaNPI-X, involves three separate stages of learning.
First, we use off-policy reinforcement learning algorithms with experience
replay to learn a set of atomic goal-conditioned policies, which can be easily
repurposed for many tasks. Second, we learn self-models describing the effect
of the atomic policies on the environment. Third, the self-models are harnessed
to learn recursive compositional programs with multiple levels of abstraction.
The key insight is that the self-models enable planning by imagination,
obviating the need for interaction with the world when learning higher-level
compositional programs. To accomplish the third stage of learning, we extend
the AlphaNPI algorithm, which applies AlphaZero to learn recursive neural
programmer-interpreters. We empirically show that AlphaNPI-X can effectively
learn to tackle challenging sparse manipulation tasks, such as stacking
multiple blocks, where powerful model-free baselines fail.
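The three stages above can be pictured in a few lines of code. This is a hedged sketch only: all names (`SelfModel`, `plan_by_imagination`) are illustrative stand-ins, the atomic policies are assumed to shift the state by fixed deltas, and the greedy search stands in for the paper's AlphaZero-style (AlphaNPI) search over programs.

```python
import numpy as np

class SelfModel:
    """Stage 2 stand-in: predicts the state left behind by an atomic policy.
    Here each atomic policy is assumed to shift the state by a fixed delta."""
    def __init__(self, deltas):
        self.deltas = np.asarray(deltas, dtype=float)

    def predict(self, state, policy_id):
        # Imagined effect of atomic policy `policy_id`; no environment needed.
        return state + self.deltas[policy_id]

def plan_by_imagination(model, state, goal, horizon=3):
    """Stage 3, radically simplified: greedily compose atomic policies using
    only the self-model ("planning by imagination"); the paper instead runs
    an AlphaZero-style search over recursive programs."""
    program = []
    for _ in range(horizon):
        candidates = range(len(model.deltas))
        best = min(candidates,
                   key=lambda i: np.linalg.norm(model.predict(state, i) - goal))
        state = model.predict(state, best)
        program.append(best)
    return program, state
```

With two atomic policies that each move the state along one axis, the imagined rollout composes a program that reaches the goal without any real-world interaction, which is the key insight the abstract describes.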
Related papers
- A Unified Framework for Neural Computation and Learning Over Time [56.44910327178975]
Hamiltonian Learning is a novel unified framework for learning with neural networks "over time".
It is based on differential equations that: (i) can be integrated without the need for external software solvers; (ii) generalize the well-established notion of gradient-based learning in feed-forward and recurrent networks; (iii) open up novel perspectives.
arXiv Detail & Related papers (2024-09-18T14:57:13Z) - KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination [11.203441390685201]
Zero-shot coordination (ZSC) remains a major challenge in the cooperative AI field.
We introduce KnowPC, a knowledge-driven programmatic reinforcement learning approach for ZSC.
A significant challenge is the vast program search space, making it difficult to find high-performing programs efficiently.
arXiv Detail & Related papers (2024-08-08T09:43:54Z) - Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, as is the case with many existing continual learning algorithms.
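The "set of information processing paths" can be pictured as a mixture-of-experts forward pass. A minimal sketch, assuming linear experts and a linear gate (the paper's Mixture-of-Variational-Experts layer is variational and considerably more elaborate; all names here are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mixture_of_experts(x, experts, gate_w):
    """Route input x through several expert paths and mix their outputs.
    experts: list of weight matrices (one linear map per path).
    gate_w: weight matrix producing one routing score per expert."""
    gates = softmax(gate_w @ x)                   # soft routing weights
    outputs = np.stack([w @ x for w in experts])  # each path's output
    return gates @ outputs                        # convex combination of paths
```

Because different inputs can be routed to different paths, updating one path need not overwrite what another path has learned, which is the intuition behind alleviating forgetting.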
arXiv Detail & Related papers (2022-11-14T19:53:15Z) - Active Predictive Coding: A Unified Neural Framework for Learning Hierarchical World Models for Perception and Planning [1.3535770763481902]
We propose a new framework for predictive coding called active predictive coding.
It can learn hierarchical world models and solve two radically different open problems in AI.
arXiv Detail & Related papers (2022-10-23T05:44:22Z) - Learning Neuro-Symbolic Skills for Bilevel Planning [63.388694268198655]
Decision-making is challenging in robotics environments with continuous object-centric states, continuous actions, long horizons, and sparse feedback.
Hierarchical approaches, such as task and motion planning (TAMP), address these challenges by decomposing decision-making into two or more levels of abstraction.
Our main contribution is a method for learning parameterized policies in combination with operators and samplers.
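The bilevel structure can be sketched as a symbolic search whose operators are grounded by samplers over continuous parameters. A minimal, illustrative version (all names are hypothetical; the paper learns the policies and samplers, whereas here they are hand-written callables):

```python
def bilevel_plan(start, goal, operators, max_tries=10, max_steps=100):
    """High level: choose among symbolic operators.
    Low level: each operator's sampler proposes continuous parameters,
    kept only if they pass a feasibility check.
    operators: list of (name, applicable, sampler, feasible, apply) callables."""
    plan, state = [], start
    for _ in range(max_steps):
        if state == goal:
            return plan
        progressed = False
        for name, applicable, sampler, feasible, apply in operators:
            if not applicable(state):
                continue
            for _ in range(max_tries):  # refinement: sample continuous params
                params = sampler(state)
                if feasible(state, params):
                    state = apply(state, params)
                    plan.append((name, params))
                    progressed = True
                    break
            if progressed:
                break
        if not progressed:
            return None                 # no feasible refinement found
    return plan if state == goal else None
```

In a toy counter domain with one "increment" operator, the planner returns a sequence of grounded operator applications, mirroring how TAMP interleaves symbolic choice and continuous refinement.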
arXiv Detail & Related papers (2022-06-21T19:01:19Z) - Learning to Synthesize Programs as Interpretable and Generalizable Policies [25.258598215642067]
We present a framework that learns to synthesize a program, which details the procedure to solve a task in a flexible and expressive manner.
Experimental results demonstrate that the proposed framework not only learns to reliably synthesize task-solving programs but also outperforms DRL and program synthesis baselines.
arXiv Detail & Related papers (2021-08-31T07:03:06Z) - Reset-Free Reinforcement Learning via Multi-Task Learning: Learning Dexterous Manipulation Behaviors without Human Intervention [67.1936055742498]
We show that multi-task learning can effectively scale reset-free learning schemes to much more complex problems.
This work shows the ability to learn dexterous manipulation behaviors in the real world with RL without any human intervention.
arXiv Detail & Related papers (2021-04-22T17:38:27Z) - Episodic Self-Imitation Learning with Hindsight [7.743320290728377]
Episodic self-imitation learning is a novel self-imitation algorithm with a trajectory selection module and an adaptive loss function.
A selection module is introduced to filter uninformative samples from each episode during the update.
Episodic self-imitation learning has the potential to be applied to real-world problems that have continuous action spaces.
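The hindsight mechanism this algorithm builds on can be sketched in a few lines: a failed episode is relabeled so that the state actually reached becomes the goal, turning the trajectory into a demonstration worth imitating. The episode format and success rule below are illustrative assumptions, not the paper's exact implementation:

```python
def relabel_with_hindsight(episode, achieved):
    """episode: list of (state, action, goal) transitions from one rollout.
    Returns a copy whose goal is the state that was actually achieved."""
    return [(state, action, achieved) for state, action, _ in episode]

def is_success(transition):
    """Illustrative success rule: the state matches the goal exactly."""
    state, _, goal = transition
    return state == goal
```

An episode that never reached its original goal becomes, after relabeling, a successful demonstration for the goal it did reach.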
arXiv Detail & Related papers (2020-11-26T20:36:42Z) - Deep Imitation Learning for Bimanual Robotic Manipulation [70.56142804957187]
We present a deep imitation learning framework for robotic bimanual manipulation.
A core challenge is to generalize the manipulation skills to objects in different locations.
We propose to (i) decompose the multi-modal dynamics into elemental movement primitives, (ii) parameterize each primitive using a recurrent graph neural network to capture interactions, and (iii) integrate a high-level planner that composes primitives sequentially and a low-level controller to combine primitive dynamics and inverse kinematics control.
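The high-level/low-level split in (iii) can be sketched with waypoint primitives tracked by a proportional controller. This is a deliberately tiny stand-in: the recurrent graph-network parameterization in (ii) and the inverse-kinematics control are not modeled, and all names are illustrative.

```python
def low_level_controller(pos, target, gain=0.5, steps=20):
    """Track one primitive's target with a proportional step rule."""
    for _ in range(steps):
        pos = pos + gain * (target - pos)
    return pos

def execute_plan(pos, primitives):
    """High-level planner output: a sequence of primitives, run in order."""
    for target in primitives:
        pos = low_level_controller(pos, target)
    return pos
```

Sequencing two waypoint primitives drives the (one-dimensional) state to the final target, illustrating how a planner composes primitives that a controller executes.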
arXiv Detail & Related papers (2020-10-11T01:40:03Z) - SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning [102.78958681141577]
We present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy deep reinforcement learning algorithms.
SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration.
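Both ingredients can be sketched with a toy Q-ensemble. Array shapes and the exact weighting constants below are illustrative; what the sketch preserves is the structure the abstract describes: target weights that decrease with ensemble disagreement, and action selection by an upper-confidence bound.

```python
import numpy as np

def weighted_bellman_targets(q_next_ensemble, rewards, gamma=0.99, temp=10.0):
    """(a) Down-weight Bellman targets where the Q-ensemble disagrees:
    high ensemble std => uncertain target => small weight."""
    mean_q = q_next_ensemble.mean(axis=0)         # ensemble mean per sample
    std_q = q_next_ensemble.std(axis=0)           # ensemble disagreement
    targets = rewards + gamma * mean_q
    weights = 1.0 / (1.0 + np.exp(std_q * temp))  # sigmoid(-std * temp)
    return targets, weights

def ucb_action(q_action_ensemble, lam=1.0):
    """(b) Explore optimistically: maximize mean + lam * std over actions."""
    mean_q = q_action_ensemble.mean(axis=0)
    std_q = q_action_ensemble.std(axis=0)
    return int(np.argmax(mean_q + lam * std_q))
```

A sample whose next-state Q-values disagree across the ensemble receives a smaller weight in the backup, while the UCB rule prefers actions whose value estimates are both high and uncertain.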
arXiv Detail & Related papers (2020-07-09T17:08:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.