Environment Generation for Zero-Shot Compositional Reinforcement
Learning
- URL: http://arxiv.org/abs/2201.08896v1
- Date: Fri, 21 Jan 2022 21:35:01 GMT
- Title: Environment Generation for Zero-Shot Compositional Reinforcement
Learning
- Authors: Izzeddin Gur, Natasha Jaques, Yingjie Miao, Jongwook Choi, Manoj
Tiwari, Honglak Lee, Aleksandra Faust
- Abstract summary: Compositional Design of Environments (CoDE) trains a Generator agent to automatically build a series of compositional tasks tailored to the agent's current skill level.
We learn to generate environments composed of multiple pages or rooms, and train RL agents capable of completing a wide range of complex tasks in those environments.
CoDE yields a 4x higher success rate than the strongest baseline, and demonstrates strong performance on real websites learned from 3500 primitive tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world problems are compositional - solving them requires completing
interdependent sub-tasks, either in series or in parallel, that can be
represented as a dependency graph. Deep reinforcement learning (RL) agents
often struggle to learn such complex tasks due to the long time horizons and
sparse rewards. To address this problem, we present Compositional Design of
Environments (CoDE), which trains a Generator agent to automatically build a
series of compositional tasks tailored to the RL agent's current skill level.
This automatic curriculum not only enables the agent to learn more complex
tasks than it could have otherwise, but also selects tasks where the agent's
performance is weak, enhancing its robustness and ability to generalize
zero-shot to unseen tasks at test-time. We analyze why current environment
generation techniques are insufficient for the problem of generating
compositional tasks, and propose a new algorithm that addresses these issues.
Our results assess learning and generalization across multiple compositional
tasks, including the real-world problem of learning to navigate and interact
with web pages. We learn to generate environments composed of multiple pages or
rooms, and train RL agents capable of completing a wide range of complex tasks in
those environments. We contribute two new benchmark frameworks for generating
compositional tasks, compositional MiniGrid and gMiniWoB for web navigation.
CoDE yields a 4x higher success rate than the strongest baseline, and
demonstrates strong performance on real websites learned from 3500 primitive
tasks.
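The abstract describes compositional tasks as dependency graphs of interdependent sub-tasks completed in series or in parallel. As a minimal illustrative sketch (not the CoDE implementation; the task names are hypothetical), such a task can be modeled as a mapping from each sub-task to its prerequisites, with success defined as completing every sub-task after its dependencies:

```python
# Sketch of a compositional task as a dependency graph of sub-tasks.
# Illustrative only; this is not the paper's implementation.

def valid_completion_order(deps, order):
    """deps maps each sub-task to the set of sub-tasks it depends on.
    Return True if `order` completes every sub-task after its prerequisites."""
    done = set()
    for task in order:
        if not deps.get(task, set()) <= done:
            return False  # a prerequisite has not been completed yet
        done.add(task)
    return done == set(deps)  # every sub-task must be completed

# Toy web-navigation task: log in, then fill two independent form
# fields (in either order), then submit.
deps = {
    "login": set(),
    "fill_name": {"login"},
    "fill_email": {"login"},
    "submit": {"fill_name", "fill_email"},
}
```

Under this representation, the two form-filling sub-tasks can be interleaved freely, which is what makes the space of valid solutions (and of generatable task variants) grow combinatorially.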
Related papers
- TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and
Agent Generation [45.028795422801764]
We propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG)
This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent.
ItineraryBench is designed to assess agents' abilities in memory, planning, and tool usage across tasks of varying complexity.
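The TDAG summary describes decomposing a complex task into subtasks and generating a dedicated subagent for each. A toy sketch of that control flow (hypothetical names; not the TDAG implementation) might look like:

```python
# Sketch of dynamic task decomposition with per-subtask agents.
# Illustrative only; names and the decomposition are hypothetical.

def decompose(task):
    """Toy decomposer: split a travel-planning task into ordered subtasks."""
    return [f"{task}: book transport",
            f"{task}: reserve hotel",
            f"{task}: plan itinerary"]

def spawn_subagent(subtask):
    """Return a closure acting as a subagent specialized for one subtask."""
    def run():
        return f"done({subtask})"
    return run

def solve(task):
    # Decompose, generate one subagent per subtask, and run them in order.
    return [spawn_subagent(st)() for st in decompose(task)]
```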
arXiv Detail & Related papers (2024-02-15T18:27:37Z)
- Fast Inference and Transfer of Compositional Task Structures for
Few-shot Task Generalization [101.72755769194677]
We formulate few-shot task generalization as a reinforcement learning problem where a task is characterized by a subtask graph.
Our multi-task subtask graph inferencer (MTSGI) first infers the common high-level task structure in terms of the subtask graph from the training tasks.
Our experiment results on 2D grid-world and complex web navigation domains show that the proposed method can learn and leverage the common underlying structure of the tasks for faster adaptation to the unseen tasks.
arXiv Detail & Related papers (2022-05-25T10:44:25Z)
- Multi-Task Learning with Sequence-Conditioned Transporter Networks [67.57293592529517]
We aim to solve multi-task learning through the lens of sequence-conditioning and weighted sampling.
First, we propose a new benchmark suite aimed at compositional tasks, MultiRavens, which allows defining custom task combinations.
Second, we propose a vision-based end-to-end system architecture, Sequence-Conditioned Transporter Networks, which augments Goal-Conditioned Transporter Networks with sequence-conditioning and weighted sampling.
arXiv Detail & Related papers (2021-09-15T21:19:11Z)
- One Network Fits All? Modular versus Monolithic Task Formulations in
Neural Networks [36.07011014271394]
We show that a single neural network is capable of simultaneously learning multiple tasks from a combined data set.
We study how the complexity of learning such combined tasks grows with the complexity of the task codes.
arXiv Detail & Related papers (2021-03-29T01:16:42Z)
- Adversarial Environment Generation for Learning to Navigate the Web [107.99759923626242]
One of the bottlenecks of training web navigation agents is providing a learnable curriculum of training environments.
We propose using Adversarial Environment Generation (AEG) to generate challenging web environments in which to train reinforcement learning (RL) agents.
We show that the navigator agent trained with our proposed Flexible b-PAIRED technique significantly outperforms competitive automatic curriculum generation baselines.
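PAIRED-style adversarial environment generation, which this entry builds on, rewards the generator by the regret between a strong "antagonist" agent and the learning "protagonist" on the generated environment. A minimal sketch of that reward signal (an assumption based on the PAIRED family of methods, not the Flexible b-PAIRED implementation):

```python
# Sketch of the regret-based generator reward in PAIRED-style adversarial
# environment generation. Illustrative only.

def generator_reward(antagonist_returns, protagonist_returns):
    """Regret: gap between the best antagonist episode return and the
    protagonist's mean return on the same generated environment."""
    regret = max(antagonist_returns) - sum(protagonist_returns) / len(protagonist_returns)
    # Clip at zero: environments that are solvable (antagonist succeeds)
    # but hard for the learner score highest; unsolvable ones do not.
    return max(regret, 0.0)
```

Maximizing this signal steers the generator toward environments at the frontier of the learner's ability, which is what produces the curriculum.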
arXiv Detail & Related papers (2021-03-02T19:19:30Z)
- Meta Automatic Curriculum Learning [35.13646854355393]
We introduce the concept of Meta-ACL, and formalize it in the context of black-box RL learners.
We present AGAIN, a first instantiation of Meta-ACL, and showcase its benefits for curriculum generation over classical ACL.
arXiv Detail & Related papers (2020-11-16T14:56:42Z)
- CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and
Transfer Learning [138.40338621974954]
CausalWorld is a benchmark for causal structure and transfer learning in a robotic manipulation environment.
Tasks consist of constructing 3D shapes from a given set of blocks - inspired by how children learn to build complex structures.
arXiv Detail & Related papers (2020-10-08T23:01:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.