Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling
- URL: http://arxiv.org/abs/2310.00301v2
- Date: Thu, 15 Feb 2024 07:12:14 GMT
- Title: Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling
- Authors: Dexun Li, Pradeep Varakantham
- Abstract summary: We introduce a hierarchical MDP framework for environment design under resource constraints.
It consists of an upper-level RL teacher agent that generates suitable training environments for a lower-level student agent.
Our proposed method significantly reduces the resource-intensive interactions between agents and environments.
- Score: 8.256433006393243
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised Environment Design (UED) is a paradigm for automatically
generating a curriculum of training environments, enabling agents trained in
these environments to develop general capabilities, i.e., achieving good
zero-shot transfer performance. However, existing UED approaches focus
primarily on the random generation of environments for open-ended agent
training. This is impractical in scenarios with limited resources, such as
constraints on the number of generated environments. In this paper, we
introduce a hierarchical MDP framework for environment design under resource
constraints. It consists of an upper-level RL teacher agent that generates
suitable training environments for a lower-level student agent. The RL teacher
can leverage previously discovered environment structures and generate
environments at the frontier of the student's capabilities by observing the
student policy's representation. Moreover, to reduce the time-consuming
collection of experiences for the upper-level teacher, we utilize recent
advances in generative modeling to synthesize a trajectory dataset to train the
teacher agent. Our proposed method significantly reduces the resource-intensive
interactions between agents and environments, and empirical experiments across
various domains demonstrate the effectiveness of our approach.
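As a rough illustration of the two-level loop described in the abstract, the Python sketch below pairs a toy upper-level teacher with a toy lower-level student under a fixed environment budget. It is a minimal sketch, not the authors' implementation: the classes ToyTeacher and ToyStudent, the use of the student's weight vector as its policy representation, and the synthesize_trajectories stub (which replays noised stored transitions in place of a real generative trajectory model) are all assumptions made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lower-level student: its "policy representation" is a weight
# vector trained to match teacher-generated 1-D regression "environments".
class ToyStudent:
    def __init__(self, dim=4):
        self.w = np.zeros(dim)

    def train_on(self, env, steps=50, lr=0.1):
        # One inner-loop training phase inside a generated environment.
        for _ in range(steps):
            x = rng.normal(size=self.w.shape)
            err = self.w @ x - env @ x          # env defines the target task
            self.w -= lr * err * x
        return -float(np.sum((self.w - env) ** 2))  # proxy for student return

# Hypothetical upper-level teacher: proposes the next environment near the
# frontier of the student's ability, conditioned on the student's
# policy representation.
class ToyTeacher:
    def __init__(self, dim=4):
        self.dim = dim
        self.buffer = []                        # teacher trajectory dataset

    def propose(self, student_repr):
        # Push slightly beyond what the student currently encodes.
        return student_repr + 0.5 * rng.normal(size=self.dim)

    def synthesize_trajectories(self, n=32):
        # Stand-in for the generative trajectory model: resample stored
        # (representation, environment, return) transitions with noise
        # instead of paying for more real teacher-environment interaction.
        idx = rng.integers(len(self.buffer), size=n)
        return [(r + 0.01 * rng.normal(size=r.shape), e, g)
                for r, e, g in (self.buffer[i] for i in idx)]

budget = 20                                     # resource constraint
student, teacher = ToyStudent(), ToyTeacher()
for t in range(budget):
    env = teacher.propose(student.w)            # upper-level action
    ret = student.train_on(env)                 # lower-level training
    teacher.buffer.append((student.w.copy(), env, ret))
    synthetic = teacher.synthesize_trajectories()  # cheap extra teacher data
    print(f"round {t:2d}  student return {ret:9.4f}  "
          f"synthetic batch {len(synthetic)}")
```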
Related papers
- Adversarial Environment Design via Regret-Guided Diffusion Models [13.651184780336623]
Training agents that are robust to environmental changes remains a significant challenge in deep reinforcement learning (RL).
Unsupervised environment design (UED) has recently emerged to address this issue by generating a set of training environments tailored to the agent's capabilities.
We propose a novel UED algorithm, adversarial environment design via regret-guided diffusion models (ADD).
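As a hedged illustration only: the sketch below reduces regret guidance to best-of-N selection under an assumed regret critic, whereas ADD itself injects the regret signal into the diffusion sampling process. The functions sample_levels and estimated_regret are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_levels(n, dim=8):
    # Hypothetical stand-in for an unconditional diffusion model over
    # environment parameters: here, plain Gaussian samples.
    return rng.normal(size=(n, dim))

def estimated_regret(level, agent_skill):
    # Assumed learned regret critic: highest for levels slightly harder
    # than the agent's current skill, a common UED heuristic.
    difficulty = float(np.linalg.norm(level))
    return -abs(difficulty - (agent_skill + 0.5))

def regret_guided_pick(agent_skill, n_candidates=64):
    # Simplified "guidance": sample many candidate levels and keep the one
    # with maximal estimated regret. True regret-guided diffusion would
    # instead steer each denoising step with the regret signal.
    candidates = sample_levels(n_candidates)
    scores = [estimated_regret(c, agent_skill) for c in candidates]
    return candidates[int(np.argmax(scores))]

level = regret_guided_pick(agent_skill=2.0)
print("selected level difficulty:", float(np.linalg.norm(level)))
```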
arXiv Detail & Related papers (2024-10-25T17:35:03Z)
- DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback [62.235925602004535]
We introduce DataEnvGym, a testbed of teacher environments for data generation agents.
DataEnvGym frames data generation as a sequential decision-making task.
The agent's goal is to improve student performance.
We support 3 diverse tasks (math, code, and VQA) and test multiple students and teachers.
arXiv Detail & Related papers (2024-10-08T17:20:37Z)
- Learning Curricula in Open-Ended Worlds [17.138779075998084]
This thesis develops a class of methods called Unsupervised Environment Design (UED).
Given an environment design space, UED automatically generates an infinite sequence or curriculum of training environments.
The findings in this thesis show that UED autocurricula can produce RL agents exhibiting significantly improved robustness.
arXiv Detail & Related papers (2023-12-03T16:44:00Z)
- Stabilizing Unsupervised Environment Design with a Learned Adversary [28.426666219969555]
A key challenge in training generally-capable agents is the design of training tasks that facilitate broad generalization and robustness to environment variations.
A pioneering approach for Unsupervised Environment Design (UED) is PAIRED, which uses reinforcement learning to train a teacher policy to design tasks from scratch.
Despite its strong theoretical backing, PAIRED suffers from a variety of challenges that hinder its practical performance.
We make it possible for PAIRED to match or exceed state-of-the-art methods, producing robust agents in several established challenging procedurally-generated environments.
arXiv Detail & Related papers (2023-08-21T15:42:56Z)
- Reparameterized Policy Learning for Multimodal Trajectory Optimization [61.13228961771765]
We investigate the challenge of parametrizing policies for reinforcement learning in high-dimensional continuous action spaces.
We propose a principled framework that models the continuous RL policy as a generative model of optimal trajectories.
We present a practical model-based RL method, which leverages the multimodal policy parameterization and learned world model.
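One way to read "policy as a generative model of trajectories" is a latent-variable policy in which each sampled latent picks out a behavior mode, avoiding the mode-averaging of a unimodal Gaussian policy. The sketch below shows only that reading; every architectural detail is assumed here rather than taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

class LatentPolicy:
    # Multimodal policy a ~ p(a | s, z) with z ~ p(z): each latent sample
    # commits to one behavior mode instead of averaging across modes.
    def __init__(self, state_dim=3, act_dim=2, latent_dim=2):
        self.W_s = rng.normal(size=(state_dim, act_dim))
        self.W_z = rng.normal(size=(latent_dim, act_dim))

    def act(self, state):
        z = rng.normal(size=self.W_z.shape[0])  # sample a behavior mode
        return np.tanh(state @ self.W_s + z @ self.W_z)

policy = LatentPolicy()
state = np.zeros(3)
# Two action samples from the same state can land in different modes:
print(policy.act(state))
print(policy.act(state))
```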
arXiv Detail & Related papers (2023-07-20T09:05:46Z)
- Diversity Induced Environment Design via Self-Play [9.172096093540357]
We propose a task-agnostic method to identify observed/hidden states that are representative of a given level.
The outcome of this method is then utilized to characterize the diversity between two levels, which, as we show, can be crucial to effective performance.
In addition, to improve sampling efficiency, we incorporate the self-play technique that allows the environment generator to automatically generate environments that are of great benefit to the training agent.
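A minimal sketch of the diversity idea, with both the representative-state procedure and the distance metric assumed for illustration (the paper's actual constructions may differ):

```python
import numpy as np

def representative_states(level_seed, n=16, dim=4):
    # Hypothetical stand-in for a task-agnostic procedure that summarizes
    # a level by a handful of observed/hidden states: here, each level is
    # reduced to samples from a level-specific state distribution.
    level_rng = np.random.default_rng(level_seed)
    return level_rng.normal(loc=level_seed % 5, size=(n, dim))

def level_diversity(seed_a, seed_b):
    # Assumed diversity measure: distance between the mean
    # representative-state embeddings of two levels.
    a = representative_states(seed_a).mean(axis=0)
    b = representative_states(seed_b).mean(axis=0)
    return float(np.linalg.norm(a - b))

# Prefer generating the candidate level most different from the current one:
current = 0
candidates = [1, 2, 3, 7]
best = max(candidates, key=lambda s: level_diversity(current, s))
print("most diverse candidate level:", best)
```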
arXiv Detail & Related papers (2023-02-04T07:31:36Z)
- Scenic4RL: Programmatic Modeling and Generation of Reinforcement Learning Environments [89.04823188871906]
Generation of diverse realistic scenarios is challenging for real-time strategy (RTS) environments.
Most of the existing simulators rely on randomly generating the environments.
We demonstrate the benefits of adopting an existing formal scenario specification language, SCENIC, to assist researchers.
arXiv Detail & Related papers (2021-06-18T21:49:46Z)
- Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [121.73425076217471]
We propose Unsupervised Environment Design (UED), where developers provide environments with unknown parameters, and these parameters are used to automatically produce a distribution over valid, solvable environments.
We call our technique Protagonist Antagonist Induced Regret Environment Design (PAIRED).
Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments.
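The regret objective at the heart of PAIRED is compact: the teacher is rewarded by the gap between an antagonist's return and the protagonist's return on the proposed level, so it is driven toward levels that are solvable but not yet solved. The sketch below stubs out episode returns with a simple skill-versus-difficulty curve purely to make that frontier-seeking behavior visible; it is not taken from the PAIRED implementation.

```python
import numpy as np

def episode_return(agent_skill, level_difficulty):
    # Stub: return degrades once a level's difficulty exceeds the
    # agent's skill.
    return max(0.0, 1.0 - max(0.0, level_difficulty - agent_skill))

def paired_regret(level, protagonist_skill=0.4, antagonist_skill=0.8):
    # Regret = antagonist return - protagonist return. High regret means
    # the level is solvable (the antagonist succeeds) but not yet solved
    # by the protagonist -- the frontier the teacher is rewarded for.
    return (episode_return(antagonist_skill, level)
            - episode_return(protagonist_skill, level))

# The teacher proposes the level with maximal regret:
levels = np.linspace(0.0, 1.5, 16)
best = levels[int(np.argmax([paired_regret(l) for l in levels]))]
print(f"max-regret level difficulty: {best:.2f}")
```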
arXiv Detail & Related papers (2020-12-03T17:37:01Z)
- Environment Shaping in Reinforcement Learning using State Abstraction [63.444831173608605]
We propose a novel framework of environment shaping using state abstraction.
Our key idea is to compress the environment's large state space with noisy signals to an abstracted space.
We show that the agent's policy learnt in the shaped environment preserves near-optimal behavior in the original environment.
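A toy rendering of shaping via state abstraction, under assumed details: an abstraction function phi discards the noisy coordinates of a large raw state and discretizes the informative one, and a bandit-style learner then operates entirely in the small abstract space.

```python
import numpy as np

rng = np.random.default_rng(5)

def phi(state, n_bins=4):
    # Assumed abstraction: keep only the informative first coordinate and
    # discretize it into a small abstract state space.
    return int(np.clip(state[0] * n_bins, 0, n_bins - 1))

n_abstract, n_actions = 4, 2
q = np.zeros((n_abstract, n_actions))           # learn in the shaped space

for step in range(1000):
    raw = rng.uniform(size=8)                   # large, noisy raw state
    s = phi(raw)
    if rng.uniform() < 0.2:
        a = int(rng.integers(n_actions))        # explore
    else:
        a = int(np.argmax(q[s]))                # exploit
    reward = 1.0 if a == s % 2 else 0.0         # stub task on abstract state
    q[s, a] += 0.1 * (reward - q[s, a])         # bandit-style update

print("greedy policy per abstract state:", np.argmax(q, axis=1))
```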
arXiv Detail & Related papers (2020-06-23T17:00:22Z)
- Ecological Reinforcement Learning [76.9893572776141]
We study the kinds of environment properties that can make learning under such conditions easier.
Understanding how properties of the environment impact the performance of reinforcement learning agents can help us structure our tasks in ways that make learning tractable.
arXiv Detail & Related papers (2020-06-22T17:55:03Z)