PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators
- URL: http://arxiv.org/abs/2408.12525v1
- Date: Thu, 22 Aug 2024 16:30:24 GMT
- Title: PCGRL+: Scaling, Control and Generalization in Reinforcement Learning Level Generators
- Authors: Sam Earle, Zehua Jiang, Julian Togelius
- Abstract summary: Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained.
PCGRL offers a unique set of affordances for game designers, but it is constrained by the compute-intensive process of training RL agents.
We implement several PCGRL environments in Jax so that all aspects of learning and simulation happen in parallel on the GPU.
- Score: 2.334978724544296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Procedural Content Generation via Reinforcement Learning (PCGRL) has been introduced as a means by which controllable designer agents can be trained based only on a set of computable metrics acting as a proxy for the level's quality and key characteristics. While PCGRL offers a unique set of affordances for game designers, it is constrained by the compute-intensive process of training RL agents, and has so far been limited to generating relatively small levels. To address this issue of scale, we implement several PCGRL environments in Jax so that all aspects of learning and simulation happen in parallel on the GPU. This yields faster environment simulation, removes the CPU-GPU information-transfer bottleneck during RL training, and ultimately results in significantly improved training speed. We replicate several key results from prior works in this new framework, letting models train for much longer than previously studied and evaluating their behavior after 1 billion timesteps. Aiming for greater control for human designers, we introduce randomized level sizes and frozen "pinpoints" of pivotal game tiles as further ways of countering overfitting. To test the generalization ability of learned generators, we evaluate models on large, out-of-distribution map sizes, and find that models with partial observations learn more robust design strategies.
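To make the parallelization concrete, here is a minimal JAX sketch of the pattern the abstract describes: an environment step written as a pure function, vectorized with jax.vmap and compiled with jax.jit so that thousands of levels are edited in parallel on the GPU. All names (MAP_SIZE, step_env, the toy reward) are illustrative assumptions, not the paper's actual API.

```python
import jax
import jax.numpy as jnp

# Hypothetical sketch of a PCGRL-style environment step as a pure function,
# so jax.vmap + jax.jit can run many environments in parallel on the GPU
# with no CPU-GPU transfer. Names are illustrative, not the paper's API.

MAP_SIZE = 16   # width/height of the level grid
N_TILES = 3     # e.g. empty, wall, goal

def step_env(level, agent_pos, action):
    """Apply one edit action: write tile `action` at the agent's position,
    then move the agent one cell forward in scan order (wrapping around)."""
    level = level.at[agent_pos[0], agent_pos[1]].set(action)
    flat = (agent_pos[0] * MAP_SIZE + agent_pos[1] + 1) % (MAP_SIZE * MAP_SIZE)
    new_pos = jnp.array([flat // MAP_SIZE, flat % MAP_SIZE])
    # Toy reward proxy: fraction of empty tiles (a stand-in for the
    # computable quality metrics PCGRL actually uses).
    reward = jnp.mean(level == 0)
    return level, new_pos, reward

# Vectorize over a batch of environments, then compile the whole step.
batched_step = jax.jit(jax.vmap(step_env))

n_envs = 4096
key, k1, k2 = jax.random.split(jax.random.PRNGKey(0), 3)
levels = jax.random.randint(k1, (n_envs, MAP_SIZE, MAP_SIZE), 0, N_TILES)
positions = jnp.zeros((n_envs, 2), dtype=jnp.int32)
actions = jax.random.randint(k2, (n_envs,), 0, N_TILES)

levels, positions, rewards = batched_step(levels, positions, actions)
print(rewards.shape)  # (4096,) -- one reward per parallel environment
```

Because both the environment step and the learner live on the GPU, the rollout-update loop never round-trips observations through host memory, which is the bottleneck this design removes.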
Related papers
- Accelerating Goal-Conditioned RL Algorithms and Research [17.155006770675904]
Self-supervised goal-conditioned reinforcement learning (GCRL) agents discover new behaviors by learning from the goals achieved during unstructured interaction with the environment.
These methods have so far failed to see comparable success, due both to a lack of data from slow environment simulations and a lack of stable algorithms.
We release a benchmark (JaxGCRL) for self-supervised GCRL, enabling researchers to train agents for millions of environment steps in minutes on a single GPU.
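As a concrete illustration of "learning from the goals achieved during unstructured interaction", below is a minimal hindsight-relabeling sketch in JAX; the function names and the sparse-reward rule are assumptions for illustration, not JaxGCRL's actual interface.

```python
import jax
import jax.numpy as jnp

# Minimal sketch of hindsight goal relabeling: transitions are relabeled
# with goals the agent actually achieved later in the same trajectory.
# Illustrative only, not JaxGCRL's API.

def relabel(key, states, achieved_goals):
    """For each step t, sample a future index t' >= t from the same
    trajectory and use achieved_goals[t'] as the relabeled goal."""
    T = states.shape[0]
    t = jnp.arange(T)
    u = jax.random.uniform(key, (T,))
    future = t + (u * (T - t)).astype(jnp.int32)  # t <= future < T
    goals = achieved_goals[future]
    # Sparse reward: 1 when the achieved goal matches the relabeled goal.
    rewards = (jnp.linalg.norm(achieved_goals - goals, axis=-1) < 1e-6
               ).astype(jnp.float32)
    return goals, rewards

key = jax.random.PRNGKey(0)
states = jax.random.normal(key, (100, 8))   # toy trajectory
achieved = states[:, :2]                    # goal = first 2 state dims
goals, rewards = relabel(key, states, achieved)
```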
arXiv Detail & Related papers (2024-08-20T17:58:40Z)
- Stop Regressing: Training Value Functions via Classification for Scalable Deep RL [109.44370201929246]
We show that training value functions with categorical cross-entropy improves performance and scalability in a variety of domains.
These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers.
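For illustration, here is a minimal JAX sketch of one such classification scheme: two-hot encoding of a scalar return target over a fixed support, trained with cross-entropy instead of MSE. The support bounds and names are assumptions for this example (the paper also studies an HL-Gauss variant).

```python
import jax
import jax.numpy as jnp

# Value learning as classification: project the scalar target onto a fixed
# support with two-hot encoding and minimize cross-entropy against the
# network's categorical logits. Illustrative constants and names.

V_MIN, V_MAX, N_BINS = -10.0, 10.0, 51
support = jnp.linspace(V_MIN, V_MAX, N_BINS)

def two_hot(target):
    """Encode a scalar target as weights on the two nearest support atoms."""
    target = jnp.clip(target, V_MIN, V_MAX)
    pos = (target - V_MIN) / (V_MAX - V_MIN) * (N_BINS - 1)
    lo = jnp.floor(pos).astype(jnp.int32)
    hi = jnp.minimum(lo + 1, N_BINS - 1)
    w_hi = pos - lo
    return jnp.zeros(N_BINS).at[lo].add(1.0 - w_hi).at[hi].add(w_hi)

def value_loss(logits, target):
    """Cross-entropy between the predicted distribution and two-hot target."""
    return -jnp.sum(two_hot(target) * jax.nn.log_softmax(logits))

def to_scalar(logits):
    """Recover a scalar value prediction as the distribution's mean."""
    return jnp.sum(jax.nn.softmax(logits) * support)
```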
arXiv Detail & Related papers (2024-03-06T18:55:47Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches, marking the first time a simple transformer-based model has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- On Transforming Reinforcement Learning by Transformer: The Development Trajectory [97.79247023389445]
Transformer, originally devised for natural language processing, has also achieved significant success in computer vision. We group existing developments in Transformer-based RL (TRL) into two categories: architecture enhancement and trajectory optimization. We examine the main applications of TRL in robotic manipulation, text-based games, navigation, and autonomous driving.
arXiv Detail & Related papers (2022-12-29T03:15:59Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed, but require large amounts of interaction between the agent and the environment. We propose a new method to address this, using unsupervised model-based RL to pre-train the agent. We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Light-weight probing of unsupervised representations for Reinforcement Learning [20.638410483549706]
We study whether linear probing can serve as a proxy evaluation task for the quality of unsupervised RL representations. We show that the probing tasks are strongly rank-correlated with downstream RL performance on the Atari100k benchmark.
This provides a more efficient method for exploring the space of pretraining algorithms and identifying promising pretraining recipes.
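A minimal sketch of the correlation check described here, assuming made-up probing scores and RL returns; Spearman rank correlation is computed from the rank vectors of the two score lists.

```python
import jax.numpy as jnp

# Given a probing score (e.g. linear-probe accuracy on frozen
# representations) per pretraining method, check how well it ranks
# methods by eventual RL performance. All values below are made up.

def ranks(x):
    # Rank of each element (0 = smallest); assumes no ties.
    return jnp.argsort(jnp.argsort(x)).astype(jnp.float32)

def spearman(x, y):
    rx, ry = ranks(x) - ranks(x).mean(), ranks(y) - ranks(y).mean()
    return jnp.sum(rx * ry) / jnp.sqrt(jnp.sum(rx**2) * jnp.sum(ry**2))

probe_acc = jnp.array([0.62, 0.71, 0.55, 0.80, 0.66])  # probing scores
rl_return = jnp.array([210., 340., 150., 400., 260.])  # RL benchmark scores
print(spearman(probe_acc, rl_return))  # 1.0 here: probes rank methods perfectly
```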
arXiv Detail & Related papers (2022-08-25T21:08:01Z)
- Model-Free Generative Replay for Lifelong Reinforcement Learning: Application to Starcraft-2 [5.239932780277599]
Generative replay (GR) is a biologically-inspired replay mechanism that augments learning experiences with self-labelled examples.
We present a version of GR for LRL that satisfies two desiderata: (a) introspective density modelling of the latent representations of policies learned using deep RL, and (b) model-free end-to-end learning.
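A toy sketch of the generative-replay loop this describes, with a stand-in Gaussian "generator" and a frozen linear "old policy" supplying the self-labels; none of these stand-ins are the paper's actual models.

```python
import jax
import jax.numpy as jnp

# Generative replay in miniature: pseudo-experiences are sampled from a
# density model of past (latent) observations, labelled by the previous
# policy, and mixed with real new-task data for each update.

key = jax.random.PRNGKey(0)
obs_dim, n_actions = 8, 4

def generator(key, n):
    # Stand-in density model of old experience: unit Gaussian latents.
    return jax.random.normal(key, (n, obs_dim))

W_old = jax.random.normal(key, (obs_dim, n_actions))
def old_policy(obs):
    # Stand-in frozen policy from the previous task; provides "self-labels".
    return jnp.argmax(obs @ W_old, axis=-1)

def make_batch(key, new_obs, new_actions, replay_frac=0.5):
    n_replay = int(replay_frac * new_obs.shape[0])
    gen_obs = generator(key, n_replay)
    gen_actions = old_policy(gen_obs)            # self-labelled examples
    obs = jnp.concatenate([new_obs, gen_obs])
    actions = jnp.concatenate([new_actions, gen_actions])
    return obs, actions  # train the current policy on this mixed batch

new_obs = jax.random.normal(key, (32, obs_dim))
new_actions = jnp.zeros(32, dtype=jnp.int32)
obs, actions = make_batch(key, new_obs, new_actions)
```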
arXiv Detail & Related papers (2022-08-09T22:00:28Z)
- RLFlow: Optimising Neural Network Subgraph Transformation with World Models [0.0]
We propose a model-based agent which learns to optimise the architecture of neural networks by performing a sequence of subgraph transformations to reduce model runtime.
We show that our approach can match state-of-the-art performance on common convolutional networks and outperform it by up to 5% on transformer-style architectures.
arXiv Detail & Related papers (2022-05-03T11:52:54Z)
- Multitask Adaptation by Retrospective Exploration with Learned World Models [77.34726150561087]
We propose a meta-learned addressing model called RAMa that provides the model-based RL (MBRL) agent with training samples drawn from task-agnostic storage. The model is trained to maximize the agent's expected performance by selecting promising trajectories that solve prior tasks from the storage.
arXiv Detail & Related papers (2021-10-25T20:02:57Z)
- Text Generation with Efficient (Soft) Q-Learning [91.47743595382758]
Reinforcement learning (RL) offers a more flexible solution, allowing users to plug in arbitrary task metrics as rewards.
We introduce a new RL formulation for text generation from the soft Q-learning perspective.
We apply the approach to a wide range of tasks, including learning from noisy/negative examples, adversarial attacks, and prompt generation.
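For context, a minimal sketch of the soft Q-learning backup underlying such a formulation, where tokens are actions and the soft state value is a temperature-scaled logsumexp over vocabulary Q-values; the shapes, constants, and names are illustrative assumptions, not the paper's implementation.

```python
import jax
import jax.numpy as jnp

# Soft Q-learning view of text generation: tokens are actions, the policy
# is the softmax of Q-values over the vocabulary, and targets use a soft
# (logsumexp) Bellman backup.

TAU, GAMMA = 1.0, 1.0  # temperature and discount

def soft_q_targets(q_next, rewards):
    """q_next: [T, vocab] Q-values of the successor state at each step
    (last row unused, since the final token is terminal); rewards: [T],
    e.g. an arbitrary task metric given per token or at sequence end.
    Returns the soft Bellman target for each generated token."""
    # Soft value of the next state: V(s') = tau * logsumexp(Q(s', .) / tau)
    v_next = TAU * jax.scipy.special.logsumexp(q_next / TAU, axis=-1)  # [T]
    v_next = v_next.at[-1].set(0.0)  # no bootstrap past the final token
    return rewards + GAMMA * v_next

# The induced generation policy is simply pi(a|s) = softmax(Q(s, .) / tau).
```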
arXiv Detail & Related papers (2021-06-14T18:48:40Z)
- RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads [4.575381867242508]
We propose RL-Scope, a cross-stack profiler that scopes low-level CPU/GPU resource usage to high-level algorithmic operations.
We demonstrate RL-Scope's utility through in-depth case studies.
arXiv Detail & Related papers (2021-02-08T15:42:48Z)