Related papers: Learning Game-Playing Agents with Generative Code Optimization

Learning Game-Playing Agents with Generative Code Optimization

URL: http://arxiv.org/abs/2508.19506v1
Date: Wed, 27 Aug 2025 01:30:20 GMT
Title: Learning Game-Playing Agents with Generative Code Optimization
Authors: Zhiyi Kuang, Ryan Rong, YuCheng Yuan, Allen Nie,
Abstract summary: We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs)<n>Our method treats decision-making policies as self-evolving code, with current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural language feedback with minimal human intervention.
Score: 5.8375920147692115
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and much fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.

Related papers

Policy-Conditioned Policies for Multi-Agent Task Solving [53.67744322553693]
In this work, we propose a paradigm shift that bridges the gap by representing policies as human-interpretable source code.<n>We reformulate the learning problem by utilizing Large Language Models (LLMs) as approximate interpreters.<n>We formalize this process as textitProgrammatic Iterated Best Response (PIBR), an algorithm where the policy code is optimized by textual gradients.
arXiv Detail & Related papers (2025-12-24T07:42:10Z)
Procedural Game Level Design with Deep Reinforcement Learning [0.0]
Procedural content generation (PCG) has become an increasingly popular technique in game development.<n>In this study, a novel method for procedural level design using Deep Reinforcement Learning (DRL) within a Unity-based 3D environment is proposed.
arXiv Detail & Related papers (2025-10-16T20:26:14Z)
ARPO:End-to-End Policy Optimization for GUI Agents with Experience Replay [88.74638385288773]
Agentic Replay Policy Optimization improves performance on complex, long-horizon computer tasks.<n>We propose a task selection strategy that filters tasks based on baseline agent performance.<n>Experiments on the OSWorld benchmark demonstrate that ARPO achieves competitive results.
arXiv Detail & Related papers (2025-05-22T06:24:32Z)
Agents Play Thousands of 3D Video Games [26.290863972751428]
We present PORTAL, a novel framework for developing artificial intelligence agents capable of playing thousands of 3D video games.<n>By transforming decision-making problems into language modeling tasks, our approach leverages large language models (LLMs) to generate behavior trees.<n>Our framework introduces a hybrid policy structure that combines rule-based nodes with neural network components, enabling both high-level strategic reasoning and precise low-level control.
arXiv Detail & Related papers (2025-03-17T16:42:34Z)
Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents [2.624282086797512]
We introduce Autoverse, an evolvable, domain-specific language for single-player 2D grid-based games. We demonstrate its use as a scalable training ground for Open-Ended Learning (OEL) algorithms.
arXiv Detail & Related papers (2024-07-05T02:18:02Z)
STARLING: Self-supervised Training of Text-based Reinforcement Learning Agent with Large Language Models [5.786039929801102]
Existing environments for interactive fiction games are domain-specific or time-consuming to generate and do not train the RL agents to master a specific set of skills. We introduce an interactive environment for self-supervised RL, STARLING, for text-based games that bootstraps the text-based RL agents with automatically generated games to boost the performance and generalization capabilities to reach a goal of the target environment.
arXiv Detail & Related papers (2024-06-09T18:07:47Z)
Policy Learning with a Language Bottleneck [65.99843627646018]
We introduce Policy Learning with a Language Bottleneck (PLLB), a framework enabling AI agents to generate linguistic rules.<n>PLLBB alternates between a *rule generation* step guided by language models, and an *update* step where agents learn new policies guided by rules.<n>We show thatPLLB agents are able to learn more interpretable and generalizable behaviors, but can also share the learned rules with human users.
arXiv Detail & Related papers (2024-05-07T08:40:21Z)
Octopus: Embodied Vision-Language Programmer from Environmental Feedback [58.04529328728999]
Embodied vision-language models (VLMs) have achieved substantial progress in multimodal perception and reasoning. To bridge this gap, we introduce Octopus, an embodied vision-language programmer that uses executable code generation as a medium to connect planning and manipulation. Octopus is designed to 1) proficiently comprehend an agent's visual and textual task objectives, 2) formulate intricate action sequences, and 3) generate executable code.
arXiv Detail & Related papers (2023-10-12T17:59:58Z)
Learning of Generalizable and Interpretable Knowledge in Grid-Based Reinforcement Learning Environments [5.217870815854702]
We propose using program synthesis to imitate reinforcement learning policies. We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments.
arXiv Detail & Related papers (2023-09-07T11:46:57Z)
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization [103.70896967077294]
This paper introduces a principled framework for reinforcing large language agents by learning a retrospective model. Our proposed agent architecture learns from rewards across multiple environments and tasks, for fine-tuning a pre-trained language model. Experimental results on various tasks demonstrate that the language agents improve over time.
arXiv Detail & Related papers (2023-08-04T06:14:23Z)
Policy Fusion for Adaptive and Customizable Reinforcement Learning Agents [137.86426963572214]
We show how to combine distinct behavioral policies to obtain a meaningful "fusion" policy. We propose four different policy fusion methods for combining pre-trained policies. We provide several practical examples and use-cases for how these methods are indeed useful for video game production and designers.
arXiv Detail & Related papers (2021-04-21T16:08:44Z)
On the interaction between supervision and self-play in emergent communication [82.290338507106]
We investigate the relationship between two categories of learning signals with the ultimate goal of improving sample efficiency. We find that first training agents via supervised learning on human data followed by self-play outperforms the converse.
arXiv Detail & Related papers (2020-02-04T02:35:19Z)

This list is automatically generated from the titles and abstracts of the papers in this site.