Related papers: Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning

URL: http://arxiv.org/abs/2509.25052v1
Date: Mon, 29 Sep 2025 17:02:31 GMT
Title: Cogito, Ergo Ludo: An Agent that Learns to Play by Reasoning and Planning
Authors: Sai Wang, Yu Wu, Zhongwen Xu
Abstract summary: We introduce Cogito, ergo ludo (CEL), a novel agent architecture that builds an explicit, language-based understanding of its environment's mechanics and its own strategy. CEL operates on a cycle of interaction and reflection to perform two concurrent learning processes: Rule Induction and Strategy and Playbook Summarization. We evaluate CEL on diverse grid-world tasks (i.e., Minesweeper, Frozen Lake, and Sokoban) and show that the CEL agent successfully learns to master these games by autonomously discovering their rules and developing effective policies from sparse rewards.
Score: 14.263118871262941
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The pursuit of artificial agents that can learn to master complex environments has led to remarkable successes, yet prevailing deep reinforcement learning methods often rely on immense experience, encoding their knowledge opaquely within neural network weights. We propose a different paradigm, one in which an agent learns to play by reasoning and planning. We introduce Cogito, ergo ludo (CEL), a novel agent architecture that leverages a Large Language Model (LLM) to build an explicit, language-based understanding of its environment's mechanics and its own strategy. Starting from a tabula rasa state with no prior knowledge (except action set), CEL operates on a cycle of interaction and reflection. After each episode, the agent analyzes its complete trajectory to perform two concurrent learning processes: Rule Induction, where it refines its explicit model of the environment's dynamics, and Strategy and Playbook Summarization, where it distills experiences into an actionable strategic playbook. We evaluate CEL on diverse grid-world tasks (i.e., Minesweeper, Frozen Lake, and Sokoban), and show that the CEL agent successfully learns to master these games by autonomously discovering their rules and developing effective policies from sparse rewards. Ablation studies confirm that the iterative process is critical for sustained learning. Our work demonstrates a path toward more general and interpretable agents that not only act effectively but also build a transparent and improving model of their world through explicit reasoning on raw experience.
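
The interaction-and-reflection cycle described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Corridor` toy environment, the `Knowledge` container, the prompt layout, and the `stub_llm` function are all hypothetical stand-ins, and a real agent would call an actual LLM where the stub appears. The sketch only shows the control flow: play an episode using the current rules and playbook, then update both from the complete trajectory.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical types: one step is (state, action, reward, next_state).
Step = Tuple[str, str, float, str]
LLM = Callable[[str], str]  # assumption: prompt text in, response text out

ACTIONS = ["left", "right"]  # the only prior knowledge is the action set


@dataclass
class Knowledge:
    rules: str = ""     # language description of the environment dynamics
    playbook: str = ""  # distilled strategic advice


class Corridor:
    """Toy stand-in environment (not from the paper): reach cell 2 for reward 1."""

    def reset(self) -> str:
        self.pos = 0
        return f"pos={self.pos}"

    def step(self, action: str) -> Tuple[str, float, bool]:
        self.pos = max(0, self.pos + (1 if action == "right" else -1))
        done = self.pos == 2
        return f"pos={self.pos}", (1.0 if done else 0.0), done


def act(llm: LLM, knowledge: Knowledge, state: str) -> str:
    """Ask the LLM for an action given the current rules, playbook, and state."""
    prompt = (f"ACT\nRules:\n{knowledge.rules}\nPlaybook:\n{knowledge.playbook}\n"
              f"State: {state}\nActions: {ACTIONS}")
    choice = llm(prompt).strip()
    return choice if choice in ACTIONS else ACTIONS[0]  # fall back on invalid output


def run_episode(llm: LLM, env: Corridor, knowledge: Knowledge,
                max_steps: int = 10) -> List[Step]:
    """Interaction phase: collect one complete trajectory."""
    trajectory: List[Step] = []
    state = env.reset()
    for _ in range(max_steps):
        action = act(llm, knowledge, state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return trajectory


def reflect(llm: LLM, knowledge: Knowledge, trajectory: List[Step]) -> Knowledge:
    """Reflection phase: update rules and playbook from the full trajectory."""
    log = "\n".join(f"{s} --{a}--> {s2} (reward {r})" for s, a, r, s2 in trajectory)
    rules = llm(f"RULES\nCurrent rules:\n{knowledge.rules}\nTrajectory:\n{log}")
    playbook = llm(f"PLAYBOOK\nCurrent playbook:\n{knowledge.playbook}\nTrajectory:\n{log}")
    return Knowledge(rules=rules, playbook=playbook)


def train(llm: LLM, env: Corridor, episodes: int = 3) -> Knowledge:
    """Alternate interaction and reflection, starting with empty knowledge."""
    knowledge = Knowledge()
    for _ in range(episodes):
        knowledge = reflect(llm, knowledge, run_episode(llm, env, knowledge))
    return knowledge


def stub_llm(prompt: str) -> str:
    """Hypothetical placeholder for a real LLM call, keyed on the prompt header."""
    if prompt.startswith("ACT"):
        return "right"
    if prompt.startswith("RULES"):
        return "Moving right increases pos by 1."
    return "Keep moving right until the episode ends."
```

Calling `train(stub_llm, Corridor())` returns a `Knowledge` object whose `rules` and `playbook` fields hold the latest text produced in the reflection phase. Keeping both as plain strings mirrors the abstract's emphasis on explicit, inspectable knowledge rather than weights.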

Related papers

GIFT: Games as Informal Training for Generalizable LLMs [64.47890325824763]
Large Language Models (LLMs) struggle with "practical wisdom" and generalizable intelligence. This gap arises from a lack of informal learning, which thrives on interactive feedback rather than goal-oriented instruction. We propose treating Games as a primary environment for LLM informal learning, leveraging their intrinsic reward signals and abstracted complexity.
arXiv Detail & Related papers (2026-01-09T08:42:44Z)
Kolb-Based Experiential Learning for Generalist Agents with Human-Level Kaggle Data Science Performance [81.05882480184587]
We propose a computational framework of Kolb's learning cycle with Vygotsky's ZPD for autonomous agents. With medal-level performance equivalent to 9 gold, 8 silver, and 12 bronze, including 4 gold and 4 silver on prize-awarding competitions, Agent K is the 1st AI system to successfully integrate Kolb- and Vygotsky-inspired human cognitive learning.
arXiv Detail & Related papers (2024-11-05T23:55:23Z)
Empowering Large Language Model Agents through Action Learning [85.39581419680755]
Large Language Model (LLM) Agents have recently garnered increasing interest yet they are limited in their ability to learn from trial and error. We argue that the capacity to learn new actions from experience is fundamental to the advancement of learning in LLM agents. We introduce a framework LearnAct with an iterative learning strategy to create and improve actions in the form of Python functions.
arXiv Detail & Related papers (2024-02-24T13:13:04Z)
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL). This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
Learning of Generalizable and Interpretable Knowledge in Grid-Based Reinforcement Learning Environments [5.217870815854702]
We propose using program synthesis to imitate reinforcement learning policies. We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments.
arXiv Detail & Related papers (2023-09-07T11:46:57Z)
Independent Learning in Stochastic Games [16.505046191280634]
We present the model of games for multi-agent learning in dynamic environments. We focus on the development of simple and independent learning dynamics for games. We present our recently proposed simple and independent learning dynamics that guarantee convergence in zero-sum games.
arXiv Detail & Related papers (2021-11-23T09:27:20Z)
Human-Level Reinforcement Learning through Theory-Based Modeling, Exploration, and Planning [27.593497502386143]
Theory-Based Reinforcement Learning uses human-like intuitive theories to explore and model an environment. We instantiate the approach in a video game playing agent called EMPA. EMPA matches human learning efficiency on a suite of 90 Atari-style video games.
arXiv Detail & Related papers (2021-07-27T01:38:13Z)
Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards. We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences. We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z)
Learning intuitive physics and one-shot imitation using state-action-prediction self-organizing maps [0.0]
Humans learn by exploration and imitation, build causal models of the world, and use both to flexibly solve new tasks. We suggest a simple but effective unsupervised model which develops such characteristics. We demonstrate its performance on a set of several related, but different one-shot imitation tasks, which the agent flexibly solves in an active inference style.
arXiv Detail & Related papers (2020-07-03T12:29:11Z)
Learning as Reinforcement: Applying Principles of Neuroscience for More General Reinforcement Learning Agents [1.0742675209112622]
We implement an architecture founded in principles of experimental neuroscience, by combining computationally efficient abstractions of biological algorithms. Our approach is inspired by research on spike-timing dependent plasticity, the transition between short and long term memory, and the role of various neurotransmitters in rewarding curiosity. The Neurons-in-a-Box architecture can learn in a wholly generalizable manner, and demonstrates an efficient way to build and apply representations without explicitly optimizing over a set of criteria or actions.
arXiv Detail & Related papers (2020-04-20T04:06:21Z)
Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game. We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each other's playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.