Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents
- URL: http://arxiv.org/abs/2602.00929v1
- Date: Sat, 31 Jan 2026 23:01:51 GMT
- Title: Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents
- Authors: Zergham Ahmed, Kazuki Irie, Joshua B. Tenenbaum, Christopher J. Bates, Samuel J. Gershman
- Abstract summary: Humans learn abstractions and use them to plan efficiently and generalize quickly across tasks. We introduce TheoryCoder-2, a new large language model (LLM) agent that actively learns reusable abstractions. We conduct experiments on diverse environments, including BabyAI, MiniHack, and VGDL games like Sokoban.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans learn abstractions and use them to plan efficiently and generalize quickly across tasks -- an ability that remains challenging for state-of-the-art large language model (LLM) agents and deep reinforcement learning (RL) systems. Inspired by the cognitive science of how people form abstractions and intuitive theories of the world, Theory-Based RL (TBRL) systems such as TheoryCoder exhibit strong generalization through effective use of abstractions. However, they rely heavily on human-provided abstractions and sidestep the abstraction-learning problem. We introduce TheoryCoder-2, a new TBRL agent that leverages LLMs' in-context learning ability to actively learn reusable abstractions rather than relying on hand-specified ones, synthesizing abstractions from experience and integrating them into a hierarchical planning process. We conduct experiments on diverse environments, including BabyAI, MiniHack, and VGDL games like Sokoban. We find that TheoryCoder-2 is significantly more sample-efficient than baseline LLM agents augmented with classical planning domain construction, reasoning-based planning, and prior program-synthesis agents such as WorldCoder. TheoryCoder-2 solves complex tasks that the baselines fail to solve, while requiring only minimal human prompting, unlike prior TBRL systems.
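To make the described pipeline concrete, here is a minimal sketch of an abstraction-learning agent that plans hierarchically: it keeps a library of synthesized abstractions, sequences them at the high level, and grounds each one into primitive actions with a world model. This is not the authors' implementation; the toy grid world, the hand-coded stand-ins for LLM-synthesized abstractions and transition model, and all function names are assumptions for illustration.

```python
# Hedged sketch of a TheoryCoder-2-style loop: abstraction library +
# hierarchical (bilevel) planning. All names are hypothetical.
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]          # toy grid position
Action = str                     # "up" | "down" | "left" | "right"

MOVES: Dict[Action, Tuple[int, int]] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}

def world_model(state: State, action: Action) -> State:
    """Stand-in for a synthesized transition model (a program the
    agent would write from experience)."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)

# An "abstraction" maps the current state to the subgoal it achieves.
Abstraction = Callable[[State], State]

def synthesize_abstraction(goal: State) -> Abstraction:
    """Where an LLM would be prompted with recent trajectories and asked
    for a reusable subroutine; here, a hand-coded stand-in."""
    return lambda state: goal

def low_level_plan(start: State, subgoal: State) -> List[Action]:
    """Ground one abstraction into primitives by BFS over the world model."""
    frontier, seen = [(start, [])], {start}
    while frontier:
        state, plan = frontier.pop(0)
        if state == subgoal:
            return plan
        for a in MOVES:
            nxt = world_model(state, a)
            if nxt not in seen and all(0 <= c < 5 for c in nxt):
                seen.add(nxt)
                frontier.append((nxt, plan + [a]))
    return []

# Hierarchical planning: sequence abstractions, then ground each one.
library: List[Abstraction] = [synthesize_abstraction((4, 0)),   # e.g. "go to key"
                              synthesize_abstraction((4, 4))]   # e.g. "go to door"
state: State = (0, 0)
for abstraction in library:
    for action in low_level_plan(state, abstraction(state)):
        state = world_model(state, action)
print(state)  # (4, 4): both subgoals reached via the learned model
```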
Related papers
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems [98.98963933669751]
We train models to propose multiple abstractions for a given problem, followed by RL that incentivizes building a solution from them. This results in a two-player RL training paradigm, abbreviated RLAD, that jointly trains an abstraction generator and a solution generator. We show that allocating more test-time compute to generating abstractions is more beneficial for performance than generating more solutions at large test budgets.
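A hedged sketch of the two-player reward structure this summary describes: the solver is rewarded for solving the problem given an abstraction, while the abstraction generator is rewarded by the advantage its abstraction confers. The success probabilities and function names below are invented for illustration, not taken from RLAD.

```python
# Toy simulation of the two-player reward allocation (not the RLAD code).
import random
random.seed(0)

def solver_success_rate(uses_abstraction: bool, trials: int = 1000) -> float:
    """Stand-in for sampling solutions from the solver policy."""
    p = 0.6 if uses_abstraction else 0.3   # assumption: abstractions help
    return sum(random.random() < p for _ in range(trials)) / trials

base = solver_success_rate(uses_abstraction=False)
with_abs = solver_success_rate(uses_abstraction=True)

solver_reward = with_abs                 # reward for solving given the abstraction
abstraction_reward = with_abs - base     # advantage the abstraction provides
print(f"solver reward ~ {solver_reward:.2f}, "
      f"abstraction reward ~ {abstraction_reward:.2f}")
```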
arXiv Detail & Related papers (2025-10-02T17:44:23Z)
- AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models [12.484537674896908]
We propose AR$2$ (Adversarial Reinforcement Learning for Abstract Reasoning), a novel framework explicitly designed to enhance the abstraction abilities of large language models (LLMs)<n>AR$2$ employs a teacher model to transform kernel problems into narrative-rich, challenging descriptions without changing their fundamental logic.<n>A student coding model is trained to solve these complex narrative problems by extracting their underlying computational kernels.
arXiv Detail & Related papers (2025-08-27T17:26:44Z)
- Synthesizing world models for bilevel planning [46.21010194281677]
Theory-based reinforcement learning (TBRL) is an algorithmic framework specifically designed to address this gap. TBRL exploits hierarchical representations of theories and efficient program-synthesis methods for more powerful learning and planning. We demonstrate that this approach can be successfully applied to diverse and challenging grid-world games, where approaches based on directly synthesizing a policy perform poorly.
arXiv Detail & Related papers (2025-03-26T00:10:01Z)
- General Intelligence Requires Reward-based Pretraining [14.057301560895505]
Large Language Models (LLMs) have demonstrated impressive real-world utility, but their ability to reason adaptively and robustly remains fragile. We propose disentangling knowledge and reasoning through three key directions.
arXiv Detail & Related papers (2025-02-26T18:51:12Z)
- Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a curriculum-based, logic-aware instruction tuning framework named LACT. Specifically, we augment arbitrary first-order logical queries via binary tree decomposition. Experiments across widely used datasets demonstrate that LACT achieves substantial improvements (an average +5.5% MRR gain) over advanced methods, reaching a new state of the art.
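As a rough illustration of what binary tree decomposition of a first-order query might look like (a plausible reading, not the LACT implementation), the sketch below walks a query tree post-order so that atomic sub-queries precede compound ones, yielding an easy-to-hard curriculum.

```python
# Hypothetical binary-tree decomposition of a first-order logical query.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Query:
    op: str                      # "AND", "OR", or "ATOM"
    left: Optional["Query"] = None
    right: Optional["Query"] = None
    atom: str = ""

def decompose(q: Query, out: List[str]) -> str:
    """Post-order walk: children (simpler sub-queries) are emitted before
    their parent, giving a curriculum from atoms up to the full query."""
    if q.op == "ATOM":
        out.append(q.atom)
        return q.atom
    l = decompose(q.left, out)
    r = decompose(q.right, out)
    expr = f"({l} {q.op} {r})"
    out.append(expr)
    return expr

# (bornIn(x, UK) AND wrote(x, y)) AND genre(y, SciFi)
query = Query("AND",
              Query("AND", Query("ATOM", atom="bornIn(x,UK)"),
                           Query("ATOM", atom="wrote(x,y)")),
              Query("ATOM", atom="genre(y,SciFi)"))

curriculum: List[str] = []
decompose(query, curriculum)
for stage, sub in enumerate(curriculum):
    print(stage, sub)   # atoms first, full query last
```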
arXiv Detail & Related papers (2024-05-02T18:12:08Z)
- From Real World to Logic and Back: Learning Generalizable Relational Concepts For Long Horizon Robot Planning [16.115874470700113]
We present a method that enables robots to invent symbolic, relational concepts directly from a small number of raw, unsegmented, and unannotated demonstrations. Our framework achieves performance on par with hand-engineered symbolic models, while scaling to execution horizons far beyond training.
arXiv Detail & Related papers (2024-02-19T06:28:21Z)
- Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning [63.58935783293342]
Causal Bisimulation Modeling (CBM) is a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction.
CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones.
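One way to picture a minimal, task-specific abstraction is to keep only the state variables the reward measurably depends on. The probe below is a crude stand-in for CBM's learned causal models; the variable layout and sampling scheme are illustrative assumptions, not the paper's method.

```python
# Crude causal probe for deriving a minimal state abstraction (assumption-laden).
import random
random.seed(1)

def reward(state):               # ground truth: reward depends on x0 and x2 only
    x0, x1, x2 = state
    return x0 + 2 * x2

def depends_on(reward_fn, index, trials=200):
    """Perturb one variable at a time and check whether the reward moves."""
    for _ in range(trials):
        s = [random.random() for _ in range(3)]
        s2 = list(s)
        s2[index] = random.random()
        if abs(reward_fn(s) - reward_fn(s2)) > 1e-9:
            return True
    return False

relevant = [i for i in range(3) if depends_on(reward, i)]
print("minimal abstraction keeps variables:", relevant)   # [0, 2]
```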
arXiv Detail & Related papers (2024-01-23T05:43:15Z)
- A Theory of Abstraction in Reinforcement Learning [18.976500531441346]
In this dissertation, I present a theory of abstraction in reinforcement learning.
I first offer three desiderata for functions that carry out the process of abstraction.
I then present a suite of new algorithms and analysis that clarify how agents can learn to abstract according to these desiderata.
arXiv Detail & Related papers (2022-03-01T12:46:28Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)