Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents
- URL: http://arxiv.org/abs/2602.00929v1
- Date: Sat, 31 Jan 2026 23:01:51 GMT
- Title: Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents
- Authors: Zergham Ahmed, Kazuki Irie, Joshua B. Tenenbaum, Christopher J. Bates, Samuel J. Gershman
- Abstract summary: Humans learn abstractions and use them to plan efficiently and generalize quickly across tasks. We introduce TheoryCoder-2, a new large language model (LLM) agent that actively learns reusable abstractions. We conduct experiments on diverse environments, including BabyAI, MiniHack, and VGDL games like Sokoban.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans learn abstractions and use them to plan efficiently and generalize quickly across tasks -- an ability that remains challenging for state-of-the-art large language model (LLM) agents and deep reinforcement learning (RL) systems. Inspired by the cognitive science of how people form abstractions and intuitive theories of the world, Theory-Based RL (TBRL) systems such as TheoryCoder exhibit strong generalization through effective use of abstractions. However, they rely heavily on human-provided abstractions and sidestep the abstraction-learning problem. We introduce TheoryCoder-2, a new TBRL agent that leverages LLMs' in-context learning ability to actively learn reusable abstractions rather than relying on hand-specified ones, synthesizing abstractions from experience and integrating them into a hierarchical planning process. We conduct experiments on diverse environments, including BabyAI, MiniHack, and VGDL games like Sokoban. We find that TheoryCoder-2 is significantly more sample-efficient than baseline LLM agents augmented with classical planning domain construction, reasoning-based planning, and prior program-synthesis agents such as WorldCoder. TheoryCoder-2 solves complex tasks that the baselines fail to solve, while requiring only minimal human prompting, unlike prior TBRL systems.
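To make the described pipeline concrete, here is a minimal sketch of an abstraction-learning agent that plans hierarchically: it keeps a library of synthesized abstractions, sequences them at the high level, and grounds each one into primitive actions with a world model. This is not the authors' implementation; the toy grid world, the hand-coded stand-ins for LLM-synthesized abstractions and transition model, and all function names are assumptions for illustration.

```python
# Hedged sketch of a TheoryCoder-2-style loop: abstraction library +
# hierarchical (bilevel) planning. All names are hypothetical.
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]          # toy grid position
Action = str                     # "up" | "down" | "left" | "right"

MOVES: Dict[Action, Tuple[int, int]] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0),
}

def world_model(state: State, action: Action) -> State:
    """Stand-in for a synthesized transition model (a program the
    agent would write from experience)."""
    dx, dy = MOVES[action]
    return (state[0] + dx, state[1] + dy)

# An "abstraction" maps the current state to the subgoal it achieves.
Abstraction = Callable[[State], State]

def synthesize_abstraction(goal: State) -> Abstraction:
    """Where an LLM would be prompted with recent trajectories and asked
    for a reusable subroutine; here, a hand-coded stand-in."""
    return lambda state: goal

def low_level_plan(start: State, subgoal: State) -> List[Action]:
    """Ground one abstraction into primitives by BFS over the world model."""
    frontier, seen = [(start, [])], {start}
    while frontier:
        state, plan = frontier.pop(0)
        if state == subgoal:
            return plan
        for a in MOVES:
            nxt = world_model(state, a)
            if nxt not in seen and all(0 <= c < 5 for c in nxt):
                seen.add(nxt)
                frontier.append((nxt, plan + [a]))
    return []

# Hierarchical planning: sequence abstractions, then ground each one.
library: List[Abstraction] = [synthesize_abstraction((4, 0)),   # e.g. "go to key"
                              synthesize_abstraction((4, 4))]   # e.g. "go to door"
state: State = (0, 0)
for abstraction in library:
    for action in low_level_plan(state, abstraction(state)):
        state = world_model(state, action)
print(state)  # (4, 4): both subgoals reached via the learned model
```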
Related papers
- RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems [98.98963933669751]
We train models to propose multiple abstractions for a given problem, followed by RL that incentivizes building a solution from them. This results in a two-player RL training paradigm, abbreviated RLAD, that jointly trains an abstraction generator and a solution generator. We show that allocating more test-time compute to generating abstractions is more beneficial for performance than generating more solutions at large test budgets.
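A hedged sketch of the two-player reward structure this summary describes: the solver is rewarded for solving the problem given an abstraction, while the abstraction generator is rewarded by the advantage its abstraction confers. The success probabilities and function names below are invented for illustration, not taken from RLAD.

```python
# Toy simulation of the two-player reward allocation (not the RLAD code).
import random
random.seed(0)

def solver_success_rate(uses_abstraction: bool, trials: int = 1000) -> float:
    """Stand-in for sampling solutions from the solver policy."""
    p = 0.6 if uses_abstraction else 0.3   # assumption: abstractions help
    return sum(random.random() < p for _ in range(trials)) / trials

base = solver_success_rate(uses_abstraction=False)
with_abs = solver_success_rate(uses_abstraction=True)

solver_reward = with_abs                 # reward for solving given the abstraction
abstraction_reward = with_abs - base     # advantage the abstraction provides
print(f"solver reward ~ {solver_reward:.2f}, "
      f"abstraction reward ~ {abstraction_reward:.2f}")
```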
arXiv Detail & Related papers (2025-10-02T17:44:23Z)
- AR$^2$: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models [12.484537674896908]
We propose AR$2$ (Adversarial Reinforcement Learning for Abstract Reasoning), a novel framework explicitly designed to enhance the abstraction abilities of large language models (LLMs)<n>AR$2$ employs a teacher model to transform kernel problems into narrative-rich, challenging descriptions without changing their fundamental logic.<n>A student coding model is trained to solve these complex narrative problems by extracting their underlying computational kernels.
arXiv Detail & Related papers (2025-08-27T17:26:44Z)
- Synthesizing world models for bilevel planning [46.21010194281677]
Theory-based reinforcement learning (TBRL) is an algorithmic framework specifically designed to address this gap. TBRL exploits hierarchical representations of theories and efficient program-synthesis methods for more powerful learning and planning. We demonstrate that this approach can be successfully applied to diverse and challenging grid-world games, where approaches based on directly synthesizing a policy perform poorly.
arXiv Detail & Related papers (2025-03-26T00:10:01Z)
- General Intelligence Requires Reward-based Pretraining [14.057301560895505]
Large Language Models (LLMs) have demonstrated impressive real-world utility, but their ability to reason adaptively and robustly remains fragile. We propose disentangling knowledge and reasoning through three key directions.
arXiv Detail & Related papers (2025-02-26T18:51:12Z)
- Improving Complex Reasoning over Knowledge Graph with Logic-Aware Curriculum Tuning [89.89857766491475]
We propose a curriculum-based, logic-aware instruction tuning framework named LACT. Specifically, we augment arbitrary first-order logical queries via binary tree decomposition. Experiments across widely used datasets demonstrate that LACT achieves substantial improvements (an average +5.5% MRR gain) over advanced methods, reaching a new state of the art.
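As a rough illustration of what binary tree decomposition of a first-order query might look like (a plausible reading, not the LACT implementation), the sketch below walks a query tree post-order so that atomic sub-queries precede compound ones, yielding an easy-to-hard curriculum.

```python
# Hypothetical binary-tree decomposition of a first-order logical query.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Query:
    op: str                      # "AND", "OR", or "ATOM"
    left: Optional["Query"] = None
    right: Optional["Query"] = None
    atom: str = ""

def decompose(q: Query, out: List[str]) -> str:
    """Post-order walk: children (simpler sub-queries) are emitted before
    their parent, giving a curriculum from atoms up to the full query."""
    if q.op == "ATOM":
        out.append(q.atom)
        return q.atom
    l = decompose(q.left, out)
    r = decompose(q.right, out)
    expr = f"({l} {q.op} {r})"
    out.append(expr)
    return expr

# (bornIn(x, UK) AND wrote(x, y)) AND genre(y, SciFi)
query = Query("AND",
              Query("AND", Query("ATOM", atom="bornIn(x,UK)"),
                           Query("ATOM", atom="wrote(x,y)")),
              Query("ATOM", atom="genre(y,SciFi)"))

curriculum: List[str] = []
decompose(query, curriculum)
for stage, sub in enumerate(curriculum):
    print(stage, sub)   # atoms first, full query last
```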
arXiv Detail & Related papers (2024-05-02T18:12:08Z)
- From Real World to Logic and Back: Learning Generalizable Relational Concepts For Long Horizon Robot Planning [16.115874470700113]
We present a method that enables robots to invent symbolic, relational concepts directly from a small number of raw, unsegmented, and unannotated demonstrations. Our framework achieves performance on par with hand-engineered symbolic models, while scaling to execution horizons far beyond training.
arXiv Detail & Related papers (2024-02-19T06:28:21Z)
- Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning [63.58935783293342]
Causal Bisimulation Modeling (CBM) is a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction.
CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones.
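One way to picture a minimal, task-specific abstraction is to keep only the state variables the reward measurably depends on. The probe below is a crude stand-in for CBM's learned causal models; the variable layout and sampling scheme are illustrative assumptions, not the paper's method.

```python
# Crude causal probe for deriving a minimal state abstraction (assumption-laden).
import random
random.seed(1)

def reward(state):               # ground truth: reward depends on x0 and x2 only
    x0, x1, x2 = state
    return x0 + 2 * x2

def depends_on(reward_fn, index, trials=200):
    """Perturb one variable at a time and check whether the reward moves."""
    for _ in range(trials):
        s = [random.random() for _ in range(3)]
        s2 = list(s)
        s2[index] = random.random()
        if abs(reward_fn(s) - reward_fn(s2)) > 1e-9:
            return True
    return False

relevant = [i for i in range(3) if depends_on(reward, i)]
print("minimal abstraction keeps variables:", relevant)   # [0, 2]
```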
arXiv Detail & Related papers (2024-01-23T05:43:15Z)
- A Theory of Abstraction in Reinforcement Learning [18.976500531441346]
In this dissertation, I present a theory of abstraction in reinforcement learning.
I first offer three desiderata for functions that carry out the process of abstraction.
I then present a suite of new algorithms and analysis that clarify how agents can learn to abstract according to these desiderata.
arXiv Detail & Related papers (2022-03-01T12:46:28Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)