Learning Sketch Decompositions in Planning via Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2412.08574v1
- Date: Wed, 11 Dec 2024 17:45:31 GMT
- Title: Learning Sketch Decompositions in Planning via Deep Reinforcement Learning
- Authors: Michael Aichmüller, Hector Geffner
- Abstract summary: In planning and reinforcement learning, the identification of common subgoal structures across problems is important.
These sketches split problems into subproblems which then become solvable in low polynomial time by a greedy sequence of IW$(k)$ searches.
- Score: 10.52014836549529
- Abstract: In planning and reinforcement learning, the identification of common subgoal structures across problems is important when goals are to be achieved over long horizons. Recently, it has been shown that such structures can be expressed as feature-based rules, called sketches, over a number of classical planning domains. These sketches split problems into subproblems which then become solvable in low polynomial time by a greedy sequence of IW$(k)$ searches. Methods for learning sketches using feature pools and min-SAT solvers have been developed, yet they face two key limitations: scalability and expressivity. In this work, we address these limitations by formulating the problem of learning sketch decompositions as a deep reinforcement learning (DRL) task, where general policies are sought in a modified planning problem where the successor states of a state s are defined as those reachable from s through an IW$(k)$ search. The sketch decompositions obtained through this method are experimentally evaluated across various domains, and problems are regarded as solved by the decomposition when the goal is reached through a greedy sequence of IW$(k)$ searches. While our DRL approach for learning sketch decompositions does not yield interpretable sketches in the form of rules, we demonstrate that the resulting decompositions can often be understood in a crisp manner.
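As a rough illustration of the construction described in the abstract, the sketch below shows (i) an IW$(k)$ search whose surviving states act as the successors of a state in the modified problem, and (ii) the greedy sequence of IW$(k)$ searches used when evaluating a decomposition. This is not the authors' implementation: `successors_fn`, `atoms_fn`, `is_goal`, and `score_fn` (a stand-in for the learned DRL policy or value function) are hypothetical placeholders.

```python
from collections import deque
from itertools import combinations

def iw_successors(state, k, successors_fn, atoms_fn):
    """Breadth-first search from `state` with the IW(k) novelty test:
    a generated state is kept only if it makes true some tuple of at
    most k atoms not seen before in this search.  In the modified
    problem described in the abstract, the states returned here play
    the role of the successors of `state`."""
    seen_tuples = set()

    def is_novel(s):
        atoms = sorted(atoms_fn(s))
        novel = False
        for size in range(1, k + 1):
            for tup in combinations(atoms, size):
                if tup not in seen_tuples:
                    seen_tuples.add(tup)
                    novel = True
        return novel

    is_novel(state)                      # register the root's tuples
    frontier, reached = deque([state]), []
    while frontier:
        s = frontier.popleft()
        for s2 in successors_fn(s):
            if is_novel(s2):             # prune non-novel states
                reached.append(s2)
                frontier.append(s2)
    return reached


def greedy_iw_sequence(init, k, successors_fn, atoms_fn, is_goal,
                       score_fn, max_subproblems=100):
    """Greedy sequence of IW(k) searches: repeatedly expand the IW(k)
    neighbourhood of the current state and jump to the candidate that
    the learned policy/value `score_fn` ranks highest, until a goal
    state is reached."""
    s = init
    for _ in range(max_subproblems):
        if is_goal(s):
            return True
        candidates = iw_successors(s, k, successors_fn, atoms_fn)
        if not candidates:
            return False
        s = max(candidates, key=score_fn)
    return False
```
Under this reading, learning the decomposition amounts to training `score_fn` with DRL on the modified problem whose transitions are given by `iw_successors`; as the abstract notes, this does not yield interpretable sketch rules, only the decomposition behaviour.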
Related papers
- Learning Hidden Subgoals under Temporal Ordering Constraints in Reinforcement Learning [14.46490764849977]
We propose a novel RL algorithm for learning hidden subgoals under temporal ordering constraints (LSTOC).
We propose a new contrastive learning objective which can effectively learn hidden subgoals and their temporal orderings at the same time.
arXiv Detail & Related papers (2024-11-03T03:22:39Z) - An Option-Dependent Analysis of Regret Minimization Algorithms in Finite-Horizon Semi-Markov Decision Processes [47.037877670620524]
We present an option-dependent upper bound on the regret suffered by regret minimization algorithms in finite-horizon problems.
We illustrate that the performance improvement derives from the planning horizon reduction induced by the temporal abstraction enforced by the hierarchical structure.
arXiv Detail & Related papers (2023-05-10T15:00:05Z) - Learning Sketches for Decomposing Planning Problems into Subproblems of Bounded Width: Extended Version [18.95007906887466]
Sketches have been introduced as a general language for representing the subgoal structure of instances drawn from the same domain.
We address the problem of learning sketches automatically, given a planning domain, some instances of the target class of problems, and the desired bound on the sketch width.
The sketch learner and SIW_R planner yield a domain-independent planner that learns and exploits domain structure in a crisp and explicit form.
arXiv Detail & Related papers (2022-03-28T15:49:08Z) - Provable Reinforcement Learning with a Short-Term Memory [68.00677878812908]
We study a new subclass of POMDPs, whose latent states can be decoded from the most recent history of a short length $m$.
In particular, in the rich-observation setting, we develop new algorithms using a novel "moment matching" approach with a sample complexity that scales exponentially in the short memory length $m$ rather than in the problem horizon.
Our results show that a short-term memory suffices for reinforcement learning in these environments.
arXiv Detail & Related papers (2022-02-08T16:39:57Z) - Regret Analysis in Deterministic Reinforcement Learning [78.31410227443102]
We study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms.
We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameters.
arXiv Detail & Related papers (2021-06-27T23:41:57Z) - Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches: Extended Version [17.63517562327928]
We use a simple but powerful language for expressing problem decompositions introduced recently by Bonet and Geffner, called policy sketches.
A policy sketch $R$ consists of a set of Boolean and numerical features and a set of sketch rules that express how the values of these features are supposed to change; a simplified illustration of this representation is sketched after the related-papers list below.
We show that many planning domains that cannot be solved by SIW are provably solvable in low time with the SIW_R algorithm.
arXiv Detail & Related papers (2021-05-10T10:36:18Z) - An Integer Linear Programming Framework for Mining Constraints from Data [81.60135973848125]
We present a general framework for mining constraints from data.
In particular, we consider the inference in structured output prediction as an integer linear programming (ILP) problem.
We show that our approach can learn to solve 9x9 Sudoku puzzles and minimal spanning tree problems from examples without providing the underlying rules.
arXiv Detail & Related papers (2020-06-18T20:09:53Z) - Learning What to Defer for Maximum Independent Sets [84.00112106334655]
We propose a novel DRL scheme, coined learning what to defer (LwD), where the agent adaptively shrinks or stretches the number of stages by learning to distribute the element-wise decisions of the solution at each stage.
We apply the proposed framework to the maximum independent set (MIS) problem, and demonstrate its significant improvement over the current state-of-the-art DRL scheme.
arXiv Detail & Related papers (2020-06-17T02:19:31Z) - STRIPS Action Discovery [67.73368413278631]
Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing.
We propose a new algorithm that synthesizes STRIPS action models with a classical planner, without supervision, when action signatures are unknown.
arXiv Detail & Related papers (2020-01-30T17:08:39Z)
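The sketch language referenced in the "Expressing and Exploiting the Common Subgoal Structure" and "Learning Sketches" entries above pairs Boolean and numerical features with rules stating how feature values should change. The following is a deliberately simplified illustration of that representation, not the encoding used in the cited papers: the class names, condition/effect labels, and the example rule are hypothetical, and only features mentioned in a rule are constrained here.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# A feature valuation: Boolean features map to bool, numerical features to int >= 0.
Valuation = Dict[str, object]

_COND = {"true": lambda x: x is True, "false": lambda x: x is False,
         "=0": lambda x: x == 0, ">0": lambda x: x > 0}
_EFF = {"set": lambda a, b: b is True, "unset": lambda a, b: b is False,
        "dec": lambda a, b: b < a, "inc": lambda a, b: b > a,
        "any": lambda a, b: True}

@dataclass(frozen=True)
class SketchRule:
    """A rule C -> E: if the current valuation satisfies the conditions C,
    then any state whose feature changes match the effects E counts as a
    subgoal state.  The precise syntax and semantics are defined in the
    cited papers; this class only handles the features it mentions."""
    conditions: Dict[str, str]
    effects: Dict[str, str]

    def applicable(self, v: Valuation) -> bool:
        return all(_COND[c](v[f]) for f, c in self.conditions.items())

    def reaches_subgoal(self, v: Valuation, v_next: Valuation) -> bool:
        return all(_EFF[e](v[f], v_next[f]) for f, e in self.effects.items())

@dataclass
class Sketch:
    """A sketch: features (implicit in the valuations) plus a set of rules."""
    rules: List[SketchRule] = field(default_factory=list)

    def subgoal(self, v: Valuation, v_next: Valuation) -> bool:
        return any(r.applicable(v) and r.reaches_subgoal(v, v_next)
                   for r in self.rules)

# Hypothetical example: "while some block is misplaced, decreasing the count
# of misplaced blocks is a good subgoal".
rule = SketchRule(conditions={"n_misplaced": ">0"},
                  effects={"n_misplaced": "dec"})
sketch = Sketch(rules=[rule])
print(sketch.subgoal({"n_misplaced": 3}, {"n_misplaced": 2}))  # True
```
Roughly speaking, the SIW_R planner mentioned in those entries can then be pictured as the greedy IW$(k)$ loop sketched after the abstract, with `sketch.subgoal` taking the place of the learned `score_fn` when selecting the next subgoal state.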