Expressing and Exploiting the Common Subgoal Structure of Classical
Planning Domains Using Sketches: Extended Version
- URL: http://arxiv.org/abs/2105.04250v1
- Date: Mon, 10 May 2021 10:36:18 GMT
- Authors: Dominik Drexler and Jendrik Seipp and Hector Geffner
- Abstract summary: We use a simple but powerful language for expressing problem decompositions introduced recently by Bonet and Geffner, called policy sketches.
A policy sketch R consists of a set of Boolean and numerical features and a set of sketch rules that express how the values of these features are supposed to change.
We show that many planning domains that cannot be solved by SIW are provably solvable in low polynomial time with the SIW_R algorithm.
- Score: 17.63517562327928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Width-based planning methods exploit the use of conjunctive goals for
decomposing problems into subproblems of low width. However, algorithms like
SIW fail when the goal is not serializable. In this work, we address this
limitation of SIW by using a simple but powerful language for expressing
problem decompositions introduced recently by Bonet and Geffner, called policy
sketches. A policy sketch R consists of a set of Boolean and numerical features
and a set of sketch rules that express how the values of these features are
supposed to change. Like general policies, policy sketches are domain general,
but unlike policies, the changes captured by sketch rules do not need to be
achieved in a single step. We show that many planning domains that cannot be
solved by SIW are provably solvable in low polynomial time with the SIW_R
algorithm, the version of SIW that employs user-provided policy sketches.
Policy sketches are thus shown to be a powerful language for expressing
domain-specific knowledge in a simple and compact way and a convenient
alternative to languages such as HTNs or temporal logics. Furthermore, policy
sketches make it easy to express general problem decompositions and prove key
properties like their complexity and width.
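The abstract describes a policy sketch as a set of Boolean and numerical features together with rules stating how feature values should change, and SIW_R as a search that chains the subgoals those rules induce. The following is only an illustrative sketch of that idea, not the authors' implementation: it uses plain breadth-first search where SIW_R uses IW's novelty-based search, and the "clear items from a table" domain, feature names, and helper functions are hypothetical.

```python
# Illustrative sketch (not the authors' implementation): a policy sketch as
# features plus rules "condition -> effect", and a simplified SIW_R-style loop.
# Plain BFS stands in for the IW novelty-pruned search used by the real SIW_R.
from collections import deque

class SketchRule:
    def __init__(self, condition, effect):
        self.condition = condition  # predicate over current feature values
        self.effect = effect        # predicate over (old_values, new_values)

def siw_r(initial, goal, successors, features, rules, max_steps=1000):
    """Solve by chaining subgoals induced by sketch rules."""
    state, plan = initial, []
    while not goal(state):
        vals = {name: f(state) for name, f in features.items()}
        active = [r for r in rules if r.condition(vals)]
        # Search for the closest state achieving any active rule's effect.
        frontier, seen, segment = deque([(state, [])]), {state}, None
        while frontier:
            s, path = frontier.popleft()
            new_vals = {name: f(s) for name, f in features.items()}
            if path and (goal(s) or any(r.effect(vals, new_vals) for r in active)):
                segment = (s, path)
                break
            for action, t in successors(s):
                if t not in seen:
                    seen.add(t)
                    frontier.append((t, path + [action]))
        if segment is None:
            return None  # no subgoal state reachable
        state, path = segment
        plan += path
        if len(plan) > max_steps:
            return None
    return plan

# Toy domain (hypothetical): remove n items from a table, one per action.
features = {"n": lambda s: s}                       # numerical feature: items left
rules = [SketchRule(lambda v: v["n"] > 0,           # if items remain ...
                    lambda v, w: w["n"] < v["n"])]  # ... decreasing n is progress
succ = lambda s: [("remove", s - 1)] if s > 0 else []
print(siw_r(5, lambda s: s == 0, succ, features, rules))
# prints ['remove', 'remove', 'remove', 'remove', 'remove']
```

Each pass of the outer loop solves one low-width subproblem (reach any state where an applicable rule's effect holds), which is exactly the decomposition the sketch rules are meant to express.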
Related papers
- General Policies, Subgoal Structure, and Planning Width [19.373790756767278]
It has been observed that planning domains with atomic goals can be solved by means of a simple exploration procedure, called IW, that runs in time exponential in the problem width.
Yet, there is no good explanation for why so many benchmark domains have bounded width when atomic goals are considered.
arXiv Detail & Related papers (2023-11-09T16:30:22Z) - Dimensionless Policies based on the Buckingham $\pi$ Theorem: Is This a
Good Way to Generalize Numerical Results? [66.52698983694613]
This article explores the use of the Buckingham $\pi$ theorem as a tool to encode the control policies of physical systems into a generic form of knowledge.
We show, by restating the solution to a motion control problem using dimensionless variables, that (1) the policy mapping involves a reduced number of parameters and (2) control policies generated numerically for a specific system can be transferred exactly to a subset of dimensionally similar systems by scaling the input and output variables appropriately.
It remains to be seen how practical this approach can be for generalizing policies to more complex high-dimensional problems, but the early results are promising.
arXiv Detail & Related papers (2023-07-29T00:51:26Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy-gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Efficient Policy Iteration for Robust Markov Decision Processes via
Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Preliminary Results on Using Abstract AND-OR Graphs for Generalized
Solving of Stochastic Shortest Path Problems [25.152899734616298]
Stochastic Shortest Path Problems (SSPs) are goal-oriented problems that arise in the real world.
A key difficulty for SSPs is that computing solutions to even moderately sized problems is intractable.
We show that our approach can be embedded in any SSP solver to compute hierarchically optimal policies.
arXiv Detail & Related papers (2022-04-08T21:30:47Z) - Learning Sketches for Decomposing Planning Problems into Subproblems of
Bounded Width: Extended Version [18.95007906887466]
Sketches have been introduced as a general language for representing the subgoal structure of instances drawn from the same domain.
We present a problem of learning sketches automatically given a planning domain, some instances of the target class of problems, and the desired bound on the sketch width.
The sketch learner and SIW_R planner yield a domain-independent planner that learns and exploits domain structure in a crisp and explicit form.
arXiv Detail & Related papers (2022-03-28T15:49:08Z) - Domain-Smoothing Network for Zero-Shot Sketch-Based Image Retrieval [66.37346493506737]
Zero-Shot Sketch-Based Image Retrieval (ZS-SBIR) is a novel cross-modal retrieval task.
We propose a novel Domain-Smoothing Network (DSN) for ZS-SBIR.
Our approach notably outperforms the state-of-the-art methods on both the Sketchy and TU-Berlin datasets.
arXiv Detail & Related papers (2021-06-22T14:58:08Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for
Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z) - Optimization Issues in KL-Constrained Approximate Policy Iteration [48.24321346619156]
Many reinforcement learning algorithms can be seen as versions of approximate policy iteration (API).
While standard API often performs poorly, it has been shown that learning can be stabilized by regularizing each policy update by the KL-divergence to the previous policy.
Popular practical algorithms such as TRPO, MPO, and VMPO replace regularization by a constraint on KL-divergence of consecutive policies.
arXiv Detail & Related papers (2021-02-11T19:35:33Z) - Learning General Policies from Small Examples Without Supervision [18.718037284357834]
Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once.
It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans.
In this work, we introduce an alternative approach for computing more expressive general policies which does not require sample plans or a QNP planner.
arXiv Detail & Related papers (2021-01-03T19:44:13Z) - General Policies, Serializations, and Planning Width [22.112881443209726]
We show that bounded width is a property of planning domains that admit optimal general policies in terms of features that are explicitly or implicitly represented in the domain encoding.
The study also leads to a new simple, meaningful, and expressive language for specifying domain serializations in the form of policy sketches.
arXiv Detail & Related papers (2020-12-15T01:33:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.