Learning General Policies from Small Examples Without Supervision
- URL: http://arxiv.org/abs/2101.00692v2
- Date: Wed, 17 Feb 2021 19:52:39 GMT
- Title: Learning General Policies from Small Examples Without Supervision
- Authors: Guillem Francès, Blai Bonet, Hector Geffner
- Abstract summary: Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once.
It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans; then, the general policies are obtained from the learned QNP using a planner.
In this work, we introduce an alternative approach for computing more expressive general policies which does not require sample plans or a QNP planner.
- Score: 18.718037284357834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalized planning is concerned with the computation of general policies
that solve multiple instances of a planning domain all at once. It has been
recently shown that these policies can be computed in two steps: first, a
suitable abstraction in the form of a qualitative numerical planning problem
(QNP) is learned from sample plans, then the general policies are obtained from
the learned QNP using a planner. In this work, we introduce an alternative
approach for computing more expressive general policies which does not require
sample plans or a QNP planner. The new formulation is very simple and can be
cast in terms that are more standard in machine learning: a large but finite
pool of features is defined from the predicates in the planning examples using
a general grammar, and a small subset of features is sought for separating
"good" from "bad" state transitions, and goals from non-goals. The problems of
finding such a "separating surface" and of labeling the transitions as "good" or
"bad" are jointly addressed as a single combinatorial optimization problem
expressed as a Weighted Max-SAT problem. The advantage of looking for the
simplest policy in the given feature space that solves the given examples,
possibly non-optimally, is that many domains have no general, compact policies
that are optimal. The approach yields general policies for a number of
benchmark domains.
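As an illustration of the formulation described above, here is a minimal sketch of its simpler, fully supervised core: given transitions already labeled "good" or "bad", select a cheapest subset of features whose qualitative changes separate every good transition from every bad one, encoded as Weighted Max-SAT and solved with python-sat's RC2. The feature pool, feature costs, and the dict-based state representation are hypothetical placeholders, and the paper's actual encoding additionally chooses the good/bad labels jointly with the feature selection.

    from itertools import product
    from pysat.formula import WCNF
    from pysat.examples.rc2 import RC2

    # Hypothetical feature pool with complexity costs; a state is a dict
    # mapping each feature name to its numeric value in that state.
    FEATURES = {"n_misplaced": 3, "holding": 1, "n_clear": 2}
    VAR = {f: i + 1 for i, f in enumerate(FEATURES)}   # one SAT variable per feature

    def qual_change(feature, transition):
        """Qualitative effect of a transition (s, s') on a feature: INC, DEC or =."""
        s, s_next = transition
        delta = s_next[feature] - s[feature]
        return "INC" if delta > 0 else "DEC" if delta < 0 else "="

    def separating_features(good_transitions, bad_transitions):
        """Cheapest feature subset whose qualitative changes separate good from bad."""
        wcnf = WCNF()
        # Hard clauses: every (good, bad) pair must be distinguished by some selected feature.
        for t_good, t_bad in product(good_transitions, bad_transitions):
            clause = [VAR[f] for f in FEATURES
                      if qual_change(f, t_good) != qual_change(f, t_bad)]
            if not clause:
                return None                    # not separable within this feature pool
            wcnf.append(clause)                # hard clause (no weight)
        # Soft clauses: selecting a feature costs its complexity; Max-SAT minimizes the total.
        for f, cost in FEATURES.items():
            wcnf.append([-VAR[f]], weight=cost)
        with RC2(wcnf) as solver:
            model = solver.compute()
        return [f for f in FEATURES if model[VAR[f] - 1] > 0]

The selected features play the role of the "separating surface": a policy can then prefer, in any state, the transitions whose qualitative feature changes match those seen on the good side of the training transitions.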
Related papers
- Parallel Strategies for Best-First Generalized Planning [51.713634067802104]
Generalized planning (GP) is a research area of AI that studies the automated synthesis of algorithmic-like solutions capable of solving multiple classical planning instances.
One of the current advancements has been the introduction of Best-First Generalized Planning (BFGP), a GP algorithm based on a novel solution space that can be explored with search.
This paper evaluates the application of parallel search techniques to BFGP, another critical component in closing the performance gap.
arXiv Detail & Related papers (2024-07-31T09:50:22Z)
- Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains [12.730070122798459]
General policies represent reactive strategies for solving large families of planning problems.
We extend the formulations and the resulting methods for learning general policies over fully observable, non-deterministic domains.
arXiv Detail & Related papers (2024-04-03T06:25:42Z)
- Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this is the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
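As a rough sketch of what single-time-scale primal-dual methods look like in this setting (a generic template, not the paper's specific algorithm), both the policy parameters \theta and the Lagrange multiplier \lambda are updated with the same step size \eta on the Lagrangian of the constrained problem:

    \max_{\theta} \; V_r^{\pi_\theta} \quad \text{s.t.} \quad V_c^{\pi_\theta} \ge b,
    \qquad L(\theta, \lambda) \;=\; V_r^{\pi_\theta} + \lambda \big( V_c^{\pi_\theta} - b \big),

    \theta_{t+1} \;=\; \theta_t + \eta \, \nabla_\theta L(\theta_t, \lambda_t),
    \qquad \lambda_{t+1} \;=\; \Pi_{[0,\Lambda]}\big( \lambda_t - \eta \, ( V_c^{\pi_{\theta_t}} - b ) \big).

The result summarized above concerns the convergence of the last policy iterate \pi_{\theta_t} itself to an optimal constrained policy under such a single step-size scheme, rather than only of an averaged or best-so-far iterate.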
arXiv Detail & Related papers (2023-06-20T17:27:31Z)
- Learning Logic Specifications for Soft Policy Guidance in POMCP [71.69251176275638]
Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs).
POMCP suffers from a sparse reward function, namely, rewards are achieved only when the final goal is reached.
In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions.
arXiv Detail & Related papers (2023-03-16T09:37:10Z)
- Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
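Schematically (a generic form of pessimism, not the paper's exact estimator), the learned policy maximizes a lower confidence bound rather than the plug-in value estimate:

    \hat{\pi} \;\in\; \arg\max_{\pi \in \Pi} \Big\{ \hat{V}(\pi) \;-\; \beta \, \hat{\Gamma}(\pi) \Big\},

where \hat{V}(\pi) is the off-policy value estimate computed from the logged data and \hat{\Gamma}(\pi) is a data-dependent uncertainty width, here controlled via a generalized empirical Bernstein inequality; policies that are poorly covered by the logging policy are thus penalized instead of being ruled out by a uniform overlap assumption.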
arXiv Detail & Related papers (2022-12-19T22:43:08Z)
- Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z)
- Learning Generalized Policies Without Supervision Using GNNs [20.322992960599255]
We consider the problem of learning generalized policies for classical planning domains using graph neural networks.
We use a simple and general GNN architecture and aim at obtaining crisp experimental results.
We exploit the relation established between the expressive power of GNNs and the $C_2$ fragment of first-order logic.
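A minimal sketch of one round of message passing over a relational state graph is given below; the aggregation, dimensions, and weight matrices are illustrative assumptions rather than the specific architecture used in the paper.

    import numpy as np

    def message_passing_round(node_emb, edges, W_self, W_msg):
        """One synchronous update: every object aggregates messages from its
        neighbours in the state graph (objects linked by ground atoms).
        node_emb: (n, d) array of object embeddings; edges: (src, dst) index pairs."""
        messages = np.zeros_like(node_emb)
        for src, dst in edges:
            messages[dst] += node_emb[src] @ W_msg
        return np.maximum(0.0, node_emb @ W_self + messages)   # ReLU nonlinearity

Stacking a fixed number of such rounds and reading out a score per applicable action yields a policy whose distinguishing power is, roughly, bounded by the two-variable counting fragment $C_2$ mentioned above.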
arXiv Detail & Related papers (2022-05-12T10:28:46Z)
- Preliminary Results on Using Abstract AND-OR Graphs for Generalized Solving of Stochastic Shortest Path Problems [25.152899734616298]
Stochastic Shortest Path Problems (SSPs) are goal-oriented problems in the real world.
A key difficulty in computing solutions for SSPs is that finding solutions to even moderately sized problems is intractable.
We show that our approach can be embedded in any SSP solver to compute hierarchically optimal policies.
arXiv Detail & Related papers (2022-04-08T21:30:47Z)
- Expressing and Exploiting the Common Subgoal Structure of Classical Planning Domains Using Sketches: Extended Version [17.63517562327928]
We use a simple but powerful language for expressing problem decompositions introduced recently by Bonet and Geffner, called policy sketches.
A policy sketch R consists of a set of Boolean and numerical features and a set of sketch rules that express how the values of these features are supposed to change.
We show that many planning domains that cannot be solved by SIW are provably solvable in low time with the SIW_R algorithm.
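For illustration only, a hypothetical sketch in the spirit of this notation (not an example taken from the paper) for a simple delivery-like domain could use a Boolean feature H (a package is being held) and a numerical feature n (number of undelivered packages), with two rules:

    r1:  { not H, n > 0 }  ->  { H }                     pick up some undelivered package
    r2:  { H }             ->  { not H, n decreases }    deliver it, reducing n

Each rule states which qualitative change of the feature values counts as progress from states satisfying its condition; SIW_R uses the rules to decompose the problem into subgoals that it then reaches with bounded-width search.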
arXiv Detail & Related papers (2021-05-10T10:36:18Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
- Strengthening Deterministic Policies for POMDPs [5.092711491848192]
We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints.
We employ a preprocessing of the POMDP to encompass memory-based decisions.
The advantages of our approach lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications.
arXiv Detail & Related papers (2020-07-16T14:22:55Z)