Learning Generalized Policies Without Supervision Using GNNs
- URL: http://arxiv.org/abs/2205.06002v1
- Date: Thu, 12 May 2022 10:28:46 GMT
- Title: Learning Generalized Policies Without Supervision Using GNNs
- Authors: Simon Ståhlberg, Blai Bonet, Hector Geffner
- Abstract summary: We consider the problem of learning generalized policies for classical planning domains using graph neural networks.
We use a simple and general GNN architecture and aim at obtaining crisp experimental results.
We exploit the relation established between the expressive power of GNNs and the $C_2$ fragment of first-order logic.
- Score: 20.322992960599255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of learning generalized policies for classical
planning domains using graph neural networks from small instances represented
in lifted STRIPS. The problem has been considered before but the proposed
neural architectures are complex and the results are often mixed. In this work,
we use a simple and general GNN architecture and aim at obtaining crisp
experimental results and a deeper understanding: either the policy greedy in
the learned value function achieves close to 100% generalization over instances
larger than those used in training, or the failure must be understood, and
possibly fixed, logically. For this, we exploit the relation established
between the expressive power of GNNs and the $C_{2}$ fragment of first-order
logic (namely, FOL with 2 variables and counting quantifiers). We find for
example that domains with general policies that require more expressive
features can be solved with GNNs once the states are extended with suitable
"derived atoms" encoding role compositions and transitive closures that do not
fit into $C_{2}$. The work follows the GNN approach for learning optimal
general policies in a supervised fashion (Ståhlberg, Bonet, Geffner, 2022); but
the learned policies are no longer required to be optimal (which expands the
scope, as many planning domains do not have general optimal policies) and are
learned without supervision. Interestingly, value-based reinforcement learning
methods that aim to produce optimal policies do not always yield policies that
generalize, as the goals of optimality and generality are in conflict in
domains where optimal planning is NP-hard.
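To make the setup concrete, here is a minimal sketch (not the authors' architecture, which the abstract describes only at a high level) of the ingredients mentioned above: a GNN that maps a lifted STRIPS state, given as objects and ground atoms, to a scalar value $V(s)$; a policy that acts greedily in that value; and a helper that adds transitive-closure "derived atoms". The unsupervised training objective (e.g., a Bellman-style consistency loss over sampled transitions) is omitted, and all class and function names are illustrative.

```python
import torch
import torch.nn as nn

class RelationalValueNet(nn.Module):
    """Object embeddings refined by message passing along (assumed binary)
    ground atoms, then summed and read out into a scalar value V(s)."""
    def __init__(self, num_predicates: int, dim: int = 32, rounds: int = 3):
        super().__init__()
        self.h0 = nn.Parameter(torch.zeros(dim))   # shared initial object embedding
        self.msg = nn.ModuleList([nn.Linear(2 * dim, dim) for _ in range(num_predicates)])
        self.upd = nn.GRUCell(dim, dim)
        self.readout = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.rounds = rounds

    def forward(self, num_objects: int, atoms):
        # atoms: list of (predicate_index, src_object, dst_object) triples
        h = self.h0.unsqueeze(0).repeat(num_objects, 1)
        for _ in range(self.rounds):
            msgs = [torch.zeros_like(h[0]) for _ in range(num_objects)]
            for p, i, j in atoms:
                msgs[j] = msgs[j] + self.msg[p](torch.cat([h[i], h[j]]))
            h = self.upd(torch.stack(msgs), h)
        return self.readout(h).sum()               # estimated cost-to-go V(s)

def greedy_policy(value_net, successors):
    """Policy greedy in the learned value: pick the successor state with the
    smallest predicted cost-to-go. Each successor is a (num_objects, atoms) pair."""
    return min(successors, key=lambda s: value_net(*s).item())

def add_transitive_closure(atoms, pred, closure_pred):
    """Derived atoms: the transitive closure of one binary predicate, a feature
    outside C_2 that, per the abstract, can unlock otherwise unlearnable domains."""
    edges = {(i, j) for p, i, j in atoms if p == pred}
    closure, frontier = set(edges), set(edges)
    while frontier:
        frontier = {(i, l) for (i, j) in frontier for (k, l) in edges
                    if j == k and (i, l) not in closure}
        closure |= frontier
    return atoms + [(closure_pred, i, j) for (i, j) in closure]
```

Message passing of this simple form stays within the $C_{2}$ expressiveness the abstract refers to, which is why the more expressive features are supplied as derived atoms at the input level rather than by changing the network.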
Related papers
- Federated Reinforcement Learning with Constraint Heterogeneity [22.79217297480751]
We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity.
We show that FedNPG achieves global convergence with an $\tilde{O}(1/\sqrt{T})$ rate, and FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.
arXiv Detail & Related papers (2024-05-06T07:44:50Z) - Optimistic Natural Policy Gradient: a Simple Efficient Policy
Optimization Framework for Online RL [23.957148537567146]
This paper proposes a simple efficient policy optimization framework -- Optimistic NPG for online RL.
For $d$-dimensional linear MDPs, Optimistic NPG is computationally efficient, and learns an $\varepsilon$-optimal policy within $\tilde{O}(d^{2}/\varepsilon^{3})$ samples.
arXiv Detail & Related papers (2023-05-18T15:19:26Z) - Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
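As a toy illustration of "optimizing lower confidence bounds instead of point estimates" (not the paper's PPL algorithm; the Hoeffding-style width and all names below are illustrative):

```python
import numpy as np

def lcb_score(returns: np.ndarray, delta: float = 0.05) -> float:
    """Point estimate minus a width that shrinks with more data
    (Hoeffding-style bound, returns assumed to lie in [0, 1])."""
    n = len(returns)
    width = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return returns.mean() - width

def pick_policy(candidates):
    """candidates: dict mapping policy name -> array of off-policy return
    estimates; pessimism picks the best lower bound, not the best mean."""
    return max(candidates, key=lambda name: lcb_score(candidates[name]))

# A policy with few, noisy estimates loses to a well-supported one even if
# its empirical mean looks slightly better.
rng = np.random.default_rng(0)
candidates = {
    "well_supported": rng.uniform(0.55, 0.75, size=500),
    "poorly_supported": rng.uniform(0.60, 0.80, size=5),
}
print(pick_policy(candidates))   # -> "well_supported"
```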
arXiv Detail & Related papers (2022-12-19T22:43:08Z) - Graph Neural Networks are Inherently Good Generalizers: Insights by
Bridging GNNs and MLPs [71.93227401463199]
This paper traces the major source of GNNs' performance gain to their intrinsic capability by introducing an intermediate model class dubbed P(ropagational)MLP.
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
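A hedged sketch of the PMLP idea summarized above, assuming the interpretation that the weights are trained as a plain MLP on node features and GNN-style propagation is inserted only at inference; the propagation rule and names are illustrative:

```python
import torch
import torch.nn as nn

class PMLP(nn.Module):
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, out_dim)

    def propagate(self, h, adj_norm):
        # Normalized-adjacency aggregation, used only when message passing
        # is switched on (i.e., at inference time).
        return adj_norm @ h

    def forward(self, x, adj_norm=None, message_passing=False):
        h = self.lin1(x)
        if message_passing:          # skipped during training: pure MLP
            h = self.propagate(h, adj_norm)
        h = torch.relu(h)
        h = self.lin2(h)
        if message_passing:
            h = self.propagate(h, adj_norm)
        return h

# Training:  model(x)                                  (plain MLP)
# Inference: model(x, adj_norm, message_passing=True)  (GNN-style PMLP)
```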
arXiv Detail & Related papers (2022-12-18T08:17:32Z) - Offline Reinforcement Learning with Closed-Form Policy Improvement
Operators [88.54210578912554]
Behavior-constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z) - PG3: Policy-Guided Planning for Generalized Policy Generation [25.418642034856365]
We study generalized policy search-based methods with a focus on the score function used to guide the search over policies.
The main idea behind our approach is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate.
Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines.
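To make the "policy as guidance for planning" score concrete, a hedged sketch follows; it uses a plain guided rollout as a stand-in for PG3's actual policy-guided planner, and all helper names are hypothetical:

```python
def policy_guided_plan(problem, policy, max_steps=1000):
    """Greedy rollout that follows the candidate policy when it suggests an
    action and falls back to any applicable action otherwise; returns the
    number of steps taken, or None if the goal is not reached."""
    state = problem.initial_state
    for step in range(max_steps):
        if problem.is_goal(state):
            return step
        action = policy.suggest(state) or next(iter(problem.applicable(state)), None)
        if action is None:
            return None
        state = problem.apply(action, state)
    return None

def score_candidate(policy, training_problems, fail_penalty=10_000):
    """Lower is better: sum of guided-plan lengths, with a penalty for
    training problems the guided planner fails to solve."""
    total = 0
    for problem in training_problems:
        steps = policy_guided_plan(problem, policy)
        total += fail_penalty if steps is None else steps
    return total
```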
arXiv Detail & Related papers (2022-04-21T21:59:25Z) - Understanding Robust Generalization in Learning Regular Languages [85.95124524975202]
We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
arXiv Detail & Related papers (2022-02-20T02:50:09Z) - Learning General Optimal Policies with Graph Neural Networks: Expressive
Power, Transparency, and Limits [18.718037284357834]
We train a simple GNN in a supervised manner to approximate the optimal value function $V^{*}(s)$ for a number of sample states $s$.
It is observed that general optimal policies are obtained in domains where general optimal value functions can be defined with $C_2$ features, but not in those requiring more expressive $C_3$ features.
arXiv Detail & Related papers (2021-09-21T12:22:29Z) - Policy Mirror Descent for Regularized Reinforcement Learning: A
Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
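For reference, a standard tabular policy mirror descent step with the negative-entropy mirror map is sketched below; the paper's GPMD generalizes such updates to a broad class of convex regularizers, which this toy version does not capture:

```python
import numpy as np

def pmd_step(pi: np.ndarray, q: np.ndarray, eta: float) -> np.ndarray:
    """pi, q: arrays of shape (num_states, num_actions).
    Multiplicative-weights / softmax update:
        pi_{k+1}(a|s)  proportional to  pi_k(a|s) * exp(eta * Q_k(s, a))."""
    logits = np.log(pi + 1e-12) + eta * q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```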
arXiv Detail & Related papers (2021-05-24T02:21:34Z) - Learning General Policies from Small Examples Without Supervision [18.718037284357834]
Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once.
It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans; second, the QNP is solved with a QNP planner to yield the general policy.
In this work, we introduce an alternative approach for computing more expressive general policies which does not require sample plans or a QNP planner.
arXiv Detail & Related papers (2021-01-03T19:44:13Z) - When Will Generative Adversarial Imitation Learning Algorithms Attain
Global Convergence [56.40794592158596]
We study generative adversarial imitation learning (GAIL) under general MDPs and for nonlinear reward function classes.
This is the first systematic theoretical study of GAIL for global convergence.
arXiv Detail & Related papers (2020-06-24T06:24:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.