Learning Generalized Policies Without Supervision Using GNNs
- URL: http://arxiv.org/abs/2205.06002v1
- Date: Thu, 12 May 2022 10:28:46 GMT
- Title: Learning Generalized Policies Without Supervision Using GNNs
- Authors: Simon Ståhlberg, Blai Bonet, Hector Geffner
- Abstract summary: We consider the problem of learning generalized policies for classical planning domains using graph neural networks.
We use a simple and general GNN architecture and aim at obtaining crisp experimental results.
We exploit the relation established between the expressive power of GNNs and the $C_2$ fragment of first-order logic.
- Score: 20.322992960599255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We consider the problem of learning generalized policies for classical
planning domains using graph neural networks from small instances represented
in lifted STRIPS. The problem has been considered before but the proposed
neural architectures are complex and the results are often mixed. In this work,
we use a simple and general GNN architecture and aim at obtaining crisp
experimental results and a deeper understanding: either the policy greedy in
the learned value function achieves close to 100% generalization over instances
larger than those used in training, or the failure must be understood, and
possibly fixed, logically. For this, we exploit the relation established
between the expressive power of GNNs and the $C_{2}$ fragment of first-order
logic (namely, FOL with 2 variables and counting quantifiers). We find for
example that domains with general policies that require more expressive
features can be solved with GNNs once the states are extended with suitable
"derived atoms" encoding role compositions and transitive closures that do not
fit into $C_{2}$. The work follows the GNN approach for learning optimal
general policies in a supervised fashion (Ståhlberg, Bonet, Geffner, 2022); but
the learned policies are no longer required to be optimal (which expands the
scope, as many planning domains do not have general optimal policies) and are
learned without supervision. Interestingly, value-based reinforcement learning
methods that aim to produce optimal policies do not always yield policies that
generalize, as the goals of optimality and generality are in conflict in
domains where optimal planning is NP-hard.
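To make the setup concrete, here is a minimal sketch (not the authors' architecture, which the abstract describes only at a high level) of the ingredients mentioned above: a GNN that maps a lifted STRIPS state, given as objects and ground atoms, to a scalar value $V(s)$; a policy that acts greedily in that value; and a helper that adds transitive-closure "derived atoms". The unsupervised training objective (e.g., a Bellman-style consistency loss over sampled transitions) is omitted, and all class and function names are illustrative.

```python
import torch
import torch.nn as nn

class RelationalValueNet(nn.Module):
    """Object embeddings refined by message passing along (assumed binary)
    ground atoms, then summed and read out into a scalar value V(s)."""
    def __init__(self, num_predicates: int, dim: int = 32, rounds: int = 3):
        super().__init__()
        self.h0 = nn.Parameter(torch.zeros(dim))   # shared initial object embedding
        self.msg = nn.ModuleList([nn.Linear(2 * dim, dim) for _ in range(num_predicates)])
        self.upd = nn.GRUCell(dim, dim)
        self.readout = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.rounds = rounds

    def forward(self, num_objects: int, atoms):
        # atoms: list of (predicate_index, src_object, dst_object) triples
        h = self.h0.unsqueeze(0).repeat(num_objects, 1)
        for _ in range(self.rounds):
            msgs = [torch.zeros_like(h[0]) for _ in range(num_objects)]
            for p, i, j in atoms:
                msgs[j] = msgs[j] + self.msg[p](torch.cat([h[i], h[j]]))
            h = self.upd(torch.stack(msgs), h)
        return self.readout(h).sum()               # estimated cost-to-go V(s)

def greedy_policy(value_net, successors):
    """Policy greedy in the learned value: pick the successor state with the
    smallest predicted cost-to-go. Each successor is a (num_objects, atoms) pair."""
    return min(successors, key=lambda s: value_net(*s).item())

def add_transitive_closure(atoms, pred, closure_pred):
    """Derived atoms: the transitive closure of one binary predicate, a feature
    outside C_2 that, per the abstract, can unlock otherwise unlearnable domains."""
    edges = {(i, j) for p, i, j in atoms if p == pred}
    closure, frontier = set(edges), set(edges)
    while frontier:
        frontier = {(i, l) for (i, j) in frontier for (k, l) in edges
                    if j == k and (i, l) not in closure}
        closure |= frontier
    return atoms + [(closure_pred, i, j) for (i, j) in closure]
```

Message passing of this simple form stays within the $C_{2}$ expressiveness the abstract refers to, which is why the more expressive features are supplied as derived atoms at the input level rather than by changing the network.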
Related papers
- Federated Reinforcement Learning with Constraint Heterogeneity [22.79217297480751]
We study a Federated Reinforcement Learning (FedRL) problem with constraint heterogeneity.
We show that FedNPG achieves global convergence with an $\tilde{O}(1/\sqrt{T})$ rate, and FedPPO efficiently solves complicated learning tasks with the use of deep neural networks.
arXiv Detail & Related papers (2024-05-06T07:44:50Z) - Optimistic Natural Policy Gradient: a Simple Efficient Policy
Optimization Framework for Online RL [23.957148537567146]
This paper proposes a simple efficient policy optimization framework -- Optimistic NPG for online RL.
For $d$-dimensional linear MDPs, Optimistic NPG is computationally efficient, and learns an $\varepsilon$-optimal policy within $\tilde{O}(d^{2}/\varepsilon^{3})$ samples.
arXiv Detail & Related papers (2023-05-18T15:19:26Z) - Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality [94.89246810243053]
This paper studies offline policy learning, which aims at utilizing observations collected a priori to learn an optimal individualized decision rule.
Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics must be lower bounded.
We propose Pessimistic Policy Learning (PPL), a new algorithm that optimizes lower confidence bounds (LCBs) instead of point estimates.
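As a toy illustration of "optimizing lower confidence bounds instead of point estimates" (not the paper's PPL algorithm; the Hoeffding-style width and all names below are illustrative):

```python
import numpy as np

def lcb_score(returns: np.ndarray, delta: float = 0.05) -> float:
    """Point estimate minus a width that shrinks with more data
    (Hoeffding-style bound, returns assumed to lie in [0, 1])."""
    n = len(returns)
    width = np.sqrt(np.log(1.0 / delta) / (2.0 * n))
    return returns.mean() - width

def pick_policy(candidates):
    """candidates: dict mapping policy name -> array of off-policy return
    estimates; pessimism picks the best lower bound, not the best mean."""
    return max(candidates, key=lambda name: lcb_score(candidates[name]))

# A policy with few, noisy estimates loses to a well-supported one even if
# its empirical mean looks slightly better.
rng = np.random.default_rng(0)
candidates = {
    "well_supported": rng.uniform(0.55, 0.75, size=500),
    "poorly_supported": rng.uniform(0.60, 0.80, size=5),
}
print(pick_policy(candidates))   # -> "well_supported"
```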
arXiv Detail & Related papers (2022-12-19T22:43:08Z) - Graph Neural Networks are Inherently Good Generalizers: Insights by
Bridging GNNs and MLPs [71.93227401463199]
This paper traces the major source of GNNs' performance gain to their intrinsic capability by introducing an intermediate model class dubbed P(ropagational)MLP.
We observe that PMLPs consistently perform on par with (or even exceed) their GNN counterparts, while being much more efficient in training.
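A hedged sketch of the PMLP idea summarized above, assuming the interpretation that the weights are trained as a plain MLP on node features and GNN-style propagation is inserted only at inference; the propagation rule and names are illustrative:

```python
import torch
import torch.nn as nn

class PMLP(nn.Module):
    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, out_dim)

    def propagate(self, h, adj_norm):
        # Normalized-adjacency aggregation, used only when message passing
        # is switched on (i.e., at inference time).
        return adj_norm @ h

    def forward(self, x, adj_norm=None, message_passing=False):
        h = self.lin1(x)
        if message_passing:          # skipped during training: pure MLP
            h = self.propagate(h, adj_norm)
        h = torch.relu(h)
        h = self.lin2(h)
        if message_passing:
            h = self.propagate(h, adj_norm)
        return h

# Training:  model(x)                                  (plain MLP)
# Inference: model(x, adj_norm, message_passing=True)  (GNN-style PMLP)
```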
arXiv Detail & Related papers (2022-12-18T08:17:32Z) - Offline Reinforcement Learning with Closed-Form Policy Improvement
Operators [88.54210578912554]
Behavior-constrained policy optimization has been demonstrated to be a successful paradigm for tackling Offline Reinforcement Learning.
In this paper, we propose our closed-form policy improvement operators.
We empirically demonstrate their effectiveness over state-of-the-art algorithms on the standard D4RL benchmark.
arXiv Detail & Related papers (2022-11-29T06:29:26Z) - PG3: Policy-Guided Planning for Generalized Policy Generation [25.418642034856365]
We study generalized policy search-based methods with a focus on the score function used to guide the search over policies.
The main idea behind our approach is that a candidate policy should be used to guide planning on training problems as a mechanism for evaluating that candidate.
Empirical results in six domains confirm that PG3 learns generalized policies more efficiently and effectively than several baselines.
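To make the "policy as guidance for planning" score concrete, a hedged sketch follows; it uses a plain guided rollout as a stand-in for PG3's actual policy-guided planner, and all helper names are hypothetical:

```python
def policy_guided_plan(problem, policy, max_steps=1000):
    """Greedy rollout that follows the candidate policy when it suggests an
    action and falls back to any applicable action otherwise; returns the
    number of steps taken, or None if the goal is not reached."""
    state = problem.initial_state
    for step in range(max_steps):
        if problem.is_goal(state):
            return step
        action = policy.suggest(state) or next(iter(problem.applicable(state)), None)
        if action is None:
            return None
        state = problem.apply(action, state)
    return None

def score_candidate(policy, training_problems, fail_penalty=10_000):
    """Lower is better: sum of guided-plan lengths, with a penalty for
    training problems the guided planner fails to solve."""
    total = 0
    for problem in training_problems:
        steps = policy_guided_plan(problem, policy)
        total += fail_penalty if steps is None else steps
    return total
```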
arXiv Detail & Related papers (2022-04-21T21:59:25Z) - Understanding Robust Generalization in Learning Regular Languages [85.95124524975202]
We study robust generalization in the context of using recurrent neural networks to learn regular languages.
We propose a compositional strategy to address this.
We theoretically prove that the compositional strategy generalizes significantly better than the end-to-end strategy.
arXiv Detail & Related papers (2022-02-20T02:50:09Z) - Learning General Optimal Policies with Graph Neural Networks: Expressive
Power, Transparency, and Limits [18.718037284357834]
We train a simple GNN in a supervised manner to approximate the optimal value function $V^{*}(s)$ for a number of sample states $s$.
It is observed that general optimal policies are obtained in domains where general optimal value functions can be defined with $C_2$ features, but not in those requiring more expressive $C_3$ features.
arXiv Detail & Related papers (2021-09-21T12:22:29Z) - Policy Mirror Descent for Regularized Reinforcement Learning: A
Generalized Framework with Linear Convergence [60.20076757208645]
This paper proposes a general policy mirror descent (GPMD) algorithm for solving regularized RL.
We demonstrate that our algorithm converges linearly over an entire range of learning rates, in a dimension-free fashion, to the global solution.
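For reference, a standard tabular policy mirror descent step with the negative-entropy mirror map is sketched below; the paper's GPMD generalizes such updates to a broad class of convex regularizers, which this toy version does not capture:

```python
import numpy as np

def pmd_step(pi: np.ndarray, q: np.ndarray, eta: float) -> np.ndarray:
    """pi, q: arrays of shape (num_states, num_actions).
    Multiplicative-weights / softmax update:
        pi_{k+1}(a|s)  proportional to  pi_k(a|s) * exp(eta * Q_k(s, a))."""
    logits = np.log(pi + 1e-12) + eta * q
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```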
arXiv Detail & Related papers (2021-05-24T02:21:34Z) - Learning General Policies from Small Examples Without Supervision [18.718037284357834]
Generalized planning is concerned with the computation of general policies that solve multiple instances of a planning domain all at once.
It has been recently shown that these policies can be computed in two steps: first, a suitable abstraction in the form of a qualitative numerical planning problem (QNP) is learned from sample plans; second, the QNP is solved with a QNP planner to yield the general policy.
In this work, we introduce an alternative approach for computing more expressive general policies which does not require sample plans or a QNP planner.
arXiv Detail & Related papers (2021-01-03T19:44:13Z) - When Will Generative Adversarial Imitation Learning Algorithms Attain
Global Convergence [56.40794592158596]
We study generative adversarial imitation learning (GAIL) under general MDPs and for nonlinear reward function classes.
This is the first systematic theoretical study of GAIL for global convergence.
arXiv Detail & Related papers (2020-06-24T06:24:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.