Learning General Optimal Policies with Graph Neural Networks: Expressive
Power, Transparency, and Limits
- URL: http://arxiv.org/abs/2109.10129v1
- Date: Tue, 21 Sep 2021 12:22:29 GMT
- Title: Learning General Optimal Policies with Graph Neural Networks: Expressive
Power, Transparency, and Limits
- Authors: Simon Ståhlberg, Blai Bonet, Hector Geffner
- Abstract summary: We train a simple GNN in a supervised manner to approximate the optimal value function $V^{*}(s)$ of a number of sample states $s$.
It is observed that general optimal policies are obtained in domains where general optimal value functions can be defined with $C_2$ features but not in those requiring more expressive $C_3$ features.
- Score: 18.718037284357834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been recently shown that general policies for many classical planning
domains can be expressed and learned in terms of a pool of features defined
from the domain predicates using a description logic grammar. At the same time,
most description logics correspond to a fragment of $k$-variable counting logic
($C_k$) for $k=2$, a logic that has been shown to provide a tight characterization of
the expressive power of graph neural networks. In this work, we make use of
these results to understand the power and limits of using graph neural networks
(GNNs) for learning optimal general policies over a number of tractable
planning domains where such policies are known to exist. For this, we train a
simple GNN in a supervised manner to approximate the optimal value function
$V^{*}(s)$ of a number of sample states $s$. As predicted by the theory, it is
observed that general optimal policies are obtained in domains where general
optimal value functions can be defined with $C_2$ features but not in those
requiring more expressive $C_3$ features. In addition, it is observed that the
features learned are in close correspondence with the features needed to
express $V^{*}$ in closed form. The theory and the analysis of the domains let
us understand the features that are actually learned as well as those that
cannot be learned in this way, and let us move in a principled manner from a
combinatorial optimization approach to learning general policies to a
potentially more robust and scalable approach based on deep learning.
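To make the training setup concrete, below is a minimal sketch of supervised value regression with a message-passing GNN in PyTorch. The state encoding, architecture, and names (e.g., `StateGNN`, the hidden sizes) are illustrative assumptions rather than the authors' exact model; sum aggregation is used because it matches the counting ability associated with $C_2$.

```python
# Minimal sketch (illustrative, not the authors' exact model): a message-passing
# GNN trained by supervised regression to approximate V*(s) on sampled states.
# A state is encoded as a graph with node features x (n x d) and adjacency adj (n x n).
import torch
import torch.nn as nn

class StateGNN(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int = 64, num_layers: int = 3):
        super().__init__()
        self.embed = nn.Linear(in_dim, hid_dim)
        # One update MLP per message-passing round; sum aggregation over
        # neighbors matches the counting power of C_2.
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * hid_dim, hid_dim), nn.ReLU())
            for _ in range(num_layers)
        )
        self.readout = nn.Linear(hid_dim, 1)  # graph-level scalar value estimate

    def forward(self, x, adj):
        h = self.embed(x)
        for layer in self.layers:
            msgs = adj @ h                           # sum of neighbor embeddings
            h = layer(torch.cat([h, msgs], dim=-1))  # combine node and messages
        return self.readout(h.sum(dim=0))            # pool nodes, predict V(s)

model = StateGNN(in_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy training pair; real (state, V*) pairs would come from an optimal planner.
x, adj = torch.randn(5, 8), (torch.rand(5, 5) > 0.5).float()
v_star = torch.tensor([3.0])
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x, adj), v_star)
    loss.backward()
    opt.step()
```

A general policy then follows greedily from the learned value function: in state $s$, pick an action whose successor $s'$ has the smallest predicted cost-to-go.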
Related papers
- How Graph Neural Networks Learn: Lessons from Training Dynamics [80.41778059014393]
We study the training dynamics of graph neural networks (GNNs) in function space.
We find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function.
This finding offers new interpretable insights into when and why the learned GNN functions generalize.
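In the kernel regime this observation can be stated schematically: a gradient-descent step updates the learned function through a tangent kernel whose entries couple nodes according to the graph. The identity below is standard background, not this paper's specific result:

```latex
% One gradient step in function space (kernel regime): the prediction at node u
% moves by a kernel-weighted sum over training nodes v, and for a GNN the
% tangent kernel \Theta_A inherits the graph structure via the adjacency A.
f_{t+1}(u) = f_t(u) - \eta \sum_{v} \Theta_A(u, v)\, \nabla_{f_t(v)} \mathcal{L},
\qquad
\Theta_A(u, v) = \big\langle \nabla_\theta f_\theta(u),\, \nabla_\theta f_\theta(v) \big\rangle .
```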
arXiv Detail & Related papers (2023-10-08T10:19:56Z)
- Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL [23.957148537567146]
This paper proposes a simple and efficient policy optimization framework, Optimistic NPG, for online RL.
For $d$-dimensional linear MDPs, Optimistic NPG is computationally efficient and learns an $\varepsilon$-optimal policy within $\tilde{O}(d^2/\varepsilon^3)$ samples.
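At its core, an NPG-style method for softmax policies performs a multiplicative-weights update driven by Q-estimates; the optimistic variant adds an exploration bonus to those estimates. Schematically (the standard NPG update, offered as background rather than the paper's exact algorithm):

```latex
% Soft policy-iteration / NPG update: \bar{Q}_t is an optimistic estimate
% (a Q-estimate plus an exploration bonus) and \eta is the step size.
\pi_{t+1}(a \mid s) \;\propto\; \pi_t(a \mid s)\, \exp\!\big(\eta\, \bar{Q}_t(s, a)\big).
```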
arXiv Detail & Related papers (2023-05-18T15:19:26Z)
- From Relational Pooling to Subgraph GNNs: A Universal Framework for More Expressive Graph Neural Networks [8.121462458089141]
We show how to assign labels to nodes to improve the expressive power of message-passing neural networks.
We demonstrate experimentally that our method is universally compatible and improves the expressivity of any base GNN model.
Our $k,l$-GNNs achieve superior performance on many synthetic and real-world datasets.
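In its simplest form, such node labeling just augments each node's features with a distinguishing identifier before message passing, so that otherwise symmetric nodes become separable. A minimal illustration with one-hot labels (the paper's actual labeling schemes are more refined):

```python
# Minimal illustration: one-hot node labels break symmetry between nodes
# that plain message passing cannot distinguish.
import torch

def add_node_labels(x: torch.Tensor) -> torch.Tensor:
    """Append a one-hot identifier to each node's feature vector."""
    n = x.size(0)
    return torch.cat([x, torch.eye(n)], dim=-1)  # (n, d) -> (n, d + n)

x = torch.randn(5, 8)
print(add_node_labels(x).shape)  # torch.Size([5, 13])
```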
arXiv Detail & Related papers (2023-05-08T18:00:50Z)
- Learning Generalized Policies Without Supervision Using GNNs [20.322992960599255]
We consider the problem of learning generalized policies for classical planning domains using graph neural networks.
We use a simple and general GNN architecture and aim at obtaining crisp experimental results.
We exploit the relation established between the expressive power of GNNs and the $C_2$ fragment of first-order logic.
arXiv Detail & Related papers (2022-05-12T10:28:46Z)
- Dist2Cycle: A Simplicial Neural Network for Homology Localization [66.15805004725809]
Simplicial complexes can be viewed as high dimensional generalizations of graphs that explicitly encode multi-way ordered relations.
We propose a graph convolutional model for learning functions parametrized by the $k$-homological features of simplicial complexes.
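The $k$-homological features referred to here come from Hodge theory on simplicial complexes: with boundary maps $B_k$, the $k$-th Hodge Laplacian is defined as below, and its kernel is isomorphic to the $k$-th homology group, so harmonic signals localize homology classes (standard background, not the paper's specific architecture):

```latex
% k-th Hodge Laplacian of a simplicial complex, built from boundary maps B_k;
% its kernel (the harmonic space) is isomorphic to the k-th homology group.
L_k \;=\; B_k^{\top} B_k \;+\; B_{k+1} B_{k+1}^{\top},
\qquad
\ker L_k \;\cong\; H_k .
```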
arXiv Detail & Related papers (2021-10-28T14:59:41Z)
- Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation [92.3161051419884]
We study reinforcement learning for finite-horizon episodic Markov decision processes with adversarial rewards and full-information feedback.
We show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode.
We also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors.
arXiv Detail & Related papers (2021-02-17T18:54:08Z)
- Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z)
- Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$, where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
- Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$.
We show that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, flows and metric learning.
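As background for what optimization on $O(d)$ involves: a classical way to take a descent step while staying exactly orthogonal is the Cayley transform of a skew-symmetric direction. A hedged sketch of that standard retraction (not this paper's specific flow-based algorithms):

```python
# Classical Cayley-transform step on O(d): project the gradient onto
# skew-symmetric directions and apply the Cayley map, which preserves
# orthogonality exactly (background technique, not the paper's algorithm).
import torch

def cayley_step(w: torch.Tensor, grad: torch.Tensor, lr: float = 0.1) -> torch.Tensor:
    d = w.size(0)
    a = grad @ w.T - w @ grad.T            # skew-symmetric direction: a = -a^T
    eye = torch.eye(d)
    # Cayley map of a skew-symmetric matrix is orthogonal.
    q = torch.linalg.solve(eye + 0.5 * lr * a, eye - 0.5 * lr * a)
    return q @ w                           # q orthogonal, so q @ w stays in O(d)

w = torch.linalg.qr(torch.randn(4, 4)).Q   # random orthogonal start
grad = torch.randn(4, 4)                   # gradient of some loss w.r.t. w
w_next = cayley_step(w, grad)
print(torch.allclose(w_next @ w_next.T, torch.eye(4), atol=1e-5))  # True
```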
arXiv Detail & Related papers (2020-03-30T15:37:50Z)
- Complete Dictionary Learning via $\ell_p$-norm Maximization [10.82081111170268]
We investigate a family of $\ell_p$-norm ($p>2$, $p \in \mathbb{N}$) approaches for the complete dictionary learning problem.
We show that the global maximizers of these formulations are very close to the true dictionary with high probability, even when Gaussian noise is present.
Experiments demonstrate that the $\ell_p$-based approaches enjoy higher computational efficiency and better robustness than conventional approaches.
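The family referred to here generalizes $\ell_4$-norm maximization over the sphere: a dictionary atom is estimated by maximizing the $\ell_p$-norm of its correlations with the data. A schematic one-atom form (my reading of the summary; the paper treats the full dictionary-recovery problem):

```latex
% Schematic \ell_p-norm maximization for dictionary recovery: given data
% Y = D_0 X_0 with sparse X_0, estimate an atom q on the unit sphere by
% maximizing the \ell_p-norm of its correlations with the samples y_i.
\max_{\|q\|_2 = 1} \; \big\| q^{\top} Y \big\|_p^p
\;=\; \sum_{i} \big| q^{\top} y_i \big|^{p},
\qquad p > 2 .
```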
arXiv Detail & Related papers (2020-02-24T02:33:01Z)
- Estimating Q(s,s') with Deep Deterministic Dynamics Gradients [25.200259376015744]
We introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$.
In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value.
arXiv Detail & Related papers (2020-02-21T19:05:24Z)
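The scheme implied here separates "which next state is best" from "which action gets there". A hedged sketch of that action-selection pipeline (component names and the toy stand-ins are illustrative assumptions, not the paper's exact interfaces):

```python
# Hedged sketch: act with Q(s, s') by letting a forward model propose the
# value-maximizing next state and an inverse-dynamics model recover the
# action that realizes the transition.
import torch

def select_action(s, forward_model, inverse_model):
    s_next = forward_model(s)        # trained offline to maximize Q(s, s')
    return inverse_model(s, s_next)  # action achieving the s -> s' transition

# Toy stand-ins for the learned components.
forward_model = lambda s: s + 0.1             # pretend "best next state" predictor
inverse_model = lambda s, s_next: s_next - s  # action = desired state change
print(select_action(torch.tensor([0.0]), forward_model, inverse_model))  # tensor([0.1000])
```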