Learning General Optimal Policies with Graph Neural Networks: Expressive
Power, Transparency, and Limits
- URL: http://arxiv.org/abs/2109.10129v1
- Date: Tue, 21 Sep 2021 12:22:29 GMT
- Title: Learning General Optimal Policies with Graph Neural Networks: Expressive
Power, Transparency, and Limits
- Authors: Simon Ståhlberg, Blai Bonet, Hector Geffner
- Abstract summary: We train a simple GNN in a supervised manner to approximate the optimal value function $V^{*}(s)$ of a number of sample states $s$.
It is observed that general optimal policies are obtained in domains where general optimal value functions can be defined with $C_2$ features but not in those requiring more expressive $C_3$ features.
- Score: 18.718037284357834
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It has been recently shown that general policies for many classical planning
domains can be expressed and learned in terms of a pool of features defined
from the domain predicates using a description logic grammar. At the same time,
most description logics correspond to a fragment of $k$-variable counting logic
($C_k$) for $k=2$, a logic that has been shown to provide a tight characterization of
the expressive power of graph neural networks. In this work, we make use of
these results to understand the power and limits of using graph neural networks
(GNNs) for learning optimal general policies over a number of tractable
planning domains where such policies are known to exist. For this, we train a
simple GNN in a supervised manner to approximate the optimal value function
$V^{*}(s)$ of a number of sample states $s$. As predicted by the theory, it is
observed that general optimal policies are obtained in domains where general
optimal value functions can be defined with $C_2$ features but not in those
requiring more expressive $C_3$ features. In addition, it is observed that the
features learned are in close correspondence with the features needed to
express $V^{*}$ in closed form. The theory and the analysis of the domains let
us understand the features that are actually learned as well as those that
cannot be learned in this way, and let us move in a principled manner from a
combinatorial optimization approach to learning general policies to a
potentially more robust and scalable approach based on deep learning.
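To make the training setup concrete, below is a minimal sketch of supervised value regression with a message-passing GNN in PyTorch. The state encoding, architecture, and names (e.g., `StateGNN`, the hidden sizes) are illustrative assumptions rather than the authors' exact model; sum aggregation is used because it matches the counting ability associated with $C_2$.

```python
# Minimal sketch (illustrative, not the authors' exact model): a message-passing
# GNN trained by supervised regression to approximate V*(s) on sampled states.
# A state is encoded as a graph with node features x (n x d) and adjacency adj (n x n).
import torch
import torch.nn as nn

class StateGNN(nn.Module):
    def __init__(self, in_dim: int, hid_dim: int = 64, num_layers: int = 3):
        super().__init__()
        self.embed = nn.Linear(in_dim, hid_dim)
        # One update MLP per message-passing round; sum aggregation over
        # neighbors matches the counting power of C_2.
        self.layers = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * hid_dim, hid_dim), nn.ReLU())
            for _ in range(num_layers)
        )
        self.readout = nn.Linear(hid_dim, 1)  # graph-level scalar value estimate

    def forward(self, x, adj):
        h = self.embed(x)
        for layer in self.layers:
            msgs = adj @ h                           # sum of neighbor embeddings
            h = layer(torch.cat([h, msgs], dim=-1))  # combine node and messages
        return self.readout(h.sum(dim=0))            # pool nodes, predict V(s)

model = StateGNN(in_dim=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Toy training pair; real (state, V*) pairs would come from an optimal planner.
x, adj = torch.randn(5, 8), (torch.rand(5, 5) > 0.5).float()
v_star = torch.tensor([3.0])
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x, adj), v_star)
    loss.backward()
    opt.step()
```

A general policy then follows greedily from the learned value function: in state $s$, pick an action whose successor $s'$ has the smallest predicted cost-to-go.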
Related papers
- How Graph Neural Networks Learn: Lessons from Training Dynamics [80.41778059014393]
We study the training dynamics of graph neural networks (GNNs) in function space.
We find that the gradient descent optimization of GNNs implicitly leverages the graph structure to update the learned function.
This finding offers new interpretable insights into when and why the learned GNN functions generalize.
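In the kernel regime this observation can be stated schematically: a gradient-descent step updates the learned function through a tangent kernel whose entries couple nodes according to the graph. The identity below is standard background, not this paper's specific result:

```latex
% One gradient step in function space (kernel regime): the prediction at node u
% moves by a kernel-weighted sum over training nodes v, and for a GNN the
% tangent kernel \Theta_A inherits the graph structure via the adjacency A.
f_{t+1}(u) = f_t(u) - \eta \sum_{v} \Theta_A(u, v)\, \nabla_{f_t(v)} \mathcal{L},
\qquad
\Theta_A(u, v) = \big\langle \nabla_\theta f_\theta(u),\, \nabla_\theta f_\theta(v) \big\rangle .
```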
arXiv Detail & Related papers (2023-10-08T10:19:56Z)
- Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL [23.957148537567146]
This paper proposes a simple and efficient policy optimization framework, Optimistic NPG, for online RL.
For $d$-dimensional linear MDPs, Optimistic NPG is computationally efficient and learns an $\varepsilon$-optimal policy within $\tilde{O}(d^2/\varepsilon^3)$ samples.
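At its core, an NPG-style method for softmax policies performs a multiplicative-weights update driven by Q-estimates; the optimistic variant adds an exploration bonus to those estimates. Schematically (the standard NPG update, offered as background rather than the paper's exact algorithm):

```latex
% Soft policy-iteration / NPG update: \bar{Q}_t is an optimistic estimate
% (a Q-estimate plus an exploration bonus) and \eta is the step size.
\pi_{t+1}(a \mid s) \;\propto\; \pi_t(a \mid s)\, \exp\!\big(\eta\, \bar{Q}_t(s, a)\big).
```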
arXiv Detail & Related papers (2023-05-18T15:19:26Z)
- From Relational Pooling to Subgraph GNNs: A Universal Framework for More Expressive Graph Neural Networks [8.121462458089141]
We show how to assign labels to nodes to improve the expressive power of message-passing neural networks.
We demonstrate experimentally that our method is universally compatible and improves the expressivity of any base GNN model.
Our $k,l$-GNNs achieve superior performance on many synthetic and real-world datasets.
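In its simplest form, such node labeling just augments each node's features with a distinguishing identifier before message passing, so that otherwise symmetric nodes become separable. A minimal illustration with one-hot labels (the paper's actual labeling schemes are more refined):

```python
# Minimal illustration: one-hot node labels break symmetry between nodes
# that plain message passing cannot distinguish.
import torch

def add_node_labels(x: torch.Tensor) -> torch.Tensor:
    """Append a one-hot identifier to each node's feature vector."""
    n = x.size(0)
    return torch.cat([x, torch.eye(n)], dim=-1)  # (n, d) -> (n, d + n)

x = torch.randn(5, 8)
print(add_node_labels(x).shape)  # torch.Size([5, 13])
```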
arXiv Detail & Related papers (2023-05-08T18:00:50Z)
- Learning Generalized Policies Without Supervision Using GNNs [20.322992960599255]
We consider the problem of learning generalized policies for classical planning domains using graph neural networks.
We use a simple and general GNN architecture and aim at obtaining crisp experimental results.
We exploit the relation established between the expressive power of GNNs and the $C_2$ fragment of first-order logic.
arXiv Detail & Related papers (2022-05-12T10:28:46Z)
- Dist2Cycle: A Simplicial Neural Network for Homology Localization [66.15805004725809]
Simplicial complexes can be viewed as high dimensional generalizations of graphs that explicitly encode multi-way ordered relations.
We propose a graph convolutional model for learning functions parametrized by the $k$-homological features of simplicial complexes.
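The $k$-homological features referred to here come from Hodge theory on simplicial complexes: with boundary maps $B_k$, the $k$-th Hodge Laplacian is defined as below, and its kernel is isomorphic to the $k$-th homology group, so harmonic signals localize homology classes (standard background, not the paper's specific architecture):

```latex
% k-th Hodge Laplacian of a simplicial complex, built from boundary maps B_k;
% its kernel (the harmonic space) is isomorphic to the k-th homology group.
L_k \;=\; B_k^{\top} B_k \;+\; B_{k+1} B_{k+1}^{\top},
\qquad
\ker L_k \;\cong\; H_k .
```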
arXiv Detail & Related papers (2021-10-28T14:59:41Z)
- Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function Approximation [92.3161051419884]
We study reinforcement learning for finite-horizon episodic Markov decision processes with adversarial rewards and full-information feedback.
We show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode.
We also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors.
arXiv Detail & Related papers (2021-02-17T18:54:08Z)
- Improving Robustness and Generality of NLP Models Using Disentangled Representations [62.08794500431367]
Supervised neural networks first map an input $x$ to a single representation $z$, and then map $z$ to the output label $y$.
We present methods to improve robustness and generality of NLP models from the standpoint of disentangled representation learning.
We show that models trained with the proposed criteria provide better robustness and domain adaptation ability in a wide range of supervised learning tasks.
arXiv Detail & Related papers (2020-09-21T02:48:46Z)
- Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$, where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
- Stochastic Flows and Geometric Optimization on the Orthogonal Group [52.50121190744979]
We present a new class of geometrically-driven optimization algorithms on the orthogonal group $O(d)$.
We show that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinforcement learning, flows and metric learning.
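As background for what optimization on $O(d)$ involves: a classical way to take a descent step while staying exactly orthogonal is the Cayley transform of a skew-symmetric direction. A hedged sketch of that standard retraction (not this paper's specific flow-based algorithms):

```python
# Classical Cayley-transform step on O(d): project the gradient onto
# skew-symmetric directions and apply the Cayley map, which preserves
# orthogonality exactly (background technique, not the paper's algorithm).
import torch

def cayley_step(w: torch.Tensor, grad: torch.Tensor, lr: float = 0.1) -> torch.Tensor:
    d = w.size(0)
    a = grad @ w.T - w @ grad.T            # skew-symmetric direction: a = -a^T
    eye = torch.eye(d)
    # Cayley map of a skew-symmetric matrix is orthogonal.
    q = torch.linalg.solve(eye + 0.5 * lr * a, eye - 0.5 * lr * a)
    return q @ w                           # q orthogonal, so q @ w stays in O(d)

w = torch.linalg.qr(torch.randn(4, 4)).Q   # random orthogonal start
grad = torch.randn(4, 4)                   # gradient of some loss w.r.t. w
w_next = cayley_step(w, grad)
print(torch.allclose(w_next @ w_next.T, torch.eye(4), atol=1e-5))  # True
```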
arXiv Detail & Related papers (2020-03-30T15:37:50Z)
- Complete Dictionary Learning via $\ell_p$-norm Maximization [10.82081111170268]
We investigate a family of $\ell_p$-norm ($p>2$, $p \in \mathbb{N}$) approaches for the complete dictionary learning problem.
We show that the global maximizers of these formulations are very close to the true dictionary with high probability, even when Gaussian noise is present.
Experiments demonstrate that the $\ell_p$-based approaches enjoy higher computational efficiency and better robustness than conventional approaches.
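The family referred to here generalizes $\ell_4$-norm maximization over the sphere: a dictionary atom is estimated by maximizing the $\ell_p$-norm of its correlations with the data. A schematic one-atom form (my reading of the summary; the paper treats the full dictionary-recovery problem):

```latex
% Schematic \ell_p-norm maximization for dictionary recovery: given data
% Y = D_0 X_0 with sparse X_0, estimate an atom q on the unit sphere by
% maximizing the \ell_p-norm of its correlations with the samples y_i.
\max_{\|q\|_2 = 1} \; \big\| q^{\top} Y \big\|_p^p
\;=\; \sum_{i} \big| q^{\top} y_i \big|^{p},
\qquad p > 2 .
```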
arXiv Detail & Related papers (2020-02-24T02:33:01Z)
- Estimating Q(s,s') with Deep Deterministic Dynamics Gradients [25.200259376015744]
We introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$.
In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value.
arXiv Detail & Related papers (2020-02-21T19:05:24Z)
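The scheme implied here separates "which next state is best" from "which action gets there". A hedged sketch of that action-selection pipeline (component names and the toy stand-ins are illustrative assumptions, not the paper's exact interfaces):

```python
# Hedged sketch: act with Q(s, s') by letting a forward model propose the
# value-maximizing next state and an inverse-dynamics model recover the
# action that realizes the transition.
import torch

def select_action(s, forward_model, inverse_model):
    s_next = forward_model(s)        # trained offline to maximize Q(s, s')
    return inverse_model(s, s_next)  # action achieving the s -> s' transition

# Toy stand-ins for the learned components.
forward_model = lambda s: s + 0.1             # pretend "best next state" predictor
inverse_model = lambda s, s_next: s_next - s  # action = desired state change
print(select_action(torch.tensor([0.0]), forward_model, inverse_model))  # tensor([0.1000])
```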