Point-based Value Iteration for Neuro-Symbolic POMDPs
- URL: http://arxiv.org/abs/2306.17639v1
- Date: Fri, 30 Jun 2023 13:26:08 GMT
- Title: Point-based Value Iteration for Neuro-Symbolic POMDPs
- Authors: Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta
Kwiatkowska
- Abstract summary: We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs)
NS-POMDPs model an agent that perceives a continuous-state environment using a neural network and makes decisions symbolically.
We present two value iteration algorithms that ensure finite representability by exploiting the structure of the continuous-state model.
- Score: 27.96140203850222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neuro-symbolic artificial intelligence is an emerging area that combines
traditional symbolic techniques with neural networks. In this paper, we
consider its application to sequential decision making under uncertainty. We
introduce neuro-symbolic partially observable Markov decision processes
(NS-POMDPs), which model an agent that perceives a continuous-state environment
using a neural network and makes decisions symbolically, and study the problem
of optimising discounted cumulative rewards. This requires functions over
continuous-state beliefs, for which we propose a novel piecewise linear and
convex representation (P-PWLC) in terms of polyhedra covering the
continuous-state space and value vectors, and extend Bellman backups to this
representation. We prove the convexity and continuity of value functions and
present two value iteration algorithms that ensure finite representability by
exploiting the underlying structure of the continuous-state model and the
neural perception mechanism. The first is a classical (exact) value iteration
algorithm extending $\alpha$-functions of Porta et al (2006) to the P-PWLC
representation for continuous-state spaces. The second is a point-based
(approximate) method called NS-HSVI, which uses the P-PWLC representation and
belief-value induced functions to approximate value functions from below and
above for two types of beliefs, particle-based and region-based. Using a
prototype implementation, we show the practical applicability of our approach
on two case studies that employ (trained) ReLU neural networks as perception
functions, dynamic car parking and an aircraft collision avoidance system, by
synthesising (approximately) optimal strategies. An experimental comparison
with the finite-state POMDP solver SARSOP demonstrates that NS-HSVI is more
robust to particle disturbances.
Related papers
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimiax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(alpha-1)$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Pointer Networks with Q-Learning for Combinatorial Optimization [55.2480439325792]
We introduce the Pointer Q-Network (PQN), a hybrid neural architecture that integrates model-free Q-value policy approximation with Pointer Networks (Ptr-Nets)
Our empirical results demonstrate the efficacy of this approach, also testing the model in unstable environments.
arXiv Detail & Related papers (2023-11-05T12:03:58Z) - Verification of Neural Network Control Systems using Symbolic Zonotopes
and Polynotopes [1.0312968200748116]
Verification and safety assessment of neural network controlled systems (NNCSs) is an emerging challenge.
To provide guarantees, verification tools must efficiently capture the interplay between the neural network and the physical system within the control loop.
A compositional approach focused on preserving long term symbolic dependency is proposed for the analysis of NNCSs.
arXiv Detail & Related papers (2023-06-26T11:52:14Z) - Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum
Problems [42.375903320536715]
The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures.
We provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structure.
arXiv Detail & Related papers (2023-03-06T08:05:08Z) - Contrastive Conditional Neural Processes [45.70735205041254]
Conditional Neural Processes(CNPs) bridge neural networks with probabilistic inference to approximate functions of Processes under meta-learning settings.
Two auxiliary contrastive branches are set up hierarchically, namely in-instantiation temporal contrastive learning(tt TCL) and cross-instantiation function contrastive learning(tt FCL)
We empirically show that tt TCL captures high-level abstraction of observations, whereas tt FCL helps identify underlying functions, which in turn provides more efficient representations.
arXiv Detail & Related papers (2022-03-08T10:08:45Z) - Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games [31.51588071503617]
We propose a novel modelling formalism called neuro-symbolic concurrent games (NS-CSGs)
We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards.
We present, for the first time, practical value iteration (VI) and policy iteration (PI) algorithms to solve this new subclass of continuous-state CSGs.
arXiv Detail & Related papers (2022-02-13T08:39:00Z) - Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic [137.04558017227583]
Actor-critic (AC) algorithms, empowered by neural networks, have had significant empirical success in recent years.
We take a mean-field perspective on the evolution and convergence of feature-based neural AC.
We prove that neural AC finds the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2021-12-27T06:09:50Z) - Robust and Adaptive Temporal-Difference Learning Using An Ensemble of
Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z) - Learning Sub-Patterns in Piecewise Continuous Functions [4.18804572788063]
Most gradient descent algorithms can optimize neural networks that are sub-differentiable in their parameters.
This paper focuses on the case where the discontinuities arise from distinct sub-patterns.
We propose a new discontinuous deep neural network model trainable via a decoupled two-step procedure.
arXiv Detail & Related papers (2020-10-29T13:44:13Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over- parameterized deep neural networks (DNNs)
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized Structural equation models (SEMs)
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using a gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.