Point-based Value Iteration for Neuro-Symbolic POMDPs
- URL: http://arxiv.org/abs/2306.17639v1
- Date: Fri, 30 Jun 2023 13:26:08 GMT
- Title: Point-based Value Iteration for Neuro-Symbolic POMDPs
- Authors: Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta
Kwiatkowska
- Abstract summary: We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs).
NS-POMDPs model an agent that perceives a continuous-state environment using a neural network and makes decisions symbolically.
We present two value iteration algorithms that ensure finite representability by exploiting the structure of the continuous-state model.
- Score: 27.96140203850222
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neuro-symbolic artificial intelligence is an emerging area that combines
traditional symbolic techniques with neural networks. In this paper, we
consider its application to sequential decision making under uncertainty. We
introduce neuro-symbolic partially observable Markov decision processes
(NS-POMDPs), which model an agent that perceives a continuous-state environment
using a neural network and makes decisions symbolically, and study the problem
of optimising discounted cumulative rewards. This requires functions over
continuous-state beliefs, for which we propose a novel piecewise linear and
convex representation (P-PWLC) in terms of polyhedra covering the
continuous-state space and value vectors, and extend Bellman backups to this
representation. We prove the convexity and continuity of value functions and
present two value iteration algorithms that ensure finite representability by
exploiting the underlying structure of the continuous-state model and the
neural perception mechanism. The first is a classical (exact) value iteration
algorithm extending $\alpha$-functions of Porta et al. (2006) to the P-PWLC
representation for continuous-state spaces. The second is a point-based
(approximate) method called NS-HSVI, which uses the P-PWLC representation and
belief-value induced functions to approximate value functions from below and
above for two types of beliefs, particle-based and region-based. Using a
prototype implementation, we show the practical applicability of our approach
on two case studies that employ (trained) ReLU neural networks as perception
functions, dynamic car parking and an aircraft collision avoidance system, by
synthesising (approximately) optimal strategies. An experimental comparison
with the finite-state POMDP solver SARSOP demonstrates that NS-HSVI is more
robust to particle disturbances.
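The abstract's P-PWLC representation (value functions given by polyhedra covering the continuous-state space, each carrying a linear piece, with the value of a belief obtained as a maximum over $\alpha$-functions) can be illustrated with a minimal sketch. The snippet below uses 1-D intervals as the polyhedra and a particle-based belief; all names (`Alpha`, `belief_value`) and the interval encoding are illustrative assumptions, not the paper's actual data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Alpha:
    # Each alpha-function is a list of (lo, hi, slope, intercept) pieces:
    # on the polyhedron [lo, hi) its value is slope * s + intercept.
    pieces: List[Tuple[float, float, float, float]]

    def __call__(self, s: float) -> float:
        for lo, hi, a, b in self.pieces:
            if lo <= s < hi:
                return a * s + b
        raise ValueError(f"state {s} not covered by any polyhedron")

def belief_value(alphas: List[Alpha], belief: List[Tuple[float, float]]) -> float:
    """Lower-bound value of a particle-based belief {(state, weight)}:
    V(b) = max_alpha sum_i w_i * alpha(s_i), the PWLC maximum over alpha-functions."""
    return max(sum(w * alpha(s) for s, w in belief) for alpha in alphas)

# Two alpha-functions over the state space [0, 10)
a1 = Alpha([(0.0, 5.0, 0.0, 1.0), (5.0, 10.0, 0.0, 3.0)])  # step function
a2 = Alpha([(0.0, 10.0, 0.2, 0.0)])                        # linear ramp
b = [(2.0, 0.5), (6.0, 0.5)]                               # particle belief
print(belief_value([a1, a2], b))  # max(0.5*1 + 0.5*3, 0.5*0.4 + 0.5*1.2) = 2.0
```

Convexity of the value function is what justifies taking the maximum over $\alpha$-functions; point-based methods such as NS-HSVI grow this set of functions by backing up at sampled beliefs.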
Related papers
- Function Forms of Simple ReLU Networks with Random Hidden Weights [1.2289361708127877]
We investigate the function space dynamics of a two-layer ReLU neural network in the infinite-width limit. We highlight the Fisher information matrix's role in steering learning. This work offers a robust foundation for understanding wide neural networks.
arXiv Detail & Related papers (2025-05-23T13:53:02Z) - Verification of Neural Network Control Systems using Symbolic Zonotopes
and Polynotopes [1.0312968200748116]
Verification and safety assessment of neural network controlled systems (NNCSs) pose an emerging challenge.
To provide guarantees, verification tools must efficiently capture the interplay between the neural network and the physical system within the control loop.
A compositional approach focused on preserving long term symbolic dependency is proposed for the analysis of NNCSs.
arXiv Detail & Related papers (2023-06-26T11:52:14Z) - Primal and Dual Analysis of Entropic Fictitious Play for Finite-sum
Problems [42.375903320536715]
The entropic fictitious play (EFP) is a recently proposed algorithm that minimizes the sum of a convex functional and entropy in the space of measures.
We provide a concise primal-dual analysis of EFP in the setting where the learning problem exhibits a finite-sum structure.
arXiv Detail & Related papers (2023-03-06T08:05:08Z) - GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP,
and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z) - Sequence Learning Using Equilibrium Propagation [2.3361887733755897]
Equilibrium Propagation (EP) is a powerful and more bio-plausible alternative to conventional learning frameworks such as backpropagation.
We leverage recent developments in modern Hopfield networks to further understand energy-based models and develop solutions for complex sequence classification tasks using EP.
arXiv Detail & Related papers (2022-09-14T20:01:22Z) - PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant problem parameters.
arXiv Detail & Related papers (2022-07-12T17:57:17Z) - Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z) - Contrastive Conditional Neural Processes [45.70735205041254]
Conditional Neural Processes (CNPs) bridge neural networks with probabilistic inference to approximate functions of stochastic processes under meta-learning settings.
Two auxiliary contrastive branches are set up hierarchically, namely in-instantiation temporal contrastive learning (TCL) and cross-instantiation function contrastive learning (FCL).
We empirically show that TCL captures high-level abstraction of observations, whereas FCL helps identify underlying functions, which in turn provides more efficient representations.
arXiv Detail & Related papers (2022-03-08T10:08:45Z) - Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games [31.51588071503617]
We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs).
We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards.
We present, for the first time, practical value iteration (VI) and policy iteration (PI) algorithms to solve this new subclass of continuous-state CSGs.
arXiv Detail & Related papers (2022-02-13T08:39:00Z) - Robust and Adaptive Temporal-Difference Learning Using An Ensemble of
Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z) - Provably Efficient Neural Estimation of Structural Equation Model: An
Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z) - MetaSDF: Meta-learning Signed Distance Functions [85.81290552559817]
Generalizing across shapes with neural implicit representations amounts to learning priors over the respective function space.
We formalize learning of a shape space as a meta-learning problem and leverage gradient-based meta-learning algorithms to solve this task.
arXiv Detail & Related papers (2020-06-17T05:14:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.