Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games
- URL: http://arxiv.org/abs/2202.06255v7
- Date: Thu, 11 Jul 2024 15:40:13 GMT
- Title: Strategy Synthesis for Zero-Sum Neuro-Symbolic Concurrent Stochastic Games
- Authors: Rui Yan, Gabriel Santos, Gethin Norman, David Parker, Marta Kwiatkowska
- Abstract summary: We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs).
We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards.
We present, for the first time, practical value iteration (VI) and policy iteration (PI) algorithms to solve this new subclass of continuous-state CSGs.
- Score: 31.51588071503617
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neuro-symbolic approaches to artificial intelligence, which combine neural networks with classical symbolic techniques, are growing in prominence, necessitating formal approaches to reason about their correctness. We propose a novel modelling formalism called neuro-symbolic concurrent stochastic games (NS-CSGs), which comprise two probabilistic finite-state agents interacting in a shared continuous-state environment. Each agent observes the environment using a neural perception mechanism, which converts inputs such as images into symbolic percepts, and makes decisions symbolically. We focus on the class of NS-CSGs with Borel state spaces and prove the existence and measurability of the value function for zero-sum discounted cumulative rewards under piecewise-constant restrictions on the components of this class of models. To compute values and synthesise strategies, we present, for the first time, practical value iteration (VI) and policy iteration (PI) algorithms to solve this new subclass of continuous-state CSGs. These require a finite decomposition of the environment induced by the neural perception mechanisms of the agents and rely on finite abstract representations of value functions and strategies closed under VI or PI. First, we introduce a Borel measurable piecewise-constant (B-PWC) representation of value functions, extend minimax backups to this representation and propose a value iteration algorithm called B-PWC VI. Second, we introduce two novel representations for the value functions and strategies, constant-piecewise-linear (CON-PWL) and constant-piecewise-constant (CON-PWC) respectively, and propose Minimax-action-free PI by extending a recent PI method based on alternating player choices for finite state spaces to Borel state spaces, which does not require normal-form games to be solved.
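To make the shape of these algorithms concrete, here is a minimal sketch of a piecewise-constant value iteration with minimax backups. It assumes a toy setting in which the environment has already been decomposed into finitely many regions (in NS-CSGs this decomposition is induced by the agents' neural perception functions) and region-level transition probabilities and rewards are given; the region names, dynamics and reward values are illustrative placeholders rather than the paper's construction, and the matrix game at each region is solved by linear programming.

```python
# Minimal sketch of piecewise-constant value iteration over a *finite* region
# decomposition.  Assumptions (not from the paper): the regions, transition
# probabilities and rewards below are toy placeholders; the real B-PWC VI
# derives the decomposition from the agents' perception functions and works
# directly on Borel state spaces.
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(M):
    """Value of a zero-sum matrix game (row player maximises)."""
    m, n = M.shape
    # Variables: x_1..x_m (row mixed strategy), v (game value); minimise -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For each column j:  v - sum_i x_i * M[i, j] <= 0
    A_ub = np.hstack([-M.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def b_pwc_vi(regions, actions1, actions2, P, R, gamma=0.9, tol=1e-6):
    """Piecewise-constant VI: one value per region, minimax backup per region."""
    V = {r: 0.0 for r in regions}
    while True:
        V_new = {}
        for r in regions:
            M = np.array([[R[r][a1][a2]
                           + gamma * sum(p * V[r2] for r2, p in P[r][a1][a2].items())
                           for a2 in actions2] for a1 in actions1])
            V_new[r] = matrix_game_value(M)
        if max(abs(V_new[r] - V[r]) for r in regions) < tol:
            return V_new
        V = V_new

# Toy two-region example (illustrative only).
regions, acts = ["safe", "risky"], ["a", "b"]
P = {"safe":  {"a": {"a": {"safe": 1.0}, "b": {"risky": 1.0}},
               "b": {"a": {"safe": 0.5, "risky": 0.5}, "b": {"safe": 1.0}}},
     "risky": {"a": {"a": {"risky": 1.0}, "b": {"safe": 1.0}},
               "b": {"a": {"risky": 1.0}, "b": {"risky": 0.5, "safe": 0.5}}}}
R = {"safe":  {"a": {"a": 1.0, "b": 0.0}, "b": {"a": 0.5, "b": 1.0}},
     "risky": {"a": {"a": -1.0, "b": 0.0}, "b": {"a": -0.5, "b": 0.0}}}
print(b_pwc_vi(regions, acts, acts, P, R))
```

The returned dictionary plays the role of a B-PWC-style value function: one number per region, which is the kind of finite representation that makes value iteration implementable over a continuous state space.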
Related papers
- BlendRL: A Framework for Merging Symbolic and Neural Policy Learning [23.854830898003726]
BlendRL is a neuro-symbolic RL framework that integrates both paradigms within RL agents, which act using mixtures of logic and neural policies.
We empirically demonstrate that BlendRL agents outperform both neural and symbolic baselines in standard Atari environments.
We analyze the interaction between neural and symbolic policies, illustrating how their hybrid use helps agents overcome each other's limitations.
arXiv Detail & Related papers (2024-10-15T15:24:20Z)
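As a rough illustration of the "mixture of logic and neural policies" idea in the BlendRL entry above, the sketch below blends the action distributions of a tiny linear "neural" policy and a hand-written rule policy with a fixed weight. The rules, the network, the symbolic facts and the mixing weight are all invented for the example; BlendRL learns both components and their interaction rather than fixing them.

```python
# Toy sketch of mixing a neural policy with a rule-based (symbolic) policy.
# The rules, network and fixed mixing weight are illustrative assumptions;
# BlendRL itself learns both components and how they are combined.
import numpy as np

N_ACTIONS = 3  # e.g. left, right, fire

def neural_policy(obs, weights):
    """Tiny linear 'network': softmax over action logits."""
    logits = weights @ obs
    e = np.exp(logits - logits.max())
    return e / e.sum()

def symbolic_policy(symbolic_state):
    """Hand-written rules over symbolic facts -> action distribution."""
    probs = np.full(N_ACTIONS, 1.0 / N_ACTIONS)
    if symbolic_state.get("enemy_left"):
        probs = np.array([0.8, 0.1, 0.1])   # mostly move left
    elif symbolic_state.get("enemy_ahead"):
        probs = np.array([0.1, 0.1, 0.8])   # mostly fire
    return probs

def blended_policy(obs, symbolic_state, weights, alpha=0.5):
    """Convex mixture of the two policies; alpha would normally be learned."""
    return alpha * neural_policy(obs, weights) + (1 - alpha) * symbolic_policy(symbolic_state)

obs = np.array([0.2, -0.4, 0.7])
weights = np.random.default_rng(0).normal(size=(N_ACTIONS, 3))
print(blended_policy(obs, {"enemy_left": True}, weights))
```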
- How to discretize continuous state-action spaces in Q-learning: A symbolic control approach [0.0]
The paper presents a systematic analysis that highlights a major drawback in space discretization methods.
To address this challenge, the paper proposes a symbolic model that captures a behavioral relation between the original system and its abstraction.
This relation allows for seamless application of the controller synthesized on the abstraction to the original system.
arXiv Detail & Related papers (2024-06-03T17:30:42Z)
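For contrast with the symbolic-control approach above, here is the naive uniform-grid discretisation that such papers take as the baseline for Q-learning on continuous states. The state bounds, bin count and toy update call are assumptions for the example; the paper's behavioral-relation-based abstraction is not reproduced here.

```python
# Naive uniform-grid discretisation of a continuous state for tabular Q-learning.
# This is the simple baseline whose drawbacks the paper analyses; the paper's
# symbolic-control abstraction and its behavioral relation are not shown here.
import numpy as np

LOW, HIGH, BINS = np.array([-1.0, -1.0]), np.array([1.0, 1.0]), 10

def discretize(state):
    """Map a continuous state in [LOW, HIGH]^2 to a grid-cell index tuple."""
    clipped = np.clip(state, LOW, HIGH - 1e-9)
    return tuple(((clipped - LOW) / (HIGH - LOW) * BINS).astype(int))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Standard tabular Q-learning update applied to the discretised cells."""
    cell, cell_next = discretize(s), discretize(s_next)
    Q[cell][a] += alpha * (r + gamma * Q[cell_next].max() - Q[cell][a])

n_actions = 4
Q = np.zeros((BINS, BINS, n_actions))
q_update(Q, np.array([0.3, -0.2]), a=1, r=1.0, s_next=np.array([0.35, -0.15]))
print(discretize(np.array([0.3, -0.2])), Q[discretize(np.array([0.3, -0.2]))])
```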
- A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparameterized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z)
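The gradient descent-ascent dynamic analysed in the mean-field paper above can be illustrated, in drastically simplified form, on a finite-dimensional convex-concave objective. The quadratic objective and step size below are arbitrary choices for the example; nothing here models the infinite-width two-layer networks or the Wasserstein geometry that the paper actually studies.

```python
# Bare-bones gradient descent-ascent (GDA) on a toy convex-concave objective
#   L(u, v) = 0.5*||u||^2 + u.v - 0.5*||v||^2,
# just to show the simultaneous descent/ascent updates; the paper analyses this
# dynamic for infinite-width two-layer networks, which this toy does not model.
import numpy as np

def gda(u, v, lr=0.1, steps=500):
    for _ in range(steps):
        grad_u = u + v          # dL/du  (minimising player)
        grad_v = u - v          # dL/dv  (maximising player)
        u, v = u - lr * grad_u, v + lr * grad_v   # simultaneous updates
    return u, v

u0, v0 = np.array([1.0, -2.0]), np.array([0.5, 0.5])
print(gda(u0, v0))   # both iterates converge towards the saddle point (0, 0)
```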
- Towards Continual Learning Desiderata via HSIC-Bottleneck Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy while requiring no exemplar buffer and only 1.02x the size of the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z)
- Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z)
- Point-Based Value Iteration for POMDPs with Neural Perception Mechanisms [31.51588071503617]
We introduce neuro-symbolic partially observable Markov decision processes (NS-POMDPs).
We propose a novel piecewise linear and convex representation (P-PWLC) in terms of polyhedra covering the state space and value vectors.
We show the practical applicability of our approach on two case studies that employ (trained) ReLU neural networks as perception functions.
arXiv Detail & Related papers (2023-06-30T13:26:08Z)
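A toy picture of the piecewise linear and convex value representation mentioned in the NS-POMDP entry above: a finite set of affine "alpha" functions, each valid on a region, with the value at a state taken as the maximum over the applicable alphas. The 1-D intervals and coefficients below are invented for illustration; the paper's P-PWLC representation attaches value vectors to polyhedra induced by ReLU perception functions.

```python
# Toy illustration of a piecewise linear and convex (PWLC) value representation:
# a finite set of alpha-functions, each attached to a region of the state space,
# with V(s) = max over the alpha-functions whose region contains s.
# Regions here are 1-D intervals and the alphas are affine; the actual P-PWLC
# representation uses polyhedra and value vectors instead.

# Each entry: (region_lo, region_hi, slope, intercept).
alphas = [
    (0.0, 0.5, 2.0, 0.0),    # alpha_1 on [0.0, 0.5]:  2s
    (0.0, 0.5, -1.0, 1.0),   # alpha_2 on [0.0, 0.5]:  1 - s
    (0.5, 1.0, 0.5, 0.75),   # alpha_3 on [0.5, 1.0]:  0.75 + 0.5s
]

def value(s):
    """Value at state s: max over the alpha-functions whose region contains s."""
    candidates = [a * s + b for lo, hi, a, b in alphas if lo <= s <= hi]
    return max(candidates)

for s in (0.1, 0.4, 0.8):
    print(s, value(s))
```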
- Verification of Neural Network Control Systems using Symbolic Zonotopes and Polynotopes [1.0312968200748116]
Verification and safety assessment of neural network controlled systems (NNCSs) is an emerging challenge.
To provide guarantees, verification tools must efficiently capture the interplay between the neural network and the physical system within the control loop.
A compositional approach focused on preserving long-term symbolic dependency is proposed for the analysis of NNCSs.
arXiv Detail & Related papers (2023-06-26T11:52:14Z)
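To give a flavour of set propagation with symbolic generators, as in the zonotope-based verification entry above, the sketch below pushes a zonotope through a single affine layer and then concretises it to an interval box. The weights and input set are made up, and the ReLU handling and polynotope extensions of the paper are omitted.

```python
# Minimal zonotope propagation through one affine layer of a network.
# A zonotope is given by a centre c and generator matrix G, i.e. the set
# { c + G @ eps : eps in [-1, 1]^k }.  An affine layer maps it exactly to
# (W @ c + b, W @ G); keeping the generators (symbols) preserves dependencies
# between dimensions.  ReLU layers and polynotopes need over-approximation
# steps that this sketch does not cover.
import numpy as np

def affine_zonotope(c, G, W, b):
    """Exact image of the zonotope (c, G) under x -> W @ x + b."""
    return W @ c + b, W @ G

def interval_hull(c, G):
    """Concretise to an axis-aligned box: centre +/- sum of |generators|."""
    radius = np.abs(G).sum(axis=1)
    return c - radius, c + radius

# Toy input set and layer weights (illustrative values).
c = np.array([0.0, 1.0])
G = np.array([[0.1, 0.0],
              [0.0, 0.2]])
W = np.array([[1.0, -1.0],
              [0.5,  0.5]])
b = np.array([0.0, 0.1])

c_out, G_out = affine_zonotope(c, G, W, b)
print(interval_hull(c_out, G_out))
```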
- Symbolic Distillation for Learned TCP Congestion Control [70.27367981153299]
TCP congestion control has achieved tremendous success with deep reinforcement learning (RL) approaches.
Black-box policies lack interpretability and reliability, and often need to operate outside the traditional TCP datapath.
This paper proposes a novel two-stage solution to achieve the best of both worlds: first training a deep RL agent, then distilling its NN policy into white-box, lightweight rules.
arXiv Detail & Related papers (2022-10-24T00:58:16Z)
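The distillation stage of the two-stage recipe described in the TCP entry above can be sketched generically: sample states, query the trained teacher policy, and fit a small interpretable model to its decisions. The stand-in teacher, the two features and the decision-tree student below are assumptions for illustration; the paper distils congestion-control policies into its own white-box rule format.

```python
# Sketch of the distillation step only: query a trained (here: fake) teacher
# policy on sampled states and fit a small decision tree as the white-box
# student.  The teacher below is a stand-in; the paper's method produces
# symbolic rules for congestion control rather than this generic tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

def teacher_policy(state):
    """Placeholder for the trained deep RL policy (returns a discrete action)."""
    rtt, loss_rate = state
    return 0 if loss_rate > 0.05 or rtt > 0.8 else 1   # 0: decrease, 1: increase cwnd

# 1) Collect teacher demonstrations on sampled states.
states = rng.uniform(0.0, 1.0, size=(2000, 2))
actions = np.array([teacher_policy(s) for s in states])

# 2) Distil into a depth-limited, human-readable tree.
student = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print(export_text(student, feature_names=["rtt", "loss_rate"]))
print("agreement:", (student.predict(states) == actions).mean())
```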
- Exploration Policies for On-the-Fly Controller Synthesis: A Reinforcement Learning Approach [0.0]
We propose a new method, based on Reinforcement Learning (RL), for obtaining exploration policies for on-the-fly controller synthesis.
Our agents learn from scratch in a highly partially observable RL task and outperform existing approaches overall, including on instances unseen during training.
arXiv Detail & Related papers (2022-10-07T20:28:25Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
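The min-max formulation in the SEM entry above can be seen in miniature in an instrumental-variable example: a structural parameter is estimated by playing a critic (here a linear function of the instrument) against the residual, with simultaneous gradient descent and ascent. Both players are linear in this sketch purely to keep it short and self-contained; in the paper both are neural networks and the class of SEMs is far more general.

```python
# Toy version of the adversarial (min-max) moment formulation: estimate theta in
# y = theta*x + confounding, using an instrument z, via
#   min_theta  max_omega  E[ omega*z * (y - theta*x) ] - 0.5 * E[(omega*z)^2],
# with simultaneous gradient descent (theta) / ascent (omega).  Both players are
# linear here for brevity; the paper parameterizes both with neural networks.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
z = rng.normal(size=n)                  # instrument
u = rng.normal(scale=0.5, size=n)       # unobserved confounder
x = 0.5 * z + u                         # endogenous regressor
y = 2.0 * x + u + rng.normal(scale=0.1, size=n)   # true theta = 2

theta, omega, lr = 0.0, 0.0, 0.05
for _ in range(3000):
    resid = y - theta * x
    grad_theta = np.mean(-(omega * z) * x)                      # d(objective)/d theta
    grad_omega = np.mean(resid * z) - np.mean((omega * z) * z)  # d(objective)/d omega
    theta, omega = theta - lr * grad_theta, omega + lr * grad_omega

print("estimated theta:", theta)        # close to 2; plain OLS would be biased upward
```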