Stopping Criteria for Value Iteration on Concurrent Stochastic Reachability and Safety Games
- URL: http://arxiv.org/abs/2505.21087v1
- Date: Tue, 27 May 2025 12:13:47 GMT
- Title: Stopping Criteria for Value Iteration on Concurrent Stochastic Reachability and Safety Games
- Authors: Marta Grobelna, Jan Křetínský, Maximilian Weininger,
- Abstract summary: We consider zero-sum concurrent games (CSGs) played on graphs with reachability and safety objectives.<n>In practice, value (VI) outperforms the other approaches and is the most implemented method.<n>We provide bounded (a.k.a. interval) VI for CSGs: it complements standard VI with a converging sequence of over-approximations.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: We consider two-player zero-sum concurrent stochastic games (CSGs) played on graphs with reachability and safety objectives. These include degenerate classes such as Markov decision processes or turn-based stochastic games, which can be solved by linear or quadratic programming; however, in practice, value iteration (VI) outperforms the other approaches and is the most implemented method. Similarly, for CSGs, this practical performance makes VI an attractive alternative to the standard theoretical solution via the existential theory of reals. VI starts with an under-approximation of the sought values for each state and iteratively updates them, traditionally terminating once two consecutive approximations are $\epsilon$-close. However, this stopping criterion lacks guarantees on the precision of the approximation, which is the goal of this work. We provide bounded (a.k.a. interval) VI for CSGs: it complements standard VI with a converging sequence of over-approximations and terminates once the over- and under-approximations are $\epsilon$-close.
Related papers
- Towards Continual Learning Desiderata via HSIC-Bottleneck
Orthogonalization and Equiangular Embedding [55.107555305760954]
We propose a conceptually simple yet effective method that attributes forgetting to layer-wise parameter overwriting and the resulting decision boundary distortion.
Our method achieves competitive accuracy performance, even with absolute superiority of zero exemplar buffer and 1.02x the base model.
arXiv Detail & Related papers (2024-01-17T09:01:29Z) - Revisiting the Last-Iterate Convergence of Stochastic Gradient Methods [25.831462008050387]
The Gradient Descent (SGD) algorithm has triggered people's interest due to its good performance in practice but lack of theoretical understanding.
It still remains unclear whether the lastiterate convergence can be provably extended to wider composite optimization and non-Euclidean norms.
arXiv Detail & Related papers (2023-12-13T21:41:06Z) - Stopping Criteria for Value Iteration on Stochastic Games with
Quantitative Objectives [0.0]
A classic solution technique for Markov decision processes (MDP) and games (SG) is value (VI)
In this paper, we provide the first stopping criteria for VI on SG with total reward and mean payoff, yielding the first anytime algorithms in these settings.
arXiv Detail & Related papers (2023-04-19T19:09:55Z) - HSVI can solve zero-sum Partially Observable Stochastic Games [7.293053431456775]
State-of-the-art methods for solving 2-player zero-sum imperfect games rely on linear programming or dynamic regret minimization.
We propose a novel family of promising approaches complementing those relying on linear programming or iterative methods.
arXiv Detail & Related papers (2022-10-26T11:41:57Z) - Solving Constrained Variational Inequalities via an Interior Point
Method [88.39091990656107]
We develop an interior-point approach to solve constrained variational inequality (cVI) problems.
We provide convergence guarantees for ACVI in two general classes of problems.
Unlike previous work in this setting, ACVI provides a means to solve cVIs when the constraints are nontrivial.
arXiv Detail & Related papers (2022-06-21T17:55:13Z) - Robust and Adaptive Temporal-Difference Learning Using An Ensemble of
Gaussian Processes [70.80716221080118]
The paper takes a generative perspective on policy evaluation via temporal-difference (TD) learning.
The OS-GPTD approach is developed to estimate the value function for a given policy by observing a sequence of state-reward pairs.
To alleviate the limited expressiveness associated with a single fixed kernel, a weighted ensemble (E) of GP priors is employed to yield an alternative scheme.
arXiv Detail & Related papers (2021-12-01T23:15:09Z) - HSVI fo zs-POSGs using Concavity, Convexity and Lipschitz Properties [8.80899367147235]
In some cases we evaluate POMDs and Dec-MDs as approaches to the optimal value function.
This approach has succeeded in some partially observable games (s-POSGs) as well, but failed in general case despite known concavity properties.
We derive on these properties to derive prototypical convexity guarantees bounding approximators and efficient update and selection operators.
arXiv Detail & Related papers (2021-10-25T13:38:21Z) - Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth
Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z) - Higher Performance Visual Tracking with Dual-Modal Localization [106.91097443275035]
Visual Object Tracking (VOT) has synchronous needs for both robustness and accuracy.
We propose a dual-modal framework for target localization, consisting of robust localization suppressingors via ONR and the accurate localization attending to the target center precisely via OFC.
arXiv Detail & Related papers (2021-03-18T08:47:56Z) - Stopping Criteria for, and Strong Convergence of, Stochastic Gradient
Descent on Bottou-Curtis-Nocedal Functions [0.0]
Stopping criteria for Gradient Descent (SGD) methods are often used to analyse non-size functions.
We develop criteria that can be used to analyse non-size functions and Bottou-Nocedal functions.
As a result of our work, our work can be used to develop new adaptive step schemes.
arXiv Detail & Related papers (2020-04-01T14:44:43Z) - Bottom-Up Temporal Action Localization with Mutual Regularization [107.39785866001868]
State-of-the-art solutions for TAL involve evaluating the frame-level probabilities of three action-indicating phases.
We introduce two regularization terms to mutually regularize the learning procedure.
Experiments are performed on two popular TAL datasets, THUMOS14 and ActivityNet1.3.
arXiv Detail & Related papers (2020-02-18T03:59:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.