HSVI can solve zero-sum Partially Observable Stochastic Games
- URL: http://arxiv.org/abs/2210.14640v1
- Date: Wed, 26 Oct 2022 11:41:57 GMT
- Title: HSVI can solve zero-sum Partially Observable Stochastic Games
- Authors: Aurélien Delage, Olivier Buffet, Jilles S. Dibangoye, Abdallah Saffidine
- Abstract summary: State-of-the-art methods for solving 2-player zero-sum imperfect-information games rely on linear programming or regret minimization rather than dynamic programming.
We propose a novel family of promising approaches complementing those relying on linear programming or iterative methods.
- Score: 7.293053431456775
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art methods for solving 2-player zero-sum imperfect information
games rely on linear programming or regret minimization, though not on dynamic
programming (DP) or heuristic search (HS), while the latter are often at the
core of state-of-the-art solvers for other sequential decision-making problems.
In partially observable or collaborative settings (e.g., POMDPs and
Dec-POMDPs), DP and HS require introducing an appropriate statistic that induces a
fully observable problem as well as bounding (convex) approximators of the
optimal value function. This approach has succeeded in some subclasses of
2-player zero-sum partially observable stochastic games (zs-POSGs) as well,
but how to apply it in the general case still remains an open question. We
answer it by (i) rigorously defining an equivalent game to work with, (ii)
proving mathematical properties of the optimal value function that allow
deriving bounds that come with solution strategies, (iii) proposing for the
first time an HSVI-like solver that provably converges to an $\epsilon$-optimal
solution in finite time, and (iv) empirically analyzing it. This opens the door
to a novel family of promising approaches complementing those relying on linear
programming or iterative methods.
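The abstract's core mechanism, maintaining bounding (convex) approximators of the optimal value function and tightening them until an $\epsilon$-optimal solution is certified, can be illustrated schematically. The sketch below runs the HSVI-style stopping rule on a hypothetical 2-state deterministic MDP, not on the paper's occupancy-game reformulation of a zs-POSG: upper and lower bounds on the optimal value are initialized from the reward range and narrowed by Bellman backups until the gap at the start state falls below epsilon.

```python
# Schematic sketch of the HSVI idea: sandwich V* between a lower bound
# Vlo and an upper bound Vhi, tighten both with Bellman backups, and
# stop once Vhi - Vlo <= EPS at the start state (epsilon-optimality
# certificate). The toy 2-state MDP here is a hypothetical stand-in;
# the paper's solver operates on an occupancy-state game instead.

GAMMA, EPS = 0.9, 1e-3

# Deterministic toy MDP: T[s][a] = next state, R[s][a] = reward.
T = {0: {0: 0, 1: 1}, 1: {0: 0, 1: 1}}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.5, 1: 2.0}}

r_min = min(r for row in R.values() for r in row.values())
r_max = max(r for row in R.values() for r in row.values())

# Initial bounds that provably sandwich V*: r_min/(1-gamma) <= V* <= r_max/(1-gamma).
Vlo = {s: r_min / (1 - GAMMA) for s in T}
Vhi = {s: r_max / (1 - GAMMA) for s in T}

def backup(V, s):
    """One Bellman backup; it maps valid lower (upper) bounds to valid lower (upper) bounds."""
    return max(R[s][a] + GAMMA * V[T[s][a]] for a in T[s])

s0 = 0
while Vhi[s0] - Vlo[s0] > EPS:  # gap at the start state certifies epsilon-optimality
    Vlo = {s: backup(Vlo, s) for s in T}
    Vhi = {s: backup(Vhi, s) for s in T}

print(Vlo[s0], Vhi[s0])  # a bracket of width <= EPS around V*(s0)
```

The finite-time convergence argument is visible even in this toy: each synchronous backup contracts the gap by a factor of GAMMA, so the loop terminates after finitely many sweeps; the paper's contribution is establishing the analogous contraction and bound validity for the much harder zs-POSG case.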
Related papers
- $ε$-Optimally Solving Zero-Sum POSGs [4.424170214926035]
A recent method for solving zero-sum partially observable games embeds the original game into a new one called the occupancy Markov game.
This paper exploits the optimal value function's novel uniform continuity properties to overcome this limitation.
arXiv Detail & Related papers (2024-05-29T08:34:01Z) - Sarah Frank-Wolfe: Methods for Constrained Optimization with Best Rates and Practical Features [65.64276393443346]
The Frank-Wolfe (FW) method is a popular approach for solving optimization problems with structured constraints.
We present two new variants of the algorithm for finite-sum minimization.
arXiv Detail & Related papers (2023-04-23T20:05:09Z) - Global Nash Equilibrium in Non-convex Multi-player Game: Theory and
Algorithms [66.8634598612777]
We show that a Nash equilibrium (NE) is acceptable to all players in a multi-player game, in the sense that no player can benefit from deviating unilaterally.
arXiv Detail & Related papers (2023-01-19T11:36:50Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and involve a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - HSVI for zs-POSGs using Concavity, Convexity and Lipschitz Properties [8.80899367147235]
In settings such as POMDPs and Dec-POMDPs, this relies on bounding approximators of the optimal value function.
This approach has succeeded in some partially observable stochastic games (zs-POSGs) as well, but not yet in the general case, despite known concavity and convexity properties.
We build on these properties to derive bounding approximators and efficient update and selection operators.
arXiv Detail & Related papers (2021-10-25T13:38:21Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs)
Some variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z) - Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum
Markov Games [95.70078702838654]
This paper studies natural extensions of Natural Policy Gradient algorithm for solving two-player zero-sum games.
We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error.
arXiv Detail & Related papers (2021-02-17T17:49:57Z) - On Satisficing in Quantitative Games [30.53498001438171]
This work defines and investigates the satisficing problem on a two-player graph game with the discounted-sum cost model.
We show that while the satisficing problem can be solved using numerical methods just like the optimization problem, this approach does not render compelling benefits over optimization.
arXiv Detail & Related papers (2021-01-06T07:47:13Z) - Provably Convergent Working Set Algorithm for Non-Convex Regularized
Regression [0.0]
This paper proposes a working set algorithm for non-convex regularizers with convergence guarantees.
Our results demonstrate high gains compared to the full problem solver for both block-coordinate and gradient solvers.
arXiv Detail & Related papers (2020-06-24T07:40:31Z) - On the Impossibility of Global Convergence in Multi-Loss Optimization [10.081768261298098]
We prove that desirable convergence properties cannot simultaneously hold for any algorithm.
Our result has more to do with the existence of games with no satisfactory outcomes, than with algorithms per se.
It remains an open question whether such behavior can arise in high-dimensional games of interest to ML practitioners.
arXiv Detail & Related papers (2020-05-26T12:11:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.