$ε$-Optimally Solving Zero-Sum POSGs
- URL: http://arxiv.org/abs/2406.00054v1
- Date: Wed, 29 May 2024 08:34:01 GMT
- Title: $ε$-Optimally Solving Zero-Sum POSGs
- Authors: Erwan Escudie, Matthia Sabatelli, Jilles Dibangoye
- Abstract summary: A recent method for solving zero-sum partially observable stochastic games embeds the original game into a new one called the occupancy Markov game, but improving a solution there requires a linear program with exponentially many constraints.
This paper exploits the optimal value function's novel uniform continuity properties to overcome this limitation.
- Score: 4.424170214926035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A recent method for solving zero-sum partially observable stochastic games (zs-POSGs) embeds the original game into a new one called the occupancy Markov game. This reformulation allows applying Bellman's principle of optimality to solve zs-POSGs. However, improving a current solution requires solving a linear program with exponentially many potential constraints, which significantly restricts the scalability of this approach. This paper exploits the optimal value function's novel uniform continuity properties to overcome this limitation. We first construct a new operator that is computationally more efficient than the state-of-the-art update rules without compromising optimality. In particular, improving a current solution now involves a linear program with an exponential drop in constraints. We then also show that point-based value iteration algorithms utilizing our findings improve the scalability of existing methods while maintaining guarantees in various domains.
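The linear program at the heart of such updates can be illustrated in its simplest form. Below is a minimal sketch (not the paper's operator) of the maximin LP for a one-stage zero-sum matrix game, the subroutine that Bellman-style updates in occupancy Markov games repeatedly invoke; each inequality constraint corresponds to one opponent response, which is exactly the set the paper's continuity results cut down.

```python
# Minimal sketch, not the authors' operator: maximin LP for a one-stage
# zero-sum matrix game with (m, n) payoff matrix A. One inequality per
# opponent action; the paper shrinks this constraint set exponentially.
import numpy as np
from scipy.optimize import linprog

def maximin_strategy(A):
    """Solve max_x min_j x^T A[:, j] over the simplex."""
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])            # variables (x, v); minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])            # v - x^T A[:, j] <= 0 for each j
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]])[None, :]  # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * m + [(None, None)]          # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]

x, value = maximin_strategy(np.array([[1.0, -1.0], [-1.0, 1.0]]))  # matching pennies
print(x, value)  # ~[0.5, 0.5] and ~0.0
```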
Related papers
- The Differentiable Feasibility Pump [49.55771920271201]
This paper shows that the traditional feasibility pump and many of its follow-ups can be seen as gradient-descent algorithms with specific parameters.
A central aspect of this reinterpretation is observing that the traditional algorithm differentiates the solution of the linear relaxation with respect to its cost.
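As a point of reference for that reinterpretation, here is a minimal sketch of the traditional feasibility pump loop it starts from. The `A_ub`/`b_ub` toy data and the pure-binary setting are illustrative assumptions; the classic method also adds random flips to escape cycles, omitted here.

```python
# Minimal sketch of the classic feasibility pump: alternate between the LP
# relaxation (projection onto the polyhedron) and rounding (projection onto
# the integer lattice) until an integer-feasible point appears.
import numpy as np
from scipy.optimize import linprog

def feasibility_pump(A_ub, b_ub, n, max_iter=50):
    """Look for a binary x with A_ub @ x <= b_ub."""
    bounds = [(0.0, 1.0)] * n
    x = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub, bounds=bounds).x  # LP relaxation
    for _ in range(max_iter):
        x_round = np.round(x)
        if np.all(A_ub @ x_round <= b_ub + 1e-9):
            return x_round                       # integer-feasible point found
        # L1 distance to x_round is linear on [0,1]^n: cost +1 where x_round=0, -1 where =1.
        c = 1.0 - 2.0 * x_round
        x = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).x
    return None

# Toy instance: x0 + x1 >= 1, written as -x0 - x1 <= -1.
print(feasibility_pump(np.array([[-1.0, -1.0]]), np.array([-1.0]), n=2))
```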
arXiv Detail & Related papers (2024-11-05T22:26:51Z) - A Penalty-Based Guardrail Algorithm for Non-Decreasing Optimization with Inequality Constraints [1.5498250598583487]
Traditional mathematical programming solvers require long computational times to solve constrained minimization problems.
We propose a penalty-based guardrail algorithm (PGA) to efficiently solve them.
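A minimal sketch of a generic quadratic-penalty scheme in this spirit (not the paper's PGA; the problem data and step sizes are illustrative assumptions):

```python
# Minimal sketch of a penalty method: inequality constraints g_i(x) <= 0 are
# folded into the objective as rho * max(0, g_i(x))^2, then plain gradient
# descent is run on the penalized objective.
import numpy as np

def penalty_descent(f_grad, g, g_jac, x0, rho=10.0, lr=0.01, steps=2000):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        viol = np.maximum(0.0, g(x))                    # active constraint violations
        grad = f_grad(x) + 2.0 * rho * g_jac(x).T @ viol
        x -= lr * grad
    return x

# Toy problem: minimize (x - 2)^2 subject to x <= 1.
x_star = penalty_descent(
    f_grad=lambda x: 2.0 * (x - 2.0),
    g=lambda x: x - 1.0,                                # g(x) <= 0  <=>  x <= 1
    g_jac=lambda x: np.ones((1, 1)),
    x0=np.zeros(1),
)
print(x_star)  # close to 1 (exact only as rho -> infinity)
```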
arXiv Detail & Related papers (2024-05-03T10:37:34Z) - Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms.
Our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions.
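A minimal sketch of the optimistic/pessimistic pair idea in a tabular finite-horizon setting; the learning rate and bonus form are assumptions borrowed from standard finite-horizon analyses, not the paper's exact choices.

```python
# Minimal sketch (not the paper's algorithm): keep upper and lower Q estimates
# whose exploration bonus shrinks with the visit count, so the true minimax
# value stays sandwiched between them.
import numpy as np

H, S, A, B = 5, 4, 2, 2                        # horizon, states, actions of each player
Q_up = np.full((H, S, A, B), float(H))         # optimistic estimate
Q_lo = np.zeros((H, S, A, B))                  # pessimistic estimate
N = np.zeros((H, S, A, B))                     # visit counts

def stage_update(h, s, a, b, r, v_up_next, v_lo_next, c=1.0):
    """One stage-based update after observing reward r and next-stage value estimates."""
    N[h, s, a, b] += 1
    t = N[h, s, a, b]
    alpha = (H + 1) / (H + t)                  # learning rate common in finite-horizon analyses
    bonus = c * np.sqrt(H**3 / t)              # Hoeffding-style bonus (assumed form)
    Q_up[h, s, a, b] = (1 - alpha) * Q_up[h, s, a, b] + alpha * (r + v_up_next + bonus)
    Q_lo[h, s, a, b] = (1 - alpha) * Q_lo[h, s, a, b] + alpha * (r + v_lo_next - bonus)

stage_update(0, 0, 0, 1, r=1.0, v_up_next=float(H - 1), v_lo_next=0.0)
print(Q_up[0, 0, 0, 1], Q_lo[0, 0, 0, 1])      # optimistic estimate stays above pessimistic
```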
arXiv Detail & Related papers (2023-08-17T08:34:58Z) - Strictly Low Rank Constraint Optimization -- An Asymptotically
$\mathcal{O}(\frac{1}{t^2})$ Method [5.770309971945476]
We propose a class of nonconvex and nonsmooth problems with rank regularization to promote sparsity of the optimal solution.
We show that our algorithms achieve an asymptotic convergence rate of $\mathcal{O}(\frac{1}{t^2})$, which exactly matches Nesterov's optimal convergence rate for first-order methods on smooth convex problems.
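A minimal sketch of the ingredients such methods combine: Nesterov-style momentum around the proximal map of $\lambda \cdot \mathrm{rank}(X)$, which hard-thresholds singular values. The squared-loss objective, step size, and data are illustrative assumptions; the paper's scheme and analysis differ.

```python
# Minimal sketch: accelerated proximal gradient where the prox of
# lam * rank(X) hard-thresholds singular values at sqrt(2 * lam / L).
import numpy as np

def prox_rank(X, thresh):
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[s <= thresh] = 0.0                       # drop singular values below the threshold
    return U @ np.diag(s) @ Vt

def accel_prox_rank(M, lam=0.5, L=1.0, steps=100):
    """min_X 0.5 * ||X - M||_F^2 + lam * rank(X), with Nesterov momentum."""
    X = np.zeros_like(M)
    Y = np.zeros_like(M)
    t = 1.0
    for _ in range(steps):
        X_new = prox_rank(Y - (Y - M) / L, np.sqrt(2.0 * lam / L))  # grad f(Y) = Y - M
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        Y = X_new + ((t - 1.0) / t_new) * (X_new - X)               # momentum extrapolation
        X, t = X_new, t_new
    return X

rng = np.random.default_rng(0)
M = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 8))       # exactly rank-3 data
print(np.linalg.matrix_rank(accel_prox_rank(M)))                    # recovers a low rank
```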
arXiv Detail & Related papers (2023-07-04T16:55:41Z) - Optimizing NOTEARS Objectives via Topological Swaps [41.18829644248979]
In this paper, we develop an effective bi-level method: the outer level searches over topological orders by swapping pairs of nodes.
At the inner level, given a topological order, we utilize off-the-shelf solvers.
The proposed method significantly improves on the scores attained by other algorithms.
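A minimal sketch of why fixing a topological order helps: acyclicity becomes automatic, so the inner problem decomposes into per-node least squares on each node's predecessors in the order (the linear-SEM setup and scoring below are illustrative assumptions, not the paper's exact formulation).

```python
# Minimal sketch: the inner problem under a fixed topological order is just
# per-node least squares; the outer level would swap nodes in `order` whenever
# doing so lowers the score.
import numpy as np

def fit_given_order(X, order):
    """Least-squares weights W with edges only from earlier to later nodes in `order`."""
    n, d = X.shape
    W = np.zeros((d, d))
    for pos, j in enumerate(order):
        parents = order[:pos]
        if parents:
            coef, *_ = np.linalg.lstsq(X[:, parents], X[:, j], rcond=None)
            W[parents, j] = coef
    return W

def score(X, W):
    return 0.5 * np.mean((X - X @ W) ** 2)     # least-squares NOTEARS-style score

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
X[:, 2] += 2.0 * X[:, 0]                       # ground truth: edge 0 -> 2
order = [1, 0, 2]
print(score(X, fit_given_order(X, order)))
```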
arXiv Detail & Related papers (2023-05-26T21:49:37Z) - Accelerated First-Order Optimization under Nonlinear Constraints [73.2273449996098]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms.
An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
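A minimal sketch of the velocity viewpoint: rather than projecting iterates onto the feasible set, the update projects the descent velocity onto a halfspace that keeps the constraint from being violated. The constraint form, `alpha`, and step size are illustrative assumptions.

```python
# Minimal sketch: restrict the *velocity* to the halfspace
# {v : grad_g(x)^T v <= -alpha * g(x)} instead of projecting the *position*.
import numpy as np

def velocity_step(x, f_grad, g, g_grad, lr=0.05, alpha=1.0):
    v = -f_grad(x)                              # unconstrained descent velocity
    gx, ggrad = g(x), g_grad(x)
    slack = ggrad @ v + alpha * gx              # halfspace violation if positive
    if slack > 0:                               # closed-form halfspace projection of v
        v -= slack * ggrad / (ggrad @ ggrad)
    return x + lr * v

# Toy problem: minimize ||x - (2, 0)||^2 subject to g(x) = ||x||^2 - 1 <= 0.
x = np.array([0.0, 0.5])
for _ in range(500):
    x = velocity_step(
        x,
        f_grad=lambda x: 2.0 * (x - np.array([2.0, 0.0])),
        g=lambda x: x @ x - 1.0,
        g_grad=lambda x: 2.0 * x,
    )
print(x)  # approaches the boundary point (1, 0)
```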
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Learning to Optimize with Stochastic Dominance Constraints [103.26714928625582]
In this paper, we develop a simple yet efficient approach for the problem of comparing uncertain quantities.
We recast inner optimization in the Lagrangian as a learning problem for surrogate approximation, which bypasses apparent intractability.
The proposed light-SD demonstrates superior performance on several representative problems ranging from finance to supply chain management.
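For concreteness, a minimal sketch of the constraint being handled: second-order stochastic dominance compares expected shortfalls at every threshold, and it is this inner maximization over thresholds that the paper replaces with a learned surrogate. The sample-based check below is a naive baseline, not the paper's method.

```python
# Minimal sketch: X second-order stochastically dominates benchmark Y iff
# E[(eta - X)_+] <= E[(eta - Y)_+] for all thresholds eta. Empirically, it
# suffices to check the sample points as candidate worst-case thresholds.
import numpy as np

def ssd_violation(x_samples, y_samples):
    """Max over thresholds of E[(eta - X)_+] - E[(eta - Y)_+]; <= 0 means X dominates Y."""
    etas = np.concatenate([x_samples, y_samples])
    short_x = np.maximum(etas[:, None] - x_samples[None, :], 0.0).mean(axis=1)
    short_y = np.maximum(etas[:, None] - y_samples[None, :], 0.0).mean(axis=1)
    return np.max(short_x - short_y)

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, 1000)          # benchmark returns
x = y + 0.5                             # uniformly better portfolio
print(ssd_violation(x, y))              # <= 0: x dominates the benchmark
```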
arXiv Detail & Related papers (2022-11-14T21:54:31Z) - Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time
Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy.
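A minimal sketch of the single-loop pattern (illustrative, not the paper's algorithm): each iteration runs one soft Bellman sweep under the current reward and one reward-gradient step toward the expert's visitation frequencies, instead of solving the inner RL problem to completion.

```python
# Minimal sketch of single-loop max-likelihood IRL on a toy tabular MDP.
import numpy as np

S, A, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))      # transitions P[s, a, s'] (toy data)
d_expert = rng.dirichlet(np.ones(S))            # expert state-visitation target
theta = np.zeros(S)                             # state-indexed reward r(s) = theta[s]
V = np.zeros(S)

for _ in range(500):
    # one soft Bellman sweep under the current reward (inner step, single pass)
    Q = theta[:, None] + gamma * P @ V
    Qmax = Q.max(axis=1)
    V = Qmax + np.log(np.exp(Q - Qmax[:, None]).sum(axis=1))   # stable soft-max
    pi = np.exp(Q - V[:, None])                 # soft-greedy policy
    # crude discounted visitation of pi, then one reward step (outer step)
    P_pi = (P * pi[:, :, None]).sum(axis=1)     # state-to-state kernel under pi
    d = np.full(S, 1.0 / S)
    for _ in range(20):
        d = (1 - gamma) / S + gamma * d @ P_pi
    theta += 0.1 * (d_expert - d / d.sum())     # likelihood-gradient direction (sketch)

print(theta.round(2))
```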
arXiv Detail & Related papers (2022-10-04T17:13:45Z) - A framework for bilevel optimization that enables stochastic and global
variance reduction algorithms [17.12280360174073]
Bilevel optimization is a problem of minimizing a value function which involves the arg-minimum of another function.
We introduce a novel framework, in which the solution of the inner problem, the solution of the linear system, and the main variable evolve at the same time.
We demonstrate that SABA, an adaptation of the celebrated SAGA algorithm in our framework, has an $\mathcal{O}(\frac{1}{T})$ convergence rate, and that it achieves linear convergence under the Polyak-Łojasiewicz assumption.
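A minimal sketch of the framework's key idea on a toy quadratic bilevel problem: the inner solution z, the linear-system solution v used for the implicit gradient, and the outer variable x each take one step per iteration, with no nested loops. All problem data and step sizes are illustrative assumptions.

```python
# Minimal sketch on a toy bilevel problem:
#   inner: g(x, z) = 0.5 * ||z - x||^2,  so z*(x) = x
#   outer: F(x) = 0.5 * ||z*(x) - a||^2, minimized at x = a
import numpy as np

a = np.array([3.0, -1.0])
x = np.zeros(2)
z = np.zeros(2)
v = np.zeros(2)
for _ in range(300):
    z -= 0.5 * (z - x)            # inner step: grad_z g(x, z) = z - x
    v -= 0.5 * (v - (z - a))      # linear-system step: solve [d2_zz g] v = grad_z f (Hessian = I)
    x -= 0.1 * v                  # outer step: hypergradient = -[d2_xz g] v = v here
print(x)  # approaches a = [3, -1]
```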
arXiv Detail & Related papers (2022-01-31T18:17:25Z) - HSVI for zs-POSGs using Concavity, Convexity and Lipschitz Properties [8.80899367147235]
Solvers for POMDPs and Dec-POMDPs rely on bounding approximators of the optimal value function.
This approach has succeeded in some zero-sum partially observable stochastic games (zs-POSGs) as well, but failed in the general case despite known concavity and convexity properties.
We build on these properties to derive prototypical bounding approximators, along with efficient update and selection operators.
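A minimal skeleton of the generic HSVI loop these operators plug into (illustrative; the zs-POSG version replaces beliefs with occupancy states and exploits the concavity-convexity structure in the bound representations).

```python
# Minimal HSVI skeleton: keep upper and lower bounds on the optimal value and
# recursively expand wherever the weighted gap still exceeds the tolerance.
def hsvi(initial, expand, update_upper, update_lower, upper, lower,
         eps=0.01, gamma=0.95, max_depth=50):
    """Drive upper(s) - lower(s) below eps at the initial statistic."""
    def explore(s, tol, depth):
        if depth == max_depth or upper(s) - lower(s) <= tol:
            return
        child = expand(s)                        # successor with the largest weighted gap
        explore(child, tol / gamma, depth + 1)   # tolerance rescaled by the discount
        update_upper(s)                          # tighten both bounds on the way back up
        update_lower(s)
    while upper(initial) - lower(initial) > eps:
        explore(initial, eps, 0)
```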
arXiv Detail & Related papers (2021-10-25T13:38:21Z) - An Asymptotically Optimal Primal-Dual Incremental Algorithm for
Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions.
We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)