HSVI for zs-POSGs using Concavity, Convexity and Lipschitz Properties
- URL: http://arxiv.org/abs/2110.14529v1
- Date: Mon, 25 Oct 2021 13:38:21 GMT
- Title: HSVI for zs-POSGs using Concavity, Convexity and Lipschitz Properties
- Authors: Aur\'elien Delage, Olivier Buffet, Jilles Dibangoye
- Abstract summary: In partially observable or collaborative settings (e.g., POMDPs and Dec-POMDPs), solving sequential decision-making problems requires an appropriate statistic that induces a fully observable problem, together with bounding (convex) approximators of the optimal value function.
This approach has succeeded in some subclasses of 2-player zero-sum partially observable stochastic games (zs-POSGs) as well, but failed in the general case despite known concavity and convexity properties.
We leverage these properties to derive bounding approximators and efficient update and selection operators, leading to a prototypical HSVI-inspired solver that provably converges to an $\epsilon$-optimal solution in finite time.
- Score: 8.80899367147235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Dynamic programming and heuristic search are at the core of state-of-the-art
solvers for sequential decision-making problems. In partially observable or
collaborative settings (e.g., POMDPs and Dec-POMDPs), this requires introducing
an appropriate statistic that induces a fully observable problem as well as
bounding (convex) approximators of the optimal value function. This approach
has succeeded in some subclasses of 2-player zero-sum partially observable
stochastic games (zs-POSGs) as well, but failed in the general case despite
known concavity and convexity properties, which only led to heuristic
algorithms with poor convergence guarantees. We overcome this issue, leveraging
these properties to derive bounding approximators and efficient update and
selection operators, before deriving a prototypical solver inspired by HSVI
that provably converges to an $\epsilon$-optimal solution in finite time, and
which we empirically evaluate. This opens the door to a novel family of
promising approaches complementing those relying on linear programming or
iterative methods.
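To make the structure of such an HSVI-style scheme concrete, here is a minimal, generic sketch of a trial loop maintaining bounding approximators over an abstract summary statistic (a hashable "occupancy state"). All names, the dictionary-based bounds, and the caller-supplied selection and backup operators are illustrative assumptions; they are not the paper's actual operators, whose validity rests on the concavity, convexity and Lipschitz properties invoked above.
```python
# Schematic HSVI-style trial loop: a sketch under the assumptions stated
# in the lead-in, not the paper's actual update/selection operators.
from typing import Callable, Dict, Hashable, Tuple


def hsvi_sketch(
    sigma0: Hashable,
    select_child: Callable[[Hashable, Dict, Dict, float], Tuple[Hashable, float]],
    backup_lower: Callable[[Hashable, Dict], float],
    backup_upper: Callable[[Hashable, Dict], float],
    v_min: float,
    v_max: float,
    epsilon: float = 1e-2,
    max_trials: int = 100,
    max_depth: int = 50,
) -> Tuple[float, float]:
    """Tighten lower/upper bounds at sigma0 until their gap is <= epsilon."""
    lower: Dict[Hashable, float] = {}  # point-based lower-bound approximator
    upper: Dict[Hashable, float] = {}  # point-based upper-bound approximator

    def bounds(sigma: Hashable) -> Tuple[float, float]:
        # Unvisited occupancy states start with trivial (uninformative) bounds.
        return lower.setdefault(sigma, v_min), upper.setdefault(sigma, v_max)

    for _ in range(max_trials):
        lo, up = bounds(sigma0)
        if up - lo <= epsilon:
            break  # the optimal value at sigma0 is bracketed within epsilon
        # Forward pass: follow the selection operator while the local gap
        # exceeds the (trial-dependent) threshold.
        path, sigma, threshold = [], sigma0, epsilon
        for _ in range(max_depth):
            lo, up = bounds(sigma)
            if up - lo <= threshold:
                break
            path.append(sigma)
            sigma, threshold = select_child(sigma, lower, upper, threshold)
        # Backward pass: point-based updates tighten both bounds on the path.
        for s in reversed(path):
            lower[s] = max(lower[s], backup_lower(s, lower))
            upper[s] = min(upper[s], backup_upper(s, upper))

    return bounds(sigma0)
```
In the zs-POSG setting described above, the plain dictionaries would be replaced by bounding approximators exploiting the concavity, convexity and Lipschitz structure mentioned in the abstract, and the selection operator would steer each trial toward the most promising successor statistic; the sketch only fixes the overall trial-and-backup control flow.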
Related papers
- $ε$-Optimally Solving Zero-Sum POSGs [4.424170214926035]
A recent method for solving zero-sum partially observable games embeds the original game into a new one called the occupancy Markov game.
This paper exploits the optimal value function's novel uniform continuity properties to overcome this limitation; an inequality of this kind is sketched after this list.
arXiv Detail & Related papers (2024-05-29T08:34:01Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
- Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
Our proposed method outperforms an existing method with the previously best-known result.
arXiv Detail & Related papers (2022-12-19T14:48:54Z)
- HSVI can solve zero-sum Partially Observable Stochastic Games [7.293053431456775]
State-of-the-art methods for solving 2-player zero-sum imperfect-information games rely on linear programming or regret minimization.
We propose a novel family of promising approaches complementing those relying on linear programming or iterative methods.
arXiv Detail & Related papers (2022-10-26T11:41:57Z)
- Efficient semidefinite bounds for multi-label discrete graphical models [6.226454551201676]
One of the main queries on such models is to identify the extremum of the represented function, i.e., the Weighted Constraint Satisfaction Problem (WCSP) on cost function networks, or equivalently Maximum A Posteriori (MAP) inference.
We consider a traditional dualized constraint approach and a dedicated SDP/Monteiro-style method based on row-by-row updates.
arXiv Detail & Related papers (2021-11-24T13:38:34Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new dual approach is proposed with a novel integration of three ingredients: a regularized policy, a dual regularizer, and Nesterov's gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use constant step-size.
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
- Linear-Time Probabilistic Solutions of Boundary Value Problems [27.70274403550477]
We introduce a Gauss-Markov prior and tailor it specifically to BVPs.
This allows computing a posterior distribution over the solution in linear time, at a quality and cost comparable to that of well-established, non-probabilistic methods.
arXiv Detail & Related papers (2021-06-14T21:19:17Z)
- High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds that depend on the confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Provably Convergent Working Set Algorithm for Non-Convex Regularized Regression [0.0]
This paper proposes a working set algorithm for non-convex regularized regression with convergence guarantees.
Our results demonstrate high computational gains compared to the full problem solver, for both block-coordinate and gradient solvers.
arXiv Detail & Related papers (2020-06-24T07:40:31Z)
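For the uniform-continuity (Lipschitz) properties referred to in the abstract and in the first related paper above, an inequality of the following form over occupancy states is the kind of bound exploited; the constant $\lambda$ and the choice of the 1-norm are illustrative assumptions rather than statements quoted from these papers.
```latex
% Illustrative uniform-continuity (Lipschitz) bound on the optimal value
% function over occupancy states \sigma, \sigma'; the constant \lambda
% (depending on reward bounds and horizon or discount) is an assumption.
\[
  \bigl| V^{*}(\sigma) - V^{*}(\sigma') \bigr|
  \;\le\; \lambda \, \lVert \sigma - \sigma' \rVert_{1} .
\]
```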