Optimal control of robust team stochastic games
- URL: http://arxiv.org/abs/2105.07405v1
- Date: Sun, 16 May 2021 10:42:09 GMT
- Title: Optimal control of robust team stochastic games
- Authors: Feng Huang, Ming Cao, and Long Wang
- Abstract summary: We propose a model of "robust" team games, where players utilize a robust optimization approach to make decisions.
We develop a learning algorithm in the form of a Gauss-Seidel modified policy iteration and prove its convergence.
Some numerical simulations are presented to demonstrate the effectiveness of the algorithm.
- Score: 5.425935258756356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In stochastic dynamic environments, team stochastic games have emerged as a
versatile paradigm for studying sequential decision-making problems of fully
cooperative multi-agent systems. However, the optimality of the derived
policies is usually sensitive to the model parameters, which are typically
unknown and required to be estimated from noisy data in practice. To mitigate
the sensitivity of the optimal policy to these uncertain parameters, in this
paper, we propose a model of "robust" team stochastic games, where players
utilize a robust optimization approach to make decisions. This model extends
team stochastic games to the setting of incomplete information while also
providing an alternative solution concept of robust team optimality. To seek
such a solution, we develop a learning algorithm in the form of a Gauss-Seidel
modified policy iteration and prove its convergence. This algorithm, compared
with robust dynamic programming, not only possesses a faster convergence rate,
but also allows for using approximation calculations to alleviate the curse of
dimensionality. Moreover, some numerical simulations are presented to
demonstrate the effectiveness of the algorithm by generalizing the game model
of social dilemmas to sequential robust scenarios.
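The core of such an algorithm can be illustrated with a small sketch (not the authors' implementation): since the team shares a common reward, the joint policy can be treated as a single robust MDP controller, and each Gauss-Seidel sweep backs up states in place against a worst-case transition model. Here the uncertainty set is assumed, for illustration, to be an L1 ball of given radius around a nominal model estimated from data; the function names `worst_case_expectation` and `robust_gauss_seidel_vi` are hypothetical.

```python
import numpy as np

def worst_case_expectation(p_nom, V, radius):
    # Inner problem: minimize p @ V over the L1 ball ||p - p_nom||_1 <= radius
    # intersected with the probability simplex. The minimizer moves up to
    # radius/2 of probability mass from the highest-value next states onto
    # the lowest-value next state.
    p = p_nom.copy()
    order = np.argsort(V)          # states in ascending order of value
    dst = order[0]                 # lowest-value state receives the mass
    budget = radius / 2.0
    for s in order[::-1]:          # take mass from highest-value states first
        if s == dst or budget <= 0:
            continue
        move = min(p[s], budget)
        p[s] -= move
        p[dst] += move
        budget -= move
    return p @ V

def robust_gauss_seidel_vi(P_nom, R, radius, gamma=0.9, tol=1e-8, max_sweeps=500):
    # Gauss-Seidel robust backup: states are swept in order and each update
    # immediately reuses the freshly updated values, which is what speeds up
    # convergence relative to Jacobi-style robust dynamic programming.
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(S):
            q = [R[s, a] + gamma * worst_case_expectation(P_nom[s, a], V, radius)
                 for a in range(A)]
            v_new = max(q)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    return V
```

The "modified policy iteration" of the paper additionally interleaves a few partial policy-evaluation sweeps between greedy improvements; the sketch above shows only the robust Gauss-Seidel backup itself.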
Related papers
- Optimization and Optimizers for Adversarial Robustness [10.279287131070157]
In this paper, we introduce a novel framework that blends a general-purpose constrained-optimization solver with Constraint Folding.
Regarding reliability, PWCF provides solutions with stationarity measures and feasibility tests to assess the solution quality.
We further explore the distinct patterns in the solutions found for solving these problems using various combinations of losses, perturbation models, and optimization algorithms.
arXiv Detail & Related papers (2023-03-23T16:22:59Z) - Backpropagation of Unrolled Solvers with Folded Optimization [55.04219793298687]
The integration of constrained optimization models as components in deep networks has led to promising advances on many specialized learning tasks.
One typical strategy is algorithm unrolling, which relies on automatic differentiation through the operations of an iterative solver.
This paper provides theoretical insights into the backward pass of unrolled optimization, leading to a system for generating efficiently solvable analytical models of backpropagation.
arXiv Detail & Related papers (2023-01-28T01:50:42Z) - The Parametric Cost Function Approximation: A new approach for multistage stochastic programming [4.847980206213335]
We show that a parameterized version of a deterministic optimization model can be an effective way of handling uncertainty without the complexity of either programming or dynamic programming.
This approach can handle complex, high-dimensional state variables, and avoids the usual approximations associated with scenario trees or value function approximations.
arXiv Detail & Related papers (2022-01-01T23:25:09Z) - Sample Complexity of Robust Reinforcement Learning with a Generative Model [0.0]
We propose a model-based reinforcement learning (RL) algorithm for learning an $\epsilon$-optimal robust policy.
We consider three different forms of uncertainty sets, characterized by the total variation distance, chi-square divergence, and KL divergence.
In addition to the sample complexity results, we also present a formal analytical argument on the benefit of using robust policies.
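Of the three uncertainty sets mentioned, the KL-divergence ball admits a convenient dual form for the inner worst-case expectation: $\min_{P:\,KL(P\|P_0)\le\delta} E_P[V] = \max_{\lambda>0} \{-\lambda \log E_{P_0}[e^{-V/\lambda}] - \lambda\delta\}$. The numpy sketch below solves this dual by a crude grid search over $\lambda$; the function name and solver choice are illustrative, not prescribed by the paper.

```python
import numpy as np

def kl_robust_expectation(p_nom, V, delta, lam_grid=None):
    # Worst-case expectation of V over all distributions within KL-divergence
    # delta of p_nom, computed via the dual
    #   max_{lam > 0}  -lam * log E_{p_nom}[exp(-V / lam)] - lam * delta
    # using a grid search over lam (a sketch, not production code).
    if lam_grid is None:
        lam_grid = np.logspace(-3, 3, 400)
    v_min = V.min()
    best = v_min  # the lam -> 0 limit recovers the essential infimum of V
    for lam in lam_grid:
        # shift by v_min so the exponentials are numerically stable
        val = v_min - lam * np.log(p_nom @ np.exp(-(V - v_min) / lam)) - lam * delta
        best = max(best, val)
    return best
```

As `delta` grows, the worst-case expectation shrinks from the nominal mean toward the minimum of `V`, which is the qualitative behavior a robust Bellman backup exploits.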
arXiv Detail & Related papers (2021-12-02T18:55:51Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study stochastic optimization of the area under the precision-recall curve (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O(1/\epsilon^4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z) - Robust Value Iteration for Continuous Control Tasks [99.00362538261972]
When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well.
We present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain.
We show that Robust Fitted Value Iteration is more robust than a deep reinforcement learning baseline and the non-robust version of the algorithm.
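A tabular stand-in for this idea, assuming a discretized compact state domain, can be sketched as follows: each backup lets an adversary pick the worst dynamics perturbation from a finite set before the controller maximizes over actions. All names (`robust_fitted_vi`, the interval dynamics, the quadratic cost) are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def robust_fitted_vi(grid, actions, worst_params, reward, step, gamma=0.95, sweeps=200):
    # Robust value iteration on a discretized 1-D compact state domain.
    # `step(s, a, w)` is the dynamics under perturbation w; the adversary
    # picks the w in `worst_params` that minimizes the continuation value,
    # which is read off the grid by linear interpolation.
    V = np.zeros(len(grid))
    for _ in range(sweeps):
        V_new = np.empty_like(V)
        for i, s in enumerate(grid):
            best = -np.inf
            for a in actions:
                worst = min(
                    np.interp(step(s, a, w), grid, V) for w in worst_params
                )
                best = max(best, reward(s, a) + gamma * worst)
            V_new[i] = best
        V = V_new
    return V
```

Passing a singleton perturbation set recovers ordinary (non-robust) fitted value iteration, which makes the robustness gap directly measurable.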
arXiv Detail & Related papers (2021-05-25T19:48:35Z) - Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z) - Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z) - Uncertainty Modelling in Risk-averse Supply Chain Systems Using Multi-objective Pareto Optimization [0.0]
One of the arduous tasks in supply chain modelling is to build robust models against irregular variations.
We have introduced a novel methodology, namely Pareto Optimization, to handle uncertainties and bound the entropy of such uncertainties by explicitly modelling them under some a priori assumptions.
arXiv Detail & Related papers (2020-04-24T21:04:25Z) - Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.