Exploiting No-Regret Algorithms in System Design
- URL: http://arxiv.org/abs/2007.11172v2
- Date: Fri, 24 Jul 2020 10:05:25 GMT
- Title: Exploiting No-Regret Algorithms in System Design
- Authors: Le Cong Dinh and Nick Bishop and Long Tran-Thanh
- Abstract summary: We investigate a two-player zero-sum game setting where the column player is also a designer of the system.
The goal of the column player is to guide her opponent to pick a mixed strategy which is favourable for the system designer.
We propose a new game-playing algorithm for the system designer and prove that it can guide the row player to converge to a minimax solution.
- Score: 20.189783788006263
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We investigate a repeated two-player zero-sum game setting where the column
player is also a designer of the system, and has full control over the design of
the payoff matrix. In addition, the row player uses a no-regret algorithm to
efficiently learn how to adapt their strategy to the column player's behaviour
over time in order to achieve a good total payoff. The goal of the column player
is to guide her opponent to pick a mixed strategy which is favourable for the
system designer. Therefore, she needs to: (i) design an appropriate payoff
matrix $A$ whose unique minimax solution contains the desired mixed strategy of
the row player; and (ii) strategically interact with the row player during a
sequence of plays in order to guide her opponent to converge to that desired
behaviour. To design such a payoff matrix, we propose a novel solution that
provably has a unique minimax solution with the desired behaviour. We also
investigate a relaxation of this problem where uniqueness is not required, but
all the minimax solutions have the same mixed strategy for the row player.
Finally, we propose a new game-playing algorithm for the system designer and
prove that it can guide the row player, who may play a \emph{stable} no-regret
algorithm, to converge to a minimax solution.
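To make the setting concrete, below is a minimal sketch (not the paper's algorithm) of these dynamics: the row player runs the Hedge / multiplicative-weights no-regret algorithm on a designed payoff matrix $A$, while the column player (the designer) best-responds each round. In this classic setup the row player's time-averaged strategy converges to a minimax strategy of $A$; the matrix, horizon, and step size below are illustrative assumptions.

```python
# Sketch only: illustrative matrix, horizon, and step size; not the paper's method.
import numpy as np

def hedge_vs_best_response(A, T=5000):
    """Row player minimises the loss A[i, j]; the designer maximises it."""
    n = A.shape[0]
    eta = np.sqrt(8 * np.log(n) / T)         # standard tuned Hedge step size
    weights = np.ones(n)
    avg_row = np.zeros(n)
    for _ in range(T):
        x = weights / weights.sum()           # row player's mixed strategy
        j = int(np.argmax(x @ A))             # designer's best-response column
        weights *= np.exp(-eta * A[:, j])     # Hedge update on realised losses
        weights /= weights.max()              # renormalise for numerical safety
        avg_row += x
    return avg_row / T                        # approaches a minimax strategy

# Example: matching pennies, whose unique minimax row strategy is (0.5, 0.5).
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])
print(hedge_vs_best_response(A))              # approximately [0.5, 0.5]
```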
Related papers
- Distributed Online Bandit Nonconvex Optimization with One-Point Residual Feedback via Dynamic Regret [10.700891331004799]
This paper considers the distributed online bandit optimization problem with nonconvex loss functions over a time-varying digraph.
At each round, each player selects a decision, and then the adversary assigns an arbitrary nonconvex loss function to that player.
The expected regret of our algorithms is comparable to existing algorithms that use two-point deviation.
arXiv Detail & Related papers (2024-09-24T02:37:33Z)
- Learning to Control Unknown Strongly Monotone Games [16.327788209492805]
We propose a simple algorithm that learns to shift the NE (Nash equilibrium) of the game to meet target linear constraints by adjusting the controlled coefficients online.
We prove that our algorithm, which is based on a two-timescale approximation, guarantees convergence with probability 1 to the set of NE that meet the target linear constraints.
We demonstrate how our scheme can be applied to optimizing a global quadratic cost at NE and load balancing in resource allocation games.
arXiv Detail & Related papers (2024-06-30T03:33:42Z)
- Logarithmic Regret for Matrix Games against an Adversary with Noisy Bandit Feedback [23.97611639885236]
This paper considers a variant of zero-sum matrix games where, at each timestep, the row player chooses row $i$, the column player chooses column $j$, and the row player receives a noisy reward with mean $A_{i,j}$.
The objective of the row player is to accumulate as much reward as possible, even against an adversarial column player.
If the row player uses the EXP3 strategy, an algorithm known for obtaining $\sqrt{T}$ regret against an arbitrary sequence of rewards, it is immediate that the row player also achieves $\sqrt{T}$ regret in this setting (a minimal EXP3 sketch appears after this list).
arXiv Detail & Related papers (2023-06-22T22:45:48Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property (see the sketch after this list).
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Towards General Function Approximation in Zero-Sum Markov Games [126.58493169301012]
This paper considers two-player zero-sum finite-horizon Markov games with simultaneous moves.
Provably efficient algorithms for both decoupled and coordinated settings are developed.
arXiv Detail & Related papers (2021-07-30T15:25:13Z)
- Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves.
We propose an algorithm, Nash-UCRL-VTR, based on the principle of "Optimism-in-Face-of-Uncertainty".
We show that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension.
arXiv Detail & Related papers (2021-02-15T09:09:16Z)
- Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games [123.76716667704625]
We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game.
In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game.
We provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile.
arXiv Detail & Related papers (2020-09-21T17:51:57Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include the algorithm's regret guarantees, which depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
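As noted in the entry on "Logarithmic Regret for Matrix Games against an Adversary with Noisy Bandit Feedback" above, here is a minimal EXP3 sketch for that noisy-bandit protocol: the row player samples a row, observes only a noisy reward with mean $A_{i,j}$, and performs an importance-weighted multiplicative update. The payoff matrix, noise level, clipping, and exploration rate below are illustrative assumptions, not taken from that paper.

```python
# Sketch only: the matrix, noise model, and gamma are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def exp3(A, column_play, T=10000, gamma=0.05):
    """EXP3 row player receiving noisy bandit feedback with mean A[i, j]."""
    n = A.shape[0]
    weights = np.ones(n)
    total = 0.0
    for t in range(T):
        probs = (1 - gamma) * weights / weights.sum() + gamma / n
        i = rng.choice(n, p=probs)                   # sample a row
        j = column_play(t)                           # opponent picks a column
        r = A[i, j] + 0.1 * rng.standard_normal()    # noisy reward, mean A[i, j]
        r = min(max(r, 0.0), 1.0)                    # keep rewards in [0, 1]
        weights[i] *= np.exp(gamma * (r / probs[i]) / n)  # importance-weighted update
        weights /= weights.max()                     # renormalise for numerical safety
        total += r
    return total / T

# Example: the column player alternates columns; EXP3 still earns high average reward.
A = np.array([[0.9, 0.2],
              [0.3, 0.7]])
print(exp3(A, column_play=lambda t: t % 2))
```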
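And, as referenced in the entry on "No-Regret Learning in Time-Varying Zero-Sum Games", a minimal sketch of the two-layer structure: a Hedge meta-algorithm aggregates the proposals of several black-box base-learners (here, Hedge instances with different step sizes). The base-learner pool, loss model, and time-varying game below are illustrative assumptions rather than the paper's exact construction.

```python
# Sketch only: base-learner pool and loss model are illustrative assumptions.
import numpy as np

class HedgeLearner:
    """A black-box base-learner: Hedge over the n rows with a fixed step size."""
    def __init__(self, n, eta):
        self.eta, self.w = eta, np.ones(n)

    def propose(self):
        return self.w / self.w.sum()

    def update(self, loss_vec):
        self.w *= np.exp(-self.eta * loss_vec)
        self.w /= self.w.max()                 # renormalise for numerical safety

def meta_play(loss_matrices, etas=(0.01, 0.1, 1.0), eta_meta=0.1):
    """Hedge meta-algorithm over base-learners in a time-varying zero-sum game."""
    n = loss_matrices[0].shape[0]
    base = [HedgeLearner(n, e) for e in etas]
    meta_w = np.ones(len(base))
    for A_t in loss_matrices:                  # the game matrix may change over time
        proposals = [b.propose() for b in base]
        p = meta_w / meta_w.sum()
        x = sum(pi * xi for pi, xi in zip(p, proposals))   # aggregated strategy
        j = int(np.argmax(x @ A_t))                        # adversarial column
        loss_vec = A_t[:, j]
        base_losses = np.array([xi @ loss_vec for xi in proposals])
        meta_w *= np.exp(-eta_meta * base_losses)          # meta-level Hedge update
        meta_w /= meta_w.max()
        for b in base:
            b.update(loss_vec)
    return x

# Example: a slowly drifting matching-pennies game.
games = [np.sin(t / 200) * np.array([[1.0, -1.0], [-1.0, 1.0]]) for t in range(2000)]
print(meta_play(games))
```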
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.