Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics
- URL: http://arxiv.org/abs/2302.13653v1
- Date: Mon, 27 Feb 2023 10:47:15 GMT
- Title: Equilibrium Bandits: Learning Optimal Equilibria of Unknown Dynamics
- Authors: Siddharth Chandak, Ilai Bistritz, Nicholas Bambos
- Abstract summary: Consider a decision-maker that can pick one out of $K$ actions to control an unknown system, for $T$ turns.
The dynamics of the system are unknown to the decision-maker, which can only observe a noisy reward at the end of every turn.
Existing bandit algorithms, either stochastic or adversarial, achieve linear (trivial) regret for this problem.
We present a novel algorithm that knows when to quickly switch away from an action if it is not worth waiting until the equilibrium is reached.
- Score: 23.722837647516357
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Consider a decision-maker that can pick one out of $K$ actions to control an
unknown system, for $T$ turns. The actions are interpreted as different
configurations or policies. Holding the same action fixed, the system
asymptotically converges to a unique equilibrium, as a function of this action.
The dynamics of the system are unknown to the decision-maker, which can only
observe a noisy reward at the end of every turn. The decision-maker wants to
maximize its accumulated reward over the $T$ turns. Learning what equilibria
are better results in higher rewards, but waiting for the system to converge to
equilibrium costs valuable time. Existing bandit algorithms, either stochastic
or adversarial, achieve linear (trivial) regret for this problem. We present a
novel algorithm, termed Upper Equilibrium Concentration Bound (UECB), that
knows when to switch actions quickly if it is not worth waiting until the
equilibrium is reached. This is enabled by employing convergence bounds to
determine how far the system is from equilibrium. We prove that UECB achieves a
regret of $\mathcal{O}(\log(T)+\tau_c\log(\tau_c)+\tau_c\log\log(T))$ for this
equilibrium bandit problem where $\tau_c$ is the worst case approximate
convergence time to equilibrium. We then show that both epidemic control and
game control are special cases of equilibrium bandits, where $\tau_c\log
\tau_c$ typically dominates the regret. We then test UECB numerically for both
of these applications.
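The abstract only sketches the key idea (convergence bounds decide whether an action is worth holding), so the following is a minimal toy illustration of the setting in Python. The geometric dynamics, the known contraction rate `gamma`, and the naive hold-then-commit rule are all assumptions made for illustration; this is not the UECB algorithm itself.

```python
# Toy sketch of the equilibrium-bandit setting; all specifics here (geometric
# dynamics, known contraction rate, hold-then-commit rule) are illustrative
# assumptions, not the UECB algorithm from the paper.
import math
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 5000
equilibria = rng.uniform(0.0, 1.0, size=K)  # unknown per-action equilibrium rewards
gamma, sigma, eps = 0.9, 0.1, 0.01          # contraction rate, noise level, accuracy

state = 0.0                                  # scalar system state (= expected reward)

def play(action):
    """Hold `action` for one turn: the state contracts toward that action's
    equilibrium, and only a noisy reward is observed."""
    global state
    state = gamma * state + (1 - gamma) * equilibria[action]
    return state + sigma * rng.standard_normal()

# Convergence bound: |state - equilibrium| <= gamma**n after n holds (all
# values lie in [0, 1]), so n_hold turns suffice to get within eps.
n_hold = math.ceil(math.log(1 / eps) / math.log(1 / gamma))

# Explore: hold each action long enough to (approximately) equilibrate, then
# average a few settled rewards. Every switch restarts the transient, which
# is exactly why switching too eagerly costs valuable time in this setting.
t, estimates = 0, np.zeros(K)
for a in range(K):
    rewards = [play(a) for _ in range(n_hold + 20)]
    t += n_hold + 20
    estimates[a] = np.mean(rewards[n_hold:])  # trust only post-convergence samples

best = int(np.argmax(estimates))              # commit to the best-looking action
total = sum(play(best) for _ in range(T - t))
print(f"committed to action {best}; true best is {int(np.argmax(equilibria))}")
```

Note how every switch restarts the transient toward the new action's equilibrium; balancing that convergence cost against exploration is the tension the paper's algorithm is designed to manage.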
Related papers
- Computational Lower Bounds for Regret Minimization in Normal-Form Games [68.66209476382213]
We provide evidence that existing learning algorithms, such as multiplicative weights update, are close to optimal.
Our results are obtained in the algorithmic framework put forward by Kothari and Mehta.
arXiv Detail & Related papers (2024-11-04T00:39:52Z)
- Games played by Exponential Weights Algorithms [0.0]
We consider a repeated interaction in discrete time, where each player uses an exponential weights algorithm characterized by an initial mixed action and a fixed learning rate.
We show that whenever a strict Nash equilibrium exists, the probability to play a strict Nash equilibrium at the next stage converges almost surely to 0 or 1.
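To make the setup concrete, here is one exponential-weights step per player in a small common-interest game; the payoff matrix, the fixed learning rate, and the expected-payoff feedback (the paper's players act on realized random actions) are assumptions of this sketch.

```python
# Illustrative exponential-weights dynamics; not the paper's exact setting,
# which uses realized random actions rather than expected payoffs.
import numpy as np

def exp_weights(x, payoffs, eta):
    """Reweight each pure action by exp(eta * payoff), then renormalize."""
    w = x * np.exp(eta * payoffs)
    return w / w.sum()

A = np.array([[2.0, 0.0], [0.0, 1.0]])  # common-interest game: both players get
B = A                                    # A[i, j]; strict NEs at (0, 0) and (1, 1)
x = np.array([0.5, 0.5])                 # initial mixed actions
y = np.array([0.5, 0.5])
eta = 0.1                                # fixed learning rate
for _ in range(300):
    x, y = exp_weights(x, A @ y, eta), exp_weights(y, x @ B, eta)
print(x.round(3), y.round(3))            # both mixes concentrate on the strict NE (0, 0)
```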
arXiv Detail & Related papers (2024-07-09T08:49:51Z)
- Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z)
- On Tractable $\Phi$-Equilibria in Non-Concave Games [53.212133025684224]
Non-concave games introduce significant game-theoretic and optimization challenges.
We show that when $\Phi$ is finite, there exists an efficient uncoupled learning algorithm that converges to the corresponding $\Phi$-equilibria.
We also show that Online Gradient Descent can efficiently approximate $\Phi$-equilibria in non-trivial regimes.
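For orientation, here is a generic projected online gradient step over the probability simplex; the placeholder gradient and the step-size schedule are assumptions, not the construction analyzed in the paper.

```python
# Generic projected online gradient ascent on a mixed strategy; illustrative.
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def ogd_step(x, grad, eta):
    """One online step: move along the utility gradient, project back."""
    return project_simplex(x + eta * grad)

x = np.ones(3) / 3                        # start from the uniform mixed strategy
for t in range(1, 101):
    grad = np.array([1.0, 0.5, -0.2])     # stand-in for the utility gradient at x
    x = ogd_step(x, grad, eta=1.0 / np.sqrt(t))
print(x.round(3))                         # mass migrates toward the best action
```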
arXiv Detail & Related papers (2024-03-13T01:51:30Z)
- A Black-box Approach for Non-stationary Multi-agent Reinforcement Learning [53.83345471268163]
We investigate learning the equilibria in non-stationary multi-agent systems.
We show how to test for various types of equilibria by a black-box reduction to single-agent learning.
arXiv Detail & Related papers (2023-06-12T23:48:24Z)
- Local Convergence of Gradient Methods for Min-Max Games: Partial Curvature Generically Suffices [0.5439020425819]
We study the convergence to local Nash equilibria of gradient methods for two-player zero-sum differentiable games.
We show that thanks to partial curvature, conic particle methods -- which optimize over both weights and supports of the mixed strategies -- generically converge faster than fixed-support methods.
arXiv Detail & Related papers (2023-05-26T21:43:56Z)
- Bayes correlated equilibria and no-regret dynamics [9.89901717499058]
This paper explores equilibrium concepts for Bayesian games, which are fundamental models of games with incomplete information.
We focus on communication equilibria, which can be realized by a mediator who gathers each player's private information and then sends correlated recommendations to the players.
We present an efficient algorithm for minimizing untruthful swap regret with a sublinear upper bound, which we prove to be tight up to a multiplicative constant.
arXiv Detail & Related papers (2023-04-11T06:22:51Z)
- How Bad is Selfish Driving? Bounding the Inefficiency of Equilibria in Urban Driving Games [64.71476526716668]
We study the (in)efficiency of any equilibrium players might agree to play.
We obtain guarantees that refine existing bounds on the Price of Anarchy.
Although the obtained guarantees concern open-loop trajectories, we observe efficient equilibria even when agents employ closed-loop policies.
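For reference, with $C(s)$ denoting the social cost of a joint outcome $s$, the Price of Anarchy is the worst-case ratio $\mathrm{PoA} = \max_{s \in \mathrm{Eq}} C(s) \,/\, \min_{s} C(s) \ge 1$; the guarantees summarized above tighten upper bounds on this ratio for the driving games considered.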
arXiv Detail & Related papers (2022-10-24T09:32:40Z)
- Safe Equilibrium [1.7132914341329848]
The standard game-theoretic solution concept, Nash equilibrium, assumes that all players behave rationally.
We propose a new solution concept called safe equilibrium that models opponents as behaving rationally with a specified probability.
We prove that a safe equilibrium exists in all strategic-form games, and prove that its computation is PPAD-hard.
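To make the opponent model concrete, here is a toy evaluation in a 2x2 game of a strategy against an opponent who best-responds with probability $p$ and plays adversarially otherwise; the payoffs, tie-breaking, and grid search below are assumptions of this sketch, not the paper's computation.

```python
# Toy "rational with probability p" opponent model; illustrative only.
import numpy as np

A = np.array([[3.0, 0.0], [1.0, 2.0]])  # player 1's payoffs (hypothetical game)
B = np.array([[2.0, 1.0], [0.0, 3.0]])  # player 2's payoffs
p = 0.8                                  # probability the opponent acts rationally

def value(x):
    """Player 1's payoff for mixed strategy x: with prob p the opponent
    best-responds to x using B; with prob 1 - p it plays the worst case."""
    u = x @ A                            # player 1's payoff per opponent action
    v = x @ B                            # opponent's own payoff per action
    rational = u[np.argmax(v)]           # rational opponent maximizes v
    return p * rational + (1 - p) * u.min()

grid = [np.array([q, 1.0 - q]) for q in np.linspace(0.0, 1.0, 101)]
best = max(grid, key=value)
print(f"approximately safest strategy for player 1: {best.round(2)}")
```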
arXiv Detail & Related papers (2022-01-12T01:45:51Z)
- Multi-Leader Congestion Games with an Adversary [0.5914780964919123]
We study a multi-leader single-follower congestion game where multiple users (leaders) choose one resource out of a set of resources.
We show that pure Nash equilibria may fail to exist and therefore, we consider approximate equilibria instead.
We show how to efficiently compute a best approximate equilibrium, that is, one with the smallest possible $\alpha$ among all $\alpha$-approximate equilibria of the given instance.
arXiv Detail & Related papers (2021-12-14T14:47:43Z)
- Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB).
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.