No-regret learning for repeated non-cooperative games with lossy bandits
- URL: http://arxiv.org/abs/2205.06968v1
- Date: Sat, 14 May 2022 05:02:56 GMT
- Title: No-regret learning for repeated non-cooperative games with lossy bandits
- Authors: Wenting Liu, Jinlong Lei, Peng Yi, Yiguang Hong
- Abstract summary: We study an asynchronous online learning strategy in which the players adaptively adjust their next actions to minimize the long-term regret.
The paper provides a novel no-regret learning algorithm, called Online Gradient Descent with lossy bandits (OGD-lb).
- Score: 5.437709019435926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers no-regret learning for repeated continuous-kernel games
with lossy bandit feedback. Since it is difficult to specify an explicit model of
the utility functions in dynamic environments, the players' actions can only be
learned with bandit feedback. Moreover, because of unreliable communication
channels or privacy protection, the bandit feedback may be lost or dropped at
random. Therefore, we study an asynchronous online learning strategy in which
the players adaptively adjust their next actions to minimize the long-term
regret. The paper provides a novel no-regret learning algorithm, called
Online Gradient Descent with lossy bandits (OGD-lb). We first give the regret
analysis for concave games with differentiable and Lipschitz utilities. Then we
show that the action profile converges to a Nash equilibrium with probability 1
when the game is also strictly monotone. We further provide the mean square
convergence rate $\mathcal{O}\left(k^{-2\min\{\beta, 1/6\}}\right)$ when the
game is $\beta$-strongly monotone. In addition, we extend the algorithm to the
case when the loss probability of the bandit feedback is unknown, and prove its
almost sure convergence to Nash equilibrium for strictly monotone games.
Finally, we take resource management in fog computing as an application
example and carry out numerical experiments to empirically demonstrate the
algorithm's performance.
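The algorithm described in the abstract combines online gradient steps with one-point bandit utility evaluations that may be randomly dropped. The Python snippet below is a minimal illustrative sketch of that idea, not the authors' implementation: the quadratic utilities, step-size and exploration schedules, loss probability, and projection interval are all assumptions made for the example.

```python
# Minimal sketch of online gradient ascent with one-point bandit estimates
# and randomly dropped feedback, in the spirit of OGD-lb. All numerical
# choices below (game parameters, schedules, loss probability) are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

N = 3         # number of players
T = 20000     # number of rounds
p_loss = 0.3  # probability that a player's bandit feedback is dropped
radius = 5.0  # actions are projected onto [-radius, radius]

# Illustrative strictly monotone quadratic game:
#   u_i(x) = -a_i * x_i^2 + b_i * x_i * mean(x_{-i})
a = np.array([1.0, 1.2, 0.8])
b = np.array([0.3, 0.2, 0.4])

def utility(i, x):
    others = (np.sum(x) - x[i]) / (N - 1)
    return -a[i] * x[i] ** 2 + b[i] * x[i] * others

x = rng.uniform(-1.0, 1.0, size=N)  # current action profile

for k in range(1, T + 1):
    eta = 0.5 * k ** (-0.75)    # step size (assumed decaying schedule)
    delta = 1.0 * k ** (-0.25)  # exploration radius (assumed decaying schedule)
    for i in range(N):
        # Lossy bandit feedback: with probability p_loss the player observes
        # nothing this round and keeps its current action (asynchronous update).
        if rng.random() < p_loss:
            continue
        # One-point bandit gradient estimate: perturb the action, query the
        # utility once, and scale by the perturbation direction.
        z = rng.choice([-1.0, 1.0])
        x_probe = x.copy()
        x_probe[i] = x[i] + delta * z
        grad_est = utility(i, x_probe) * z / delta
        # Projected gradient ascent step on the estimated pseudo-gradient.
        x[i] = np.clip(x[i] + eta * grad_est, -radius, radius)

print("final action profile:", x)
```

In this sketch a player whose feedback is dropped simply repeats its previous action for that round, which is one plausible reading of the asynchronous updates described in the abstract; the exact estimator and step-size conditions used in the paper are not reproduced here.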
Related papers
- No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling [10.376707874029561]
We show that when the players use Thompson sampling, the game dynamics converge to the Nash equilibrium under a mild assumption on the payoff matrices.
Thus, algorithmic collusion does not arise in this case, even though the players do not intentionally deploy competitive strategies.
arXiv Detail & Related papers (2024-05-23T08:21:48Z) - Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games [27.52590585111178]
We study potential games and Markov potential games under cost and bandit feedback.
Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games.
arXiv Detail & Related papers (2024-04-04T01:02:03Z) - Is Learning in Games Good for the Learners? [14.781100349601587]
We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown.
arXiv Detail & Related papers (2023-05-31T02:10:27Z) - Regret Matching+: (In)Stability and Fast Convergence in Games [68.13214224119024]
We show that RM+ and its predictive version can be unstable, which might cause other players to suffer large regret.
We show that these fixes are sufficient to get $O(T^{1/4})$ individual regret and $O(1)$ social regret in normal-form games via RM+ with predictions.
arXiv Detail & Related papers (2023-05-24T04:26:21Z) - Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z) - Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Offline congestion games: How feedback type affects data coverage requirement [53.83345471268163]
We consider three different types of feedback with decreasing revealed information.
We show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting.
Our results constitute the first study of offline congestion games.
arXiv Detail & Related papers (2022-10-24T16:49:16Z) - Complete Policy Regret Bounds for Tallying Bandits [51.039677652803675]
Policy regret is a well established notion of measuring the performance of an online learning algorithm against an adaptive adversary.
We study restrictions on the adversary that enable efficient minimization of the complete policy regret.
We provide an algorithm that, with high probability, attains a complete policy regret guarantee of $\tilde{\mathcal{O}}(mK\sqrt{T})$, where the $\tilde{\mathcal{O}}$ notation hides only logarithmic factors.
arXiv Detail & Related papers (2022-04-24T03:10:27Z) - Nonstochastic Bandits with Composite Anonymous Feedback [41.38921728211769]
We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player.
The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions.
arXiv Detail & Related papers (2021-12-06T08:44:04Z) - Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [29.553652241608997]
We study the class of smooth and strongly monotone games and optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
arXiv Detail & Related papers (2021-12-06T08:27:54Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)