No-regret learning for repeated non-cooperative games with lossy bandits
- URL: http://arxiv.org/abs/2205.06968v1
- Date: Sat, 14 May 2022 05:02:56 GMT
- Title: No-regret learning for repeated non-cooperative games with lossy bandits
- Authors: Wenting Liu, Jinlong Lei, Peng Yi, Yiguang Hong
- Abstract summary: We study an asynchronous online learning strategy in which the players adaptively adjust their next actions to minimize the long-term regret.
The paper provides a novel no-regret learning algorithm, called Online Gradient Descent with lossy bandits (OGD-lb).
- Score: 5.437709019435926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper considers no-regret learning for repeated continuous-kernel games
with lossy bandit feedback. Since it is difficult to specify an explicit model of
the utility functions in dynamic environments, the players' actions can only be
learned with bandit feedback. Moreover, because of unreliable communication
channels or privacy protection, the bandit feedback may be lost or dropped at
random. Therefore, we study an asynchronous online learning strategy in which
the players adaptively adjust their next actions to minimize the long-term
regret. The paper provides a novel no-regret learning algorithm, called
Online Gradient Descent with lossy bandits (OGD-lb). We first give the regret
analysis for concave games with differentiable and Lipschitz utilities. Then we
show that the action profile converges to a Nash equilibrium with probability 1
when the game is also strictly monotone. We further provide the mean square
convergence rate $\mathcal{O}\left(k^{-2\min\{\beta, 1/6\}}\right)$ when the
game is $\beta$-strongly monotone. In addition, we extend the algorithm to the
case when the loss probability of the bandit feedback is unknown, and prove its
almost sure convergence to Nash equilibrium for strictly monotone games.
Finally, we take resource management in fog computing as an application
example and carry out numerical experiments to empirically demonstrate the
algorithm's performance.
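The algorithm described in the abstract combines online gradient steps with one-point bandit utility evaluations that may be randomly dropped. The Python snippet below is a minimal illustrative sketch of that idea, not the authors' implementation: the quadratic utilities, step-size and exploration schedules, loss probability, and projection interval are all assumptions made for the example.

```python
# Minimal sketch of online gradient ascent with one-point bandit estimates
# and randomly dropped feedback, in the spirit of OGD-lb. All numerical
# choices below (game parameters, schedules, loss probability) are
# illustrative assumptions, not values from the paper.
import numpy as np

rng = np.random.default_rng(0)

N = 3         # number of players
T = 20000     # number of rounds
p_loss = 0.3  # probability that a player's bandit feedback is dropped
radius = 5.0  # actions are projected onto [-radius, radius]

# Illustrative strictly monotone quadratic game:
#   u_i(x) = -a_i * x_i^2 + b_i * x_i * mean(x_{-i})
a = np.array([1.0, 1.2, 0.8])
b = np.array([0.3, 0.2, 0.4])

def utility(i, x):
    others = (np.sum(x) - x[i]) / (N - 1)
    return -a[i] * x[i] ** 2 + b[i] * x[i] * others

x = rng.uniform(-1.0, 1.0, size=N)  # current action profile

for k in range(1, T + 1):
    eta = 0.5 * k ** (-0.75)    # step size (assumed decaying schedule)
    delta = 1.0 * k ** (-0.25)  # exploration radius (assumed decaying schedule)
    for i in range(N):
        # Lossy bandit feedback: with probability p_loss the player observes
        # nothing this round and keeps its current action (asynchronous update).
        if rng.random() < p_loss:
            continue
        # One-point bandit gradient estimate: perturb the action, query the
        # utility once, and scale by the perturbation direction.
        z = rng.choice([-1.0, 1.0])
        x_probe = x.copy()
        x_probe[i] = x[i] + delta * z
        grad_est = utility(i, x_probe) * z / delta
        # Projected gradient ascent step on the estimated pseudo-gradient.
        x[i] = np.clip(x[i] + eta * grad_est, -radius, radius)

print("final action profile:", x)
```

In this sketch a player whose feedback is dropped simply repeats its previous action for that round, which is one plausible reading of the asynchronous updates described in the abstract; the exact estimator and step-size conditions used in the paper are not reproduced here.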
Related papers
- No Algorithmic Collusion in Two-Player Blindfolded Game with Thompson Sampling [10.376707874029561]
We show that when the players use Thompson sampling, the game dynamics converge to the Nash equilibrium under a mild assumption on the payoff matrices.
Thus, algorithmic collusion does not arise in this case, even though the players do not intentionally deploy competitive strategies.
arXiv Detail & Related papers (2024-05-23T08:21:48Z) - Convergence to Nash Equilibrium and No-regret Guarantee in (Markov) Potential Games [27.52590585111178]
We study potential games and Markov potential games under cost and bandit feedback.
Our algorithm simultaneously achieves a Nash regret and a regret bound of $O(T^{4/5})$ for potential games.
arXiv Detail & Related papers (2024-04-04T01:02:03Z) - Is Learning in Games Good for the Learners? [14.781100349601587]
We consider tradeoffs between reward and regret in repeated gameplay between two agents.
We show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents.
We also consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown.
arXiv Detail & Related papers (2023-05-31T02:10:27Z) - Regret Matching+: (In)Stability and Fast Convergence in Games [68.13214224119024]
We show that RM+ and its predictive version can be unstable, which might cause other players to suffer large regret.
We show that these fixes are sufficient to get $O(T^{1/4})$ individual regret and $O(1)$ social regret in normal-form games via RM+ with predictions.
arXiv Detail & Related papers (2023-05-24T04:26:21Z) - Bandit Social Learning: Exploration under Myopic Behavior [58.75758600464338]
We study social learning dynamics motivated by reviews on online platforms.
Agents collectively follow a simple multi-armed bandit protocol, but each agent acts myopically, without regard to exploration.
We derive stark learning failures for any such behavior, and provide matching positive results.
arXiv Detail & Related papers (2023-02-15T01:57:57Z) - Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Offline congestion games: How feedback type affects data coverage requirement [53.83345471268163]
We consider three different types of feedback with decreasing revealed information.
We show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting.
Our results constitute the first study of offline congestion games.
arXiv Detail & Related papers (2022-10-24T16:49:16Z) - Complete Policy Regret Bounds for Tallying Bandits [51.039677652803675]
Policy regret is a well established notion of measuring the performance of an online learning algorithm against an adaptive adversary.
We study restrictions on the adversary that enable efficient minimization of the complete policy regret.
We provide an algorithm that, with high probability, attains a complete policy regret guarantee of $\tilde{\mathcal{O}}(mK\sqrt{T})$, where the $\tilde{\mathcal{O}}$ notation hides only logarithmic factors.
arXiv Detail & Related papers (2022-04-24T03:10:27Z) - Nonstochastic Bandits with Composite Anonymous Feedback [41.38921728211769]
We investigate a nonstochastic bandit setting in which the loss of an action is not immediately charged to the player.
The instantaneous loss observed by the player at the end of each round is then a sum of many loss components of previously played actions.
arXiv Detail & Related papers (2021-12-06T08:44:04Z) - Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [29.553652241608997]
We study the class of smooth and strongly monotone games and optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
arXiv Detail & Related papers (2021-12-06T08:27:54Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)