Offline congestion games: How feedback type affects data coverage requirement
- URL: http://arxiv.org/abs/2210.13396v2
- Date: Fri, 04 Oct 2024 07:27:45 GMT
- Title: Offline congestion games: How feedback type affects data coverage requirement
- Authors: Haozhe Jiang, Qiwen Cui, Zhihan Xiong, Maryam Fazel, Simon S. Du
- Abstract summary: We consider three different types of feedback with decreasing revealed information.
We show that the coverage assumption used in the agent-level feedback setting is insufficient in the game-level feedback setting.
Our results constitute the first study of offline congestion games.
- Score: 53.83345471268163
- Abstract: This paper investigates when one can efficiently recover an approximate Nash Equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three different types of feedback with decreasing revealed information. Starting from the facility-level (a.k.a., semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and give a pessimism-type algorithm that can recover an approximate NE. For the agent-level (a.k.a., bandit) feedback setting, interestingly, we show the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to multi-agent linear bandits and show that with a generalized data coverage assumption in offline linear bandits, we can efficiently recover the approximate NE. Lastly, we consider a novel type of feedback, the game-level feedback where only the total reward from all agents is revealed. Again, we show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting, and with a stronger version of the data coverage assumption for linear bandits, we can recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between different types of feedback.
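To ground the terminology, here is a minimal, self-contained sketch (not the paper's pessimism-based algorithm) of what an approximate NE means in a congestion game: agents choose subsets of facilities, a facility's per-agent reward degrades with its load, and a profile is an epsilon-approximate NE when no single agent can gain more than epsilon by deviating unilaterally. The three-facility game and `facility_reward` model below are hypothetical.

```python
import itertools

# Toy congestion game: each agent picks a subset of facilities, and a
# facility's per-agent reward depends only on how many agents use it.
# Hedged illustration of checking an epsilon-approximate Nash equilibrium
# by brute-force enumeration of unilateral deviations; the paper's one-unit
# deviation coverage condition concerns deviations that change a single
# facility, and its pessimism-based algorithm is not reproduced here.

def facility_reward(facility, load):
    """Per-agent reward of `facility` when `load` agents use it (hypothetical model)."""
    base = {0: 1.0, 1: 0.8, 2: 0.6}[facility]
    return base / load  # reward degrades with congestion

def utility(profile, agent):
    """Utility of `agent` under the joint action `profile` (tuple of frozensets)."""
    loads = {f: sum(f in a for a in profile) for f in range(3)}
    return sum(facility_reward(f, loads[f]) for f in profile[agent])

def is_approx_ne(profile, actions, eps):
    """True if no agent gains more than eps by any unilateral deviation."""
    for i in range(len(profile)):
        current = utility(profile, i)
        for alt in actions:
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if utility(deviated, i) > current + eps:
                return False
    return True

# Two agents, each choosing 2 of 3 facilities.
actions = [frozenset(s) for s in itertools.combinations(range(3), 2)]
profile = (actions[0], actions[1])
print(is_approx_ne(profile, actions, eps=0.1))  # True for this toy instance
```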
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
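As a rough, hedged illustration of the estimation idea (not necessarily the cited paper's exact estimator), the sketch below fits a Bradley-Terry preference model by weighted logistic regression, down-weighting comparisons whose feature difference has a large elliptic norm; the specific weighting rule is an assumption.

```python
import numpy as np

# Hedged sketch of uncertainty-weighted MLE for preference (dueling) feedback
# under a Bradley-Terry model: P(prefer a over b) = sigmoid(theta^T (x_a - x_b)).
# The weighting rule below (down-weight comparisons with large elliptic norm)
# is an illustrative assumption, not necessarily the cited paper's rule.

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def weighted_mle(Z, y, lam=1.0, steps=500, lr=0.1):
    """Z: (n, d) feature differences of compared arms; y: binary preferences."""
    n, d = Z.shape
    Sigma = lam * np.eye(d) + Z.T @ Z                    # regularized design matrix
    norms = np.sqrt(np.einsum("ij,jk,ik->i", Z, np.linalg.inv(Sigma), Z))
    w = 1.0 / np.maximum(1.0, norms / norms.mean())      # heuristic uncertainty weights
    theta = np.zeros(d)
    for _ in range(steps):                               # weighted logistic regression via GD
        p = sigmoid(Z @ theta)
        theta -= lr * (Z.T @ (w * (p - y)) + lam * theta) / n
    return theta

rng = np.random.default_rng(0)
theta_star = np.array([1.0, -0.5])
Z = rng.normal(size=(200, 2))
y = (rng.random(200) < sigmoid(Z @ theta_star)).astype(float)
print(weighted_mle(Z, y))                                # roughly recovers theta_star
```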
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Auto-Encoding Bayesian Inverse Games [36.06617326128679]
We consider the inverse game problem, in which some properties of the game are unknown a priori.
Existing maximum likelihood estimation approaches to solve inverse games provide only point estimates of unknown parameters.
We take a Bayesian perspective and construct posterior distributions of game parameters with a structured variational autoencoder (VAE), which can be trained from an unlabeled dataset of observed interactions.
arXiv Detail & Related papers (2024-02-14T02:17:37Z) - Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach [18.13887411913371]
This paper studies bandit problems where an agent has access to offline data that can be utilized to improve the estimation of each arm's reward distribution.
We categorize the biases into confounding bias and selection bias based on the causal structure they imply.
We extract a causal bound for each arm that is robust to compound biases in the observational data.
arXiv Detail & Related papers (2023-12-20T03:03:06Z) - Deep Graph Neural Networks via Posteriori-Sampling-based Node-Adaptive Residual Module [65.81781176362848]
Graph Neural Networks (GNNs) can learn from graph-structured data through neighborhood information aggregation.
As the number of layers increases, node representations become indistinguishable, which is known as over-smoothing.
We propose a Posterior-Sampling-based Node-Adaptive Residual module (PSNR).
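The numpy sketch below shows the generic node-adaptive residual idea: each node mixes its previous representation with the aggregated neighborhood message through a per-node coefficient alpha, which counteracts over-smoothing under deep stacking. The random alpha is a placeholder; the posterior-sampling rule PSNR actually uses is not reproduced.

```python
import numpy as np

# Minimal numpy sketch of a node-adaptive residual GNN layer. The random
# alpha below is a placeholder; PSNR's actual posterior-sampling rule for
# choosing it is not reproduced here.

def normalized_adjacency(A):
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    return d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

def residual_gnn_layer(H, A_hat, W, alpha):
    """H: (n, d) node features; alpha: (n, 1) per-node residual weights."""
    message = A_hat @ H @ W                     # neighborhood aggregation
    return alpha * H + (1.0 - alpha) * np.tanh(message)

rng = np.random.default_rng(0)
n, d = 5, 4
A = (rng.random((n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)                          # symmetric toy graph
np.fill_diagonal(A, 1.0)                        # add self-loops
H = rng.normal(size=(n, d))
W = rng.normal(size=(d, d)) * 0.1
alpha = rng.uniform(0.2, 0.8, size=(n, 1))      # placeholder node-adaptive weights
A_hat = normalized_adjacency(A)
for _ in range(8):                              # deep stacking stays distinguishable
    H = residual_gnn_layer(H, A_hat, W, alpha)
print(H.round(3))
```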
arXiv Detail & Related papers (2023-05-09T12:03:42Z) - Offline Learning in Markov Games with General Function Approximation [22.2472618685325]
We study offline multi-agent reinforcement learning (RL) in Markov games.
We provide the first framework for sample-efficient offline learning in Markov games.
arXiv Detail & Related papers (2023-02-06T05:22:27Z) - Reward Imputation with Sketching for Contextual Batched Bandits [48.80803376405073]
Contextual batched bandit (CBB) is a setting where a batch of rewards is observed from the environment at the end of each episode.
Existing approaches for CBB often ignore the rewards of the non-executed actions, leading to underutilization of feedback information.
We propose Sketched Policy Updating with Imputed Rewards (SPUIR) that completes the unobserved rewards using sketching.
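A minimal sketch of the imputation idea, assuming a linear reward model: fit a sketched ridge regression on the executed actions of a batch and use it to impute rewards for non-executed actions. The Gaussian sketch and the `sketched_ridge` helper are illustrative assumptions, not SPUIR's exact updates.

```python
import numpy as np

# Hedged sketch of reward imputation via sketched ridge regression: fit a
# linear reward model on executed actions, compressing the regression with a
# random Gaussian sketch, then impute rewards for non-executed actions.
# Illustrates the general idea only; not SPUIR's exact sketching scheme.

def sketched_ridge(X, r, sketch_dim, lam=1.0, seed=0):
    """Solve min_w ||S(Xw - r)||^2 + lam*||w||^2 with a Gaussian sketch S."""
    rng = np.random.default_rng(seed)
    S = rng.normal(size=(sketch_dim, X.shape[0])) / np.sqrt(sketch_dim)
    Xs, rs = S @ X, S @ r                       # compressed regression problem
    d = X.shape[1]
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(d), Xs.T @ rs)

rng = np.random.default_rng(1)
w_star = rng.normal(size=6)
X_exec = rng.normal(size=(400, 6))              # contexts of executed actions
r_exec = X_exec @ w_star + 0.1 * rng.normal(size=400)
w_hat = sketched_ridge(X_exec, r_exec, sketch_dim=64)
X_unexec = rng.normal(size=(3, 6))              # contexts of non-executed actions
print(X_unexec @ w_hat)                         # imputed rewards for the batch
```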
arXiv Detail & Related papers (2022-10-13T04:26:06Z) - Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts [53.579515853222986]
We study a novel bandit setting, namely the Multi-Armed Bandit with Temporally-Partitioned Rewards (TP-MAB).
This setting is a natural extension of delayed-feedback bandits to the case in which rewards may be dilated over a finite-time span after the pull.
We provide two algorithms to address TP-MAB problems, namely, TP-UCB-FR and TP-UCB-EW.
arXiv Detail & Related papers (2022-06-01T15:56:59Z) - No-regret learning for repeated non-cooperative games with lossy bandits [5.437709019435926]
We study an asynchronous online learning strategy in which players adaptively adjust their next actions to minimize long-term regret.
The paper provides a novel no-regret learning algorithm, called Online Gradient Descent with lossy bandits (OGD-lb).
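The sketch below illustrates the general mechanism under simplifying assumptions: loss values arrive with probability p, a standard one-point gradient estimator is importance-weighted by 1/p, and rounds with lost feedback perform no update. It is a generic stand-in, not the paper's OGD-lb algorithm or step-size schedule.

```python
import numpy as np

# Hedged sketch of no-regret learning with lossy bandit feedback: each round
# the loss value at the probed point is observed only with probability p, and
# a one-point gradient estimate, importance-weighted by 1/p, drives a
# projected gradient step. Generic illustration, not the cited algorithm.

def ogd_lossy(loss, d, T, p=0.7, delta=0.2, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(d)
    avg = np.zeros(d)
    for t in range(T):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)                  # random probing direction
        if rng.random() < p:                    # feedback received, not lost
            # one-point gradient estimate, reweighted by 1/p; it is high
            # variance, so we report the averaged iterate below
            g = (d / delta) * loss(x + delta * u) * u / p
            x = np.clip(x - 0.01 / np.sqrt(t + 1) * g, -1.0, 1.0)
        avg += x
    return avg / T

quadratic = lambda z: np.sum((z - 0.3) ** 2)    # toy convex loss, minimum at 0.3
print(ogd_lossy(quadratic, d=2, T=30000))       # averaged iterate drifts toward 0.3
```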
arXiv Detail & Related papers (2022-05-14T05:02:56Z) - Experts with Lower-Bounded Loss Feedback: A Unifying Framework [8.947188600472256]
We prove optimal regret bounds for modified versions of Exp3, generalizing algorithms and bounds both for the bandit and the full-information settings.
Our results intersect with those for bandits with graph-structured feedback, in that both settings can accommodate feedback from an arbitrary subset of experts on each round.
Our model also accommodates partial feedback at the single-expert level, by allowing non-trivial lower bounds on each loss.
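One natural way to exploit such lower bounds, sketched below as a hedged guess at the framework, is an Exp3-style update that credits every expert its known lower bound each round and importance-weights only the observed expert's excess loss; this keeps the loss estimates unbiased while reducing their variance.

```python
import numpy as np

# Hedged Exp3-style sketch for losses with known per-round lower bounds:
# every expert is credited its lower bound, and only the chosen expert's
# excess loss is importance-weighted. E[hat_j] = b_j + (loss_j - b_j), so
# the estimator stays unbiased with lower variance than vanilla Exp3.
# Illustrative guess at the framework, not the paper's tuned variants.

def exp3_lower_bounded(losses, bounds, eta=0.05, seed=0):
    """losses, bounds: (T, K) arrays with bounds <= losses elementwise."""
    rng = np.random.default_rng(seed)
    T, K = losses.shape
    cum = np.zeros(K)                           # cumulative loss estimates
    total = 0.0
    for t in range(T):
        p = np.exp(-eta * (cum - cum.min()))    # shift for numerical stability
        p /= p.sum()
        i = rng.choice(K, p=p)
        total += losses[t, i]
        hat = bounds[t].copy()                  # every expert gets its lower bound
        hat[i] += (losses[t, i] - bounds[t, i]) / p[i]  # importance-weighted excess
        cum += hat
    return total / T

rng = np.random.default_rng(2)
T, K = 2000, 4
bounds = rng.uniform(0.0, 0.3, size=(T, K))
losses = bounds + rng.uniform(0.0, 0.7, size=(T, K))
losses[:, 0] = bounds[:, 0]                     # expert 0 always attains its bound
print(exp3_lower_bounded(losses, bounds))       # average loss nears expert 0's
```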
arXiv Detail & Related papers (2020-12-17T12:18:52Z) - From Poincaré Recurrence to Convergence in Imperfect Information Games: Finding Equilibrium via Regularization [49.368421783733815]
We show how adapting the reward can give strong convergence guarantees in monotone games.
We also show how this reward adaptation technique can be leveraged to build algorithms that converge exactly to the Nash equilibrium.
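As a hedged illustration of the reward-adaptation idea (not the cited paper's scheme), the sketch below runs entropy-regularized multiplicative-weights dynamics on matching pennies: the regularization term added to the payoffs damps the cycling of the unregularized dynamics and pulls both players toward the equilibrium.

```python
import numpy as np

# Hedged sketch: entropy-regularized multiplicative-weights dynamics in a
# two-player zero-sum game. The -tau*(log(x) + 1) term is the "adapted
# reward"; it damps the cycling of unregularized dynamics and drives play
# toward equilibrium. Illustrative only, not the cited paper's algorithm.

A = np.array([[1.0, -1.0], [-1.0, 1.0]])       # matching pennies, player 1 payoff

def regularized_dynamics(A, tau=0.1, lr=0.1, steps=5000):
    x = np.array([0.9, 0.1])                   # player 1 mixed strategy
    y = np.array([0.2, 0.8])                   # player 2 mixed strategy
    for _ in range(steps):
        gx = A @ y - tau * (np.log(x) + 1.0)   # regularized payoff gradient
        gy = -A.T @ x - tau * (np.log(y) + 1.0)
        x = x * np.exp(lr * gx); x /= x.sum()  # multiplicative-weights step
        y = y * np.exp(lr * gy); y /= y.sum()
    return x, y

print(regularized_dynamics(A))                 # both approach (0.5, 0.5), the NE
```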
arXiv Detail & Related papers (2020-02-19T21:36:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.