Estimating $\alpha$-Rank by Maximizing Information Gain
- URL: http://arxiv.org/abs/2101.09178v1
- Date: Fri, 22 Jan 2021 15:46:35 GMT
- Title: Estimating $\alpha$-Rank by Maximizing Information Gain
- Authors: Tabish Rashid, Cheng Zhang, Kamil Ciosek
- Abstract summary: Game theory has been increasingly applied in settings where the game is not known outright, but has to be estimated by sampling.
In this paper, we focus on $\alpha$-rank, a popular game-theoretic solution concept designed to perform well in such scenarios.
We show the benefits of using information gain as compared to the confidence interval criterion of ResponseGraphUCB.
- Score: 26.440923373794444
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Game theory has been increasingly applied in settings where the game is not
known outright, but has to be estimated by sampling. For example, meta-games
that arise in multi-agent evaluation can only be accessed by running a
succession of expensive experiments that may involve simultaneous deployment of
several agents. In this paper, we focus on $\alpha$-rank, a popular
game-theoretic solution concept designed to perform well in such scenarios. We
aim to estimate the $\alpha$-rank of the game using as few samples as possible.
Our algorithm maximizes information gain between an epistemic belief over the
$\alpha$-ranks and the observed payoff. This approach has two main benefits.
First, it allows us to focus our sampling on the entries that matter the most
for identifying the $\alpha$-rank. Second, the Bayesian formulation provides a
facility to build in modeling assumptions by using a prior over game payoffs.
We show the benefits of using information gain as compared to the confidence
interval criterion of ResponseGraphUCB (Rowland et al. 2019), and provide
theoretical results justifying our method.
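At a high level, the acquisition criterion described in the abstract is the mutual information between the current belief over $\alpha$-ranks and the next observed payoff. Writing $\pi$ for the $\alpha$-rank, $D$ for the payoffs observed so far, and $y_{ij}$ for a candidate payoff sample at strategy profile $(i, j)$, the entry to query has the generic form

$$ (i, j) \in \arg\max_{i,j} \; I(\pi; y_{ij} \mid D) = H\big[p(\pi \mid D)\big] - \mathbb{E}_{y_{ij}}\Big[ H\big[p(\pi \mid D \cup \{y_{ij}\})\big] \Big]. $$

The sketch below illustrates the overall pattern rather than the authors' implementation: a simplified single-population $\alpha$-rank computed from a payoff matrix, an independent conjugate-Gaussian belief per payoff entry (one way to encode the prior over game payoffs mentioned in the abstract), and an acquisition rule that queries the entry whose posterior uncertainty perturbs the resulting $\alpha$-rank the most. All names are ours, and the acquisition is a crude Monte-Carlo proxy for the information-gain criterion, not the estimator used in the paper.

```python
import numpy as np

def alpha_rank(payoffs, alpha=5.0, pop_size=50):
    """Simplified single-population alpha-rank: stationary distribution of the
    evolutionary response graph. payoffs[i, j] is the payoff to strategy i
    against strategy j."""
    n = payoffs.shape[0]
    C = np.zeros((n, n))
    eta = 1.0 / (n - 1)
    for r in range(n):                     # resident strategy
        for m in range(n):                 # mutant strategy
            if r == m:
                continue
            diff = payoffs[m, r] - payoffs[r, m]
            if abs(diff) < 1e-12:
                rho = 1.0 / pop_size       # neutral drift
            elif diff > 0:
                rho = (1.0 - np.exp(-alpha * diff)) / (1.0 - np.exp(-alpha * pop_size * diff))
            else:                          # numerically stable form for diff < 0
                a, am = alpha * diff, alpha * pop_size * diff
                rho = np.exp(am - a) * (np.exp(a) - 1.0) / (np.exp(am) - 1.0)
            C[r, m] = eta * rho
        C[r, r] = 1.0 - C[r].sum()
    evals, evecs = np.linalg.eig(C.T)      # stationary distribution of C
    pi = np.abs(np.real(evecs[:, np.argmax(np.real(evals))]))
    return pi / pi.sum()

class GaussianPayoffBelief:
    """Independent Gaussian posterior over each payoff entry, assuming
    Gaussian observation noise with known variance."""
    def __init__(self, n, prior_mean=0.0, prior_var=1.0, noise_var=0.1):
        self.mean = np.full((n, n), prior_mean)
        self.var = np.full((n, n), prior_var)
        self.noise_var = noise_var

    def update(self, i, j, y):
        # Conjugate Gaussian update for one noisy payoff sample.
        prior_prec, obs_prec = 1.0 / self.var[i, j], 1.0 / self.noise_var
        post_var = 1.0 / (prior_prec + obs_prec)
        self.mean[i, j] = post_var * (prior_prec * self.mean[i, j] + obs_prec * y)
        self.var[i, j] = post_var

def pick_entry(belief, rng, n_samples=16):
    """Monte-Carlo proxy for information gain: query the entry whose
    posterior uncertainty shifts the alpha-rank of the mean payoffs most."""
    n = belief.mean.shape[0]
    base = alpha_rank(belief.mean)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            shifts = []
            for _ in range(n_samples):
                p = belief.mean.copy()
                p[i, j] = rng.normal(belief.mean[i, j], np.sqrt(belief.var[i, j]))
                shifts.append(np.abs(alpha_rank(p) - base).sum())
            scores[i, j] = np.mean(shifts)
    return np.unravel_index(np.argmax(scores), scores.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_payoffs = np.array([[ 0.0,  1.0, -1.0],       # a rock-paper-scissors-style game
                             [-1.0,  0.0,  1.0],
                             [ 1.0, -1.0,  0.0]])
    belief = GaussianPayoffBelief(n=3)
    for _ in range(100):
        i, j = pick_entry(belief, rng)
        y = true_payoffs[i, j] + rng.normal(0.0, 0.3)  # one expensive, noisy evaluation
        belief.update(i, j, y)
    print(alpha_rank(belief.mean))                     # roughly uniform for this cyclic game
```

A full treatment would score candidate entries by the expected entropy reduction of the posterior over $\alpha$-ranks (the objective above) and could correlate payoff entries through the prior; the independent-entry Gaussian model here is only the simplest choice that keeps the updates in closed form.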
Related papers
- Among Us: A Sandbox for Measuring and Detecting Agentic Deception [1.1893676124374688]
We introduce $\textit{Among Us}$, a social deception game where language-based agents exhibit long-term, open-ended deception. We find that models trained with RL are comparatively much better at producing deception than detecting it. We also find two SAE features that work well at deception detection but are unable to steer the model to lie less.
arXiv Detail & Related papers (2025-04-05T06:09:32Z)
- Mind the Gap: A Causal Perspective on Bias Amplification in Prediction & Decision-Making [58.06306331390586]
We introduce the notion of a margin complement, which measures how much a prediction score $S$ changes due to a thresholding operation.
We show that under suitable causal assumptions, the influences of $X$ on the prediction score $S$ are equal to the influences of $X$ on the true outcome $Y$.
arXiv Detail & Related papers (2024-05-24T11:22:19Z)
- Policy Mirror Ascent for Efficient and Independent Learning in Mean Field Games [35.86199604587823]
Mean-field games have been used as a theoretical tool to obtain an approximate Nash equilibrium for symmetric and anonymous $N$-player games.
We show that $N$ agents running policy mirror ascent converge to the Nash equilibrium of the regularized game within $\widetilde{\mathcal{O}}(\varepsilon^{-2})$ samples.
arXiv Detail & Related papers (2022-12-29T20:25:18Z)
- Generalized Differentiable RANSAC [95.95627475224231]
$\nabla$-RANSAC is a differentiable RANSAC that allows learning the entire randomized robust estimation pipeline.
$\nabla$-RANSAC is superior to the state-of-the-art in terms of accuracy while running at a similar speed to its less accurate alternatives.
arXiv Detail & Related papers (2022-12-26T15:13:13Z) - Representation Learning for General-sum Low-rank Markov Games [63.119870889883224]
We study multi-agent general-sum Markov games with nonlinear function approximation.
We focus on low-rank Markov games whose transition matrix admits a hidden low-rank structure on top of an unknown non-linear representation.
arXiv Detail & Related papers (2022-10-30T22:58:22Z) - Minimax-Optimal Multi-Agent RL in Zero-Sum Markov Games With a
Generative Model [50.38446482252857]
Two-player zero-sum Markov games are arguably the most basic setting in multi-agent reinforcement learning.
We develop a learning algorithm that learns an $\varepsilon$-approximate Markov NE policy using $\widetilde{O}(\cdot)$ samples.
We derive a refined regret bound for FTRL that makes explicit the role of variance-type quantities.
arXiv Detail & Related papers (2022-08-22T17:24:55Z)
- A Ranking Game for Imitation Learning [22.028680861819215]
We treat imitation as a two-player ranking-based Stackelberg game between a $\textit{policy}$ and a $\textit{reward}$ function.
This game encompasses a large subset of both inverse reinforcement learning (IRL) methods and methods which learn from offline preferences.
We theoretically analyze the requirements of the loss function used for ranking policy performances to facilitate near-optimal imitation learning at equilibrium.
arXiv Detail & Related papers (2022-02-07T19:38:22Z)
- Top $K$ Ranking for Multi-Armed Bandit with Noisy Evaluations [102.32996053572144]
We consider a multi-armed bandit setting where, at the beginning of each round, the learner receives noisy independent evaluations of the true reward of each arm.
We derive different algorithmic approaches and theoretical guarantees depending on how the evaluations are generated.
arXiv Detail & Related papers (2021-12-13T09:48:54Z)
- Linear Speedup in Personalized Collaborative Learning [69.45124829480106]
Personalization in federated learning can improve the accuracy of a model for a user by trading off the model's bias against its variance.
We formalize the personalized collaborative learning problem as optimization of a user's objective.
We explore conditions under which we can optimally trade-off their bias for a reduction in variance.
arXiv Detail & Related papers (2021-11-10T22:12:52Z)
- Pure Exploration with Structured Preference Feedback [25.894827160719526]
We consider the problem of pure exploration with subset-wise preference feedback over $N$ arms with features.
We present two algorithms that guarantee detection of the best arm in $\tilde{O}\big(\frac{d^2 K}{\Delta^2}\big)$ samples with probability at least $1 - \delta$.
arXiv Detail & Related papers (2021-04-12T08:57:29Z)
- Markov Cricket: Using Forward and Inverse Reinforcement Learning to Model, Predict And Optimize Batting Performance in One-Day International Cricket [0.8122270502556374]
We model one-day international cricket games as Markov processes, applying forward and inverse Reinforcement Learning (RL) to develop three novel tools for the game.
We show that, when used as a proxy for remaining scoring resources, this approach outperforms the state-of-the-art Duckworth-Lewis-Stern method by a factor of 3 to 10.
We envisage our prediction and simulation techniques may provide a fairer alternative for estimating final scores in interrupted games, while the inferred reward model may provide useful insights for the professional game to optimize playing strategy.
arXiv Detail & Related papers (2021-03-07T13:11:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.