Learning to Play Against Unknown Opponents
- URL: http://arxiv.org/abs/2412.18297v2
- Date: Thu, 20 Feb 2025 07:17:45 GMT
- Title: Learning to Play Against Unknown Opponents
- Authors: Eshwar Ram Arunachaleswaran, Natalie Collina, Jon Schneider
- Abstract summary: We show how to efficiently construct an optimal learning algorithm when the learning agent is constrained to be no-regret, and an $\varepsilon$-optimal one when it is not.
All of these results make use of recently developed machinery that converts the analysis of learning algorithms to the study of the class of corresponding geometric objects known as menus.
- Score: 9.346742321348366
- License:
- Abstract: We consider the problem of a learning agent who has to repeatedly play a general sum game against a strategic opponent who acts to maximize their own payoff by optimally responding against the learner's algorithm. The learning agent knows their own payoff function, but is uncertain about the payoff of their opponent (knowing only that it is drawn from some distribution $\mathcal{D}$). What learning algorithm should the agent run in order to maximize their own total utility, either in expectation or in the worst-case over $\mathcal{D}$? When the learning algorithm is constrained to be a no-regret algorithm, we demonstrate how to efficiently construct an optimal learning algorithm (asymptotically achieving the optimal utility) in polynomial time for both the in-expectation and worst-case problems, independent of any other assumptions. When the learning algorithm is not constrained to no-regret, we show how to construct an $\varepsilon$-optimal learning algorithm (obtaining average utility within $\varepsilon$ of the optimal utility) for both the in-expectation and worst-case problems in time polynomial in the size of the input and $1/\varepsilon$, when either the size of the game or the support of $\mathcal{D}$ is constant. Finally, for the special case of the maximin objective, where the learner wishes to maximize their minimum payoff over all possible optimizer types, we construct a learner algorithm that runs in polynomial time in each step and guarantees convergence to the optimal learner payoff. All of these results make use of recently developed machinery that converts the analysis of learning algorithms to the study of the class of corresponding geometric objects known as menus.
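As a concrete illustration of the interaction model in the abstract (not of the paper's menu-based constructions), the Python sketch below has a learner run multiplicative weights, a standard no-regret algorithm, against an optimizer who best-responds each round to the learner's current mixed strategy. The payoff matrices, horizon, and step size are made up for illustration.

```python
# Illustrative sketch only: a no-regret learner (multiplicative weights)
# repeatedly playing a random general-sum game against an optimizer who
# knows the learner's algorithm and best-responds to its current mixed
# strategy each round. All matrices and parameters are invented.
import numpy as np

rng = np.random.default_rng(0)
n_learner, n_opt, T = 3, 3, 1000
A = rng.uniform(size=(n_learner, n_opt))  # learner's payoff matrix
B = rng.uniform(size=(n_learner, n_opt))  # optimizer's payoff matrix (general sum)

weights = np.ones(n_learner)
eta = np.sqrt(np.log(n_learner) / T)      # standard multiplicative-weights step size
total_payoff = 0.0

for t in range(T):
    x = weights / weights.sum()           # learner's mixed strategy this round
    j = int(np.argmax(x @ B))             # optimizer best-responds to x
    total_payoff += x @ A[:, j]           # learner's expected payoff this round
    weights *= np.exp(eta * A[:, j])      # multiplicative-weights update on rewards

print("learner's average payoff:", total_payoff / T)
```

Because the optimizer can simulate the learner's algorithm, its best response is computed against the learner's exact mixed strategy, which is the kind of strategic opponent the abstract describes.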
Related papers
- Maximizing utility in multi-agent environments by anticipating the behavior of other learners [17.703508282875323]
In multi-agent settings, the decisions of each agent can affect the utilities/losses of the other agents.
In this paper, we study repeated two-player games involving two types of agents.
arXiv Detail & Related papers (2024-07-05T23:16:18Z)
- Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z)
- Streaming Algorithms for Learning with Experts: Deterministic Versus Robust [62.98860182111096]
In the online learning with experts problem, an algorithm must make a prediction about an outcome on each of $T$ days (or times).
The goal is to make a prediction with the minimum cost, specifically compared to the best expert in the set.
We show a space lower bound of $\widetilde{\Omega}\left(\frac{nM}{RT}\right)$ for any deterministic algorithm that achieves regret $R$ when the best expert makes $M$ mistakes.
arXiv Detail & Related papers (2023-03-03T04:39:53Z)
- Strategizing against Learners in Bayesian Games [74.46970859427907]
We study repeated two-player games where one of the players, the learner, employs a no-regret learning strategy.
We consider general Bayesian games, where the payoffs of both the learner and their opponent could depend on the type.
arXiv Detail & Related papers (2022-05-17T18:10:25Z)
- Randomized Exploration for Reinforcement Learning with General Value Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises (a minimal sketch of this idea appears after this list).
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player to play against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress (having no limit cycles) and to find an appropriate stationary point within a polynomial number of gradient ascent steps.
arXiv Detail & Related papers (2021-06-02T22:03:36Z)
- Model Selection in Contextual Stochastic Bandit Problems [51.94632035240787]
We develop a meta-algorithm that selects between base algorithms.
We show through a lower bound that even when one of the base algorithms has $O(\sqrt{T})$ regret, in general it is impossible to get better than $\Omega(\sqrt{T})$ regret.
arXiv Detail & Related papers (2020-03-03T18:46:34Z)
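To illustrate the exploration mechanism described in the randomized-exploration entry above, here is a minimal sketch of one fitted value-iteration step in which the regression targets are perturbed with i.i.d. Gaussian noise before a ridge least-squares fit. The features, transitions, and hyperparameters are invented for illustration and are not taken from the paper.

```python
# Sketch of exploration via perturbed regression targets: add i.i.d. Gaussian
# noise to each Bellman-backup target before the least-squares fit, so the
# resulting value estimate (and hence the greedy policy) is randomized.
# All data and hyperparameters below are made up.
import numpy as np

rng = np.random.default_rng(1)
d, n, gamma, sigma, reg = 4, 200, 0.99, 0.1, 1.0

phi = rng.normal(size=(n, d))        # features of observed state-action pairs
rewards = rng.uniform(size=n)        # observed rewards
phi_next = rng.normal(size=(n, d))   # features of (next state, greedy action)
w_prev = np.zeros(d)                 # value weights from the previous iteration

targets = rewards + gamma * (phi_next @ w_prev)  # one Bellman backup
targets += sigma * rng.normal(size=n)            # i.i.d. scalar perturbations

# Ridge-regularized least-squares fit on the perturbed targets.
w = np.linalg.solve(phi.T @ phi + reg * np.eye(d), phi.T @ targets)
print("randomized value weights:", w)
```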