Model-Free Online Learning in Unknown Sequential Decision Making
Problems and Games
- URL: http://arxiv.org/abs/2103.04539v1
- Date: Mon, 8 Mar 2021 04:03:24 GMT
- Title: Model-Free Online Learning in Unknown Sequential Decision Making
Problems and Games
- Authors: Gabriele Farina, Tuomas Sandholm
- Abstract summary: In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
- Score: 114.90723492840499
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Regret minimization has proved to be a versatile tool for tree-form
sequential decision making and extensive-form games. In large two-player
zero-sum imperfect-information games, modern extensions of counterfactual
regret minimization (CFR) are currently the practical state of the art for
computing a Nash equilibrium. Most regret-minimization algorithms for tree-form
sequential decision making, including CFR, require (i) an exact model of the
player's decision nodes, observation nodes, and how they are linked, and (ii)
full knowledge, at all times t, about the payoffs -- even in parts of the
decision space that are not encountered at time t. Recently, there has been
growing interest in relaxing some of those restrictions and making regret
minimization applicable to settings for which reinforcement learning methods
have traditionally been used -- for example, those in which only black-box
access to the environment is available. We give the first, to our knowledge,
regret-minimization algorithm that guarantees sublinear regret with high
probability even when requirement (i) -- and thus also (ii) -- is dropped. We
formalize an online learning setting in which the strategy space is not known
to the agent and gets revealed incrementally whenever the agent encounters new
decision points. We give an efficient algorithm that achieves $O(T^{3/4})$
regret with high probability for that setting, even when the agent faces an
adversarial environment. Our experiments show it significantly outperforms the
prior algorithms for the problem, which do not have such guarantees. It can be
used in any application for which regret minimization is useful: approximating
Nash equilibrium or quantal response equilibrium, approximating coarse
correlated equilibrium in multi-player games, learning a best response,
learning safe opponent exploitation, and online play against an unknown
opponent/environment.
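To make the setting concrete, below is a minimal, illustrative Python sketch of regret matching at a single decision point whose action set is revealed incrementally over time. It is not the paper's $O(T^{3/4})$ algorithm: unlike the black-box setting studied in the paper, the sketch assumes the full utility vector over revealed actions is observed each round, and the names (RegretMatcher, reveal, act, observe) are hypothetical.

```python
# Illustrative sketch only: regret matching over a growing action set at one
# decision point. Assumes full payoff feedback for the revealed actions; this
# is NOT the paper's O(T^{3/4}) algorithm for unknown tree-form problems.
import random
from collections import defaultdict


class RegretMatcher:
    """Regret matching over an action set that is revealed incrementally."""

    def __init__(self):
        self.cum_regret = defaultdict(float)  # cumulative regret per action
        self.actions = []                     # actions revealed so far

    def reveal(self, action):
        """Register an action the environment has just revealed."""
        if action not in self.cum_regret:
            self.cum_regret[action] = 0.0
            self.actions.append(action)

    def strategy(self):
        """Mixed strategy: positive regrets normalized; uniform if none."""
        pos = {a: max(r, 0.0) for a, r in self.cum_regret.items()}
        total = sum(pos.values())
        if total <= 0.0:
            return {a: 1.0 / len(self.actions) for a in self.actions}
        return {a: p / total for a, p in pos.items()}

    def act(self):
        """Sample an action from the current strategy."""
        r, acc = random.random(), 0.0
        for a, p in self.strategy().items():
            acc += p
            if r <= acc:
                return a
        return self.actions[-1]

    def observe(self, utilities):
        """Update cumulative regrets from a utility vector over revealed actions."""
        strat = self.strategy()
        expected = sum(strat[a] * utilities[a] for a in self.actions)
        for a in self.actions:
            self.cum_regret[a] += utilities[a] - expected


if __name__ == "__main__":
    # Tiny usage example against a fixed (non-adversarial) payoff vector.
    rm = RegretMatcher()
    rm.reveal("left")
    payoffs = {"left": 0.2}
    for t in range(1, 1001):
        if t == 100:              # a new action becomes known mid-run
            rm.reveal("right")
            payoffs["right"] = 0.8
        rm.act()
        rm.observe(payoffs)
    print(rm.strategy())          # mass should concentrate on "right"
```

In the tree-form case, CFR-style methods run one such local regret minimizer at every decision point encountered so far, adding new ones as the strategy space is revealed.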
Related papers
- Barriers to Welfare Maximization with No-Regret Learning [68.66209476382213]
We prove lower bounds for computing a near-optimal $T$-sparse CCE.
In particular, we show that the inapproximability of maximum clique precludes attaining any non-trivial sparsity in polynomial time.
arXiv Detail & Related papers (2024-11-04T00:34:56Z) - Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z) - Oracle-Efficient Smoothed Online Learning for Piecewise Continuous Decision Making [73.48977854003697]
This work introduces a new notion of complexity, the generalized bracketing numbers, which marries constraints on the adversary to the size of the space.
We then instantiate our bounds in several problems of interest, including online prediction and planning of piecewise continuous functions.
arXiv Detail & Related papers (2023-02-10T18:45:52Z) - Decentralized model-free reinforcement learning in stochastic games with
average-reward objective [1.9852463786440127]
We show that our algorithm achieves sublinear regret of order $T^{3/4}$ with high probability and sublinear expected regret of order $T^{2/3}$.
Our algorithm enjoys a low computational complexity and low memory space requirement compared to the previous works.
arXiv Detail & Related papers (2023-01-13T15:59:53Z) - Regret Minimization and Convergence to Equilibria in General-sum Markov
Games [57.568118148036376]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.
Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z) - ESCHER: Eschewing Importance Sampling in Games by Computing a History
Value Function to Estimate Regret [97.73233271730616]
Recent techniques for approximating Nash equilibria in very large games leverage neural networks to learn approximately optimal policies (strategies).
DREAM, the only current CFR-based neural method that is model-free and therefore scalable to very large games, trains a neural network on an estimated regret target that can have extremely high variance due to an importance sampling term inherited from Monte Carlo CFR (MCCFR).
We show that a deep learning version of ESCHER outperforms the prior state of the art -- DREAM and neural fictitious self play (NFSP) -- and the difference becomes dramatic as game size increases.
arXiv Detail & Related papers (2022-06-08T18:43:45Z) - Online Multiobjective Minimax Optimization and Applications [14.699969822572308]
We introduce a simple but general online learning framework in which, in each round, an adaptive adversary introduces a new game.
The learner's goal is to play so as to minimize the maximum coordinate of the cumulative vector-valued loss.
We give a simple algorithm that can compete with the setting in which the adversary must announce their action first.
This includes no-regret learning: we can recover optimal algorithms and bounds for minimizing external regret, internal regret, adaptive regret, multigroup regret, subsequence regret, and a notion of regret in the sleeping experts setting.
arXiv Detail & Related papers (2021-08-09T06:52:08Z) - Hindsight and Sequential Rationality of Correlated Play [18.176128899338433]
We look at algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior.
We develop and advocate for this hindsight framing of learning in general sequential decision-making settings.
We present examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature.
arXiv Detail & Related papers (2020-12-10T18:30:21Z)