Stochastic Regret Minimization in Extensive-Form Games
- URL: http://arxiv.org/abs/2002.08493v1
- Date: Wed, 19 Feb 2020 23:05:41 GMT
- Title: Stochastic Regret Minimization in Extensive-Form Games
- Authors: Gabriele Farina, Christian Kroer, and Tuomas Sandholm
- Abstract summary: Monte-Carlo counterfactual regret minimization (MCCFR) is the state-of-the-art algorithm for solving sequential games that are too large for full trees.
We develop a new framework for constructing stochastic regret minimization methods.
We show extensive experiments on three games, where some variants of our methods outperform MCCFR.
- Score: 109.43344748069933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monte-Carlo counterfactual regret minimization (MCCFR) is the
state-of-the-art algorithm for solving sequential games that are too large for
full tree traversals. It works by using gradient estimates that can be computed
via sampling. However, stochastic methods for sequential games have not been
investigated extensively beyond MCCFR. In this paper we develop a new framework
for constructing stochastic regret minimization methods. This framework allows us
to use any regret-minimization algorithm, coupled with any gradient estimator.
The MCCFR algorithm can be analyzed as a special case of our framework, and
this analysis leads to significantly stronger theoretical guarantees on convergence, while
simultaneously yielding a simplified proof. Our framework allows us to
instantiate several new stochastic methods for solving sequential games. We
show extensive experiments on three games, where some variants of our methods
outperform MCCFR.
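To make the plug-and-play structure concrete, here is a minimal single-decision sketch under illustrative assumptions (it is not the paper's sequential-game instantiation): regret matching plays the role of the regret minimizer, and a toy one-coordinate importance-sampling estimator stands in for a game-tree gradient estimator such as outcome sampling.

```python
import numpy as np

rng = np.random.default_rng(0)

def regret_matching_strategy(cum_regret):
    """Regret minimizer: regret matching plays proportionally to the
    positive parts of the cumulative regrets."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full_like(pos, 1.0 / len(pos))

def sampled_gradient(true_gradient):
    """Toy unbiased gradient estimator: sample one coordinate and
    importance-weight it (a stand-in for tree-sampling estimators
    such as outcome sampling)."""
    n = len(true_gradient)
    i = rng.integers(n)
    est = np.zeros(n)
    est[i] = true_gradient[i] * n   # weight by 1 / (sampling prob 1/n)
    return est

# Driver: any regret minimizer coupled with any unbiased gradient estimator.
loss = np.array([0.3, -0.1, 0.7, 0.2])      # hypothetical fixed loss vector
cum_regret = np.zeros_like(loss)
avg_strategy = np.zeros_like(loss)
T = 10_000
for t in range(1, T + 1):
    x = regret_matching_strategy(cum_regret)
    g = sampled_gradient(loss)               # stochastic loss estimate
    cum_regret += (g @ x) - g                # regret of each action vs. play
    avg_strategy += (x - avg_strategy) / t   # running average strategy
print(avg_strategy)  # mass concentrates on the lowest-loss action
```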
Related papers
- Efficient Last-iterate Convergence Algorithms in Solving Games [20.00785679098103]
No-regret algorithms are popular for learning Nash equilibria in two-player zero-sum normal-form games (NFGs) and extensive-form games (EFGs).
Recent works propose a Reward Transformation (RT) framework for the multiplicative weights update (MWU), which removes the uniqueness condition and achieves competitive performance with optimistic MWU (OMWU).
We show that the bottleneck of RT-based algorithms is the speed of solving convex-concave optimization problems (SCCPs).
arXiv Detail & Related papers (2023-08-22T07:59:49Z)
- Faster One-Sample Stochastic Conditional Gradient Method for Composite Convex Minimization [61.26619639722804]
We propose a stochastic conditional gradient method (CGM) for minimizing convex finite-sum objectives formed as a sum of smooth and non-smooth terms.
The proposed method, equipped with a stochastic average gradient (SAG) estimator, requires only one sample per iteration. Nevertheless, it guarantees fast convergence rates on par with more sophisticated variance reduction techniques.
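A minimal sketch of the one-sample mechanism under toy assumptions (a least-squares finite sum, with an l1-ball constraint standing in for the non-smooth term); the paper's method and guarantees are more general.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite-sum objective: f(x) = (1/n) * sum_i 0.5*(a_i @ x - b_i)^2
n, d = 50, 10
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th summand."""
    return A[i] * (A[i] @ x - b[i])

def lmo_l1(g, radius=1.0):
    """Linear minimization oracle over an l1 ball, standing in for the
    non-smooth/constraint part: argmin_{||s||_1 <= radius} <g, s>."""
    s = np.zeros_like(g)
    j = np.argmax(np.abs(g))
    s[j] = -radius * np.sign(g[j])
    return s

x = np.zeros(d)
table = np.zeros((n, d))          # SAG memory: last seen per-sample gradients
avg_grad = table.mean(axis=0)
for t in range(1, 5001):
    i = rng.integers(n)                   # only ONE sample per iteration
    g_new = grad_i(x, i)
    avg_grad += (g_new - table[i]) / n    # refresh the SAG running average
    table[i] = g_new
    s = lmo_l1(avg_grad)                  # conditional-gradient direction
    gamma = 2.0 / (t + 2)                 # classic CGM step-size schedule
    x = (1.0 - gamma) * x + gamma * s
```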
arXiv Detail & Related papers (2022-02-26T19:10:48Z)
- No-Regret Dynamics in the Fenchel Game: A Unified Framework for Algorithmic Convex Optimization [20.718016474717196]
We develop an algorithmic framework for solving convex optimization problems using no-regret game dynamics.
A common choice for the players' strategies in these dynamics is a so-called no-regret learning algorithm.
We show that many classical first-order methods for convex optimization can be interpreted as special cases of our framework.
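For context, here is the reduction at the heart of such frameworks, paraphrased in standard Fenchel-conjugate notation:

```latex
% Minimizing a closed convex f is recast as a two-player zero-sum
% "Fenchel game" through the conjugate f^*(y) = sup_x ( <x,y> - f(x) ):
\min_{x} f(x) \;=\; \min_{x} \, \sup_{y} \; \bigl( \langle x, y \rangle - f^{*}(y) \bigr).
% Both players follow online strategies; the optimality gap of the
% averaged iterates is bounded by the sum of the players' average
% regrets, and different no-regret learners recover different
% classical first-order methods.
```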
arXiv Detail & Related papers (2021-11-22T16:10:18Z)
- Last-iterate Convergence in Extensive-Form Games [49.31256241275577]
We study last-iterate convergence of optimistic algorithms in sequential games.
We show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast.
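As one representative member of the optimistic family (an illustrative example, not necessarily the exact algorithms analyzed here), optimistic gradient descent-ascent uses the update:

```latex
% z_t stacks both players' strategies, F is the saddle-point gradient
% operator (gradient for the min player, negated gradient for the max
% player), eta > 0 is a step size, and Pi_Z projects onto the joint
% strategy set:
z_{t+1} \;=\; \Pi_{\mathcal{Z}} \bigl( z_t - \eta \, ( 2 F(z_t) - F(z_{t-1}) ) \bigr).
```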
arXiv Detail & Related papers (2021-06-27T22:02:26Z)
- Minimax Optimization with Smooth Algorithmic Adversaries [59.47122537182611]
We propose a new algorithm for the min-player to play against smooth algorithms deployed by an adversary.
Our algorithm is guaranteed to make monotonic progress, having no limit cycles, and to find an appropriate stationary point in a polynomial number of iterations.
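A minimal caricature of the setup, with toy functions and step sizes of my own choosing (the paper's actual algorithm and its guarantees are more involved): the adversary is an explicit smooth procedure, a fixed number of gradient ascent steps, rather than an exact maximizer, and the min-player descends against its responses.

```python
import numpy as np

# Hypothetical convex-concave toy objective f(x, y); the paper targets
# harder nonconvex-nonconcave problems.
def f(x, y):
    return x * y - 0.1 * y**2 + 0.05 * x**2

def grad_x(x, y):
    return y + 0.1 * x

def grad_y(x, y):
    return x - 0.2 * y

def smooth_adversary(x, y0, steps=20, lr=0.1):
    """The max-player's *algorithm*: a fixed number of gradient ascent
    steps. Unlike an exact maximizer, this is a smooth map of x."""
    y = y0
    for _ in range(steps):
        y = y + lr * grad_y(x, y)
    return y

x, y = 1.0, 1.0
for _ in range(200):
    y = smooth_adversary(x, y)       # adversary responds with its procedure
    x = x - 0.05 * grad_x(x, y)      # min-player descends against it
print(x, y, f(x, y))                 # drifts toward the saddle at (0, 0)
```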
arXiv Detail & Related papers (2021-06-02T22:03:36Z) - Faster Game Solving via Predictive Blackwell Approachability: Connecting
Regret Matching and Mirror Descent [119.5481797273995]
Follow-the-regularized-leader (FTRL) and online mirror descent (OMD) are the most prevalent regret minimizers in online convex optimization.
We show that regret matching (RM) and regret matching+ (RM+) are the algorithms that result from running FTRL and OMD, respectively, to select the halfspace to force at all times in the underlying Blackwell approachability game.
In experiments across 18 common zero-sum extensive-form benchmark games, we show that predictive RM+ coupled with counterfactual regret minimization converges vastly faster than the fastest prior algorithms.
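A single-decision-point sketch of the predictive RM+ update (variable names are my own; the paper combines this with counterfactual regret minimization to handle the tree structure):

```python
import numpy as np

def normalize(v):
    s = v.sum()
    return v / s if s > 0 else np.full_like(v, 1.0 / len(v))

def predictive_rm_plus_step(Q, u, m):
    """One predictive RM+ update at a single decision point.
    Q: nonnegative cumulative-regret vector; u: this round's utility
    vector over actions; m: predicted instantaneous regret (m = 0
    recovers plain RM+)."""
    x = normalize(np.maximum(Q + m, 0.0))    # act on predicted regrets
    r = u - (u @ x)                          # realized instantaneous regret
    Q_next = np.maximum(Q + r, 0.0)          # RM+: threshold at zero
    return Q_next, x, r

# Usage: a common prediction is last round's realized regret.
Q = np.zeros(3)
m = np.zeros(3)
for _ in range(100):
    u = np.array([1.0, 0.0, 0.5])            # hypothetical utilities
    Q, x, m = predictive_rm_plus_step(Q, u, m)
print(x)  # converges to pure play of the best action
```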
arXiv Detail & Related papers (2020-07-28T16:49:55Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant imitation learning from observation (ILO) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
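For reference, a generic Gaussian instance of the reparameterization (pathwise) trick; the paper applies the idea inside adversarial imitation learning objectives, which this toy example does not reproduce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pathwise (reparameterization) gradient for a Gaussian: z = mu + sigma*eps.
# Because z is a differentiable function of (mu, sigma) given eps, the
# gradient of E[g(z)] can be estimated by differentiating through samples.
mu, sigma = 1.5, 0.8
eps = rng.standard_normal(100_000)
z = mu + sigma * eps

# Toy objective g(z) = z^2, so dE[g]/dmu = 2*mu analytically.
grad_mu = np.mean(2.0 * z * 1.0)   # chain rule: g'(z) * dz/dmu, dz/dmu = 1
print(grad_mu)                      # ~ 2 * mu = 3.0
```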
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content and is not responsible for any consequences arising from its use.