Asynchronous Predictive Counterfactual Regret Minimization$^+$ Algorithm in Solving Extensive-Form Games
- URL: http://arxiv.org/abs/2503.12770v1
- Date: Mon, 17 Mar 2025 03:07:38 GMT
- Title: Asynchronous Predictive Counterfactual Regret Minimization$^+$ Algorithm in Solving Extensive-Form Games
- Authors: Linjian Meng, Youzhi Zhang, Zhenxing Ge, Tianpei Yang, Yang Gao,
- Abstract summary: Predictive CFR$^+$ (PCFR$^+$) is particularly powerful, achieving an exceptionally fast empirical convergence rate. We propose a novel variant, Asynchronous PCFR$^+$ (APCFR$^+$), which employs an adaptive asynchronization of step-sizes. We present a theoretical analysis demonstrating why APCFR$^+$ can enhance the robustness.
- Score: 16.13246414155694
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Counterfactual Regret Minimization (CFR) algorithms are widely used to compute a Nash equilibrium (NE) in two-player zero-sum imperfect-information extensive-form games (IIGs). Among them, Predictive CFR$^+$ (PCFR$^+$) is particularly powerful, achieving an exceptionally fast empirical convergence rate in many games via prediction. However, the empirical convergence rate of PCFR$^+$ degrades significantly when the prediction is inaccurate, leading to unstable performance on certain IIGs. To enhance the robustness of PCFR$^+$, we propose a novel variant, Asynchronous PCFR$^+$ (APCFR$^+$), which employs an adaptive asynchronization of step-sizes between the updates of the implicit and explicit accumulated counterfactual regrets to mitigate the impact of prediction inaccuracy on convergence. We present a theoretical analysis demonstrating why APCFR$^+$ can enhance robustness. Finally, we propose a simplified version of APCFR$^+$, called Simple APCFR$^+$ (SAPCFR$^+$), which uses a fixed asynchronization of step-sizes and requires only a single-line modification of the original PCFR$^+$. Interestingly, SAPCFR$^+$ achieves a worst-case theoretical regret bound that is lower than that of PCFR$^+$ by a constant factor. Experimental results demonstrate that (i) both APCFR$^+$ and SAPCFR$^+$ outperform PCFR$^+$ in most of the tested games, and (ii) SAPCFR$^+$ achieves an empirical convergence rate comparable to that of APCFR$^+$.
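To make the step-size asynchronization concrete, here is a minimal sketch of one predictive regret-matching$^+$ update at a single information set, in the style of PCFR$^+$. The coefficient `alpha` marks one plausible place where an asynchronous step-size between the implicit (prediction-augmented) and explicit accumulated regrets could enter; `alpha`, its placement, and the function name are illustrative assumptions rather than the paper's exact update rule.

```python
import numpy as np

def predictive_rm_plus_step(R, prediction, instant_regret, alpha=1.0):
    """One predictive regret-matching+ update at a single information set.

    R              -- explicit accumulated counterfactual regret (non-negative vector)
    prediction     -- predicted instantaneous regret (commonly the last observed regret)
    instant_regret -- observed instantaneous counterfactual regret
    alpha          -- hypothetical asynchronization coefficient; alpha = 1.0
                      recovers plain predictive RM+ (an assumption, not the paper's rule)
    """
    # Implicit accumulated regret: fold the prediction in with its own step-size.
    implicit = np.maximum(R + alpha * prediction, 0.0)

    # The current strategy is proportional to the positive predicted regrets;
    # fall back to uniform if all predicted regrets are zero.
    total = implicit.sum()
    strategy = implicit / total if total > 0 else np.full(len(R), 1.0 / len(R))

    # Explicit accumulated regret keeps the standard regret-matching+ step-size.
    R_next = np.maximum(R + instant_regret, 0.0)
    return strategy, R_next
```

With `alpha = 1.0` this is the usual predictive regret-matching$^+$ step; choosing a different value decouples the step-size applied to the prediction from the one applied to the observed regret, which is the kind of asynchronization the abstract describes, and fixing such a coefficient is consistent with SAPCFR$^+$ being a single-line modification of PCFR$^+$.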
Related papers
- Hidden Convexity of Fair PCA and Fast Solver via Eigenvalue Optimization [1.4999444543328293]
Principal Component Analysis (PCA) is a technique in machine learning for dimensionality reduction of high-dimensional datasets.
The fair PCA (FPCA) model was introduced by Samadi et al. to equalize the reconstruction loss between subgroups.
The semidefinite relaxation (SDR) based approach proposed by Samadi et al. is computationally expensive even for suboptimal solutions.
In this paper, we identify a hidden convexity in the FPCA model and introduce an algorithm for convex optimization via eigenvalue optimization.
arXiv Detail & Related papers (2025-03-01T02:13:20Z)
- Minimizing Weighted Counterfactual Regret with Optimistic Online Mirror Descent [44.080852682765276]
This work explores minimizing weighted counterfactual regret with optimistic Online Mirror Descent (OMD).
It integrates PCFR+ and Discounted CFR (DCFR) in a principled manner, swiftly mitigating negative effects of dominated actions.
Experimental results demonstrate PDCFR+'s fast convergence in common imperfect-information games.
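As background for the DCFR component mentioned above, the sketch below shows the standard DCFR-style discounting of accumulated regrets and of the average-strategy contributions, using the commonly cited parameters $\alpha=1.5$, $\beta=0$, $\gamma=2$; how PDCFR+ combines this discounting with the predictive update is not shown, and the helper name is hypothetical.

```python
import numpy as np

def dcfr_discount(cum_regret, avg_strategy_sum, t, alpha=1.5, beta=0.0, gamma=2.0):
    """Apply DCFR-style discounting after iteration t (t >= 1).

    cum_regret       -- accumulated counterfactual regrets (may be negative)
    avg_strategy_sum -- running weighted sum used to form the average strategy
    """
    pos_w = t**alpha / (t**alpha + 1)  # shrink accumulated positive regrets slightly
    neg_w = t**beta / (t**beta + 1)    # shrink accumulated negative regrets more strongly
    avg_w = (t / (t + 1))**gamma       # down-weight early contributions to the average

    discounted_regret = np.where(cum_regret > 0, cum_regret * pos_w, cum_regret * neg_w)
    return discounted_regret, avg_strategy_sum * avg_w
```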
arXiv Detail & Related papers (2024-04-22T05:37:22Z)
- Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z)
- Accelerating Nash Equilibrium Convergence in Monte Carlo Settings Through Counterfactual Value Based Fictitious Play [0.0]
We introduce a new MC-based algorithm for solving imperfect information games, called MCCFVFP.
MCCFVFP combines CFR's counterfactual value calculations with fictitious play's best response strategy.
Results show that MCCFVFP achieved convergence speeds approximately 20%$\sim$50% faster than the most advanced MCCFR variants.
arXiv Detail & Related papers (2023-09-04T09:16:49Z)
- Efficient Last-iterate Convergence Algorithms in Solving Games [29.25745562794961]
Recent studies reformulate learning an NE of the original EFG as learning the NEs of a sequence of (perturbed) regularized EFGs.
In this paper, we prove that CFR$+$, a classical parameter-free RM-based CFR algorithm, achieves last-iterate convergence in learning an NE of perturbed regularized EFGs.
arXiv Detail & Related papers (2023-08-22T07:59:49Z)
- Asymptotically Unbiased Instance-wise Regularized Partial AUC Optimization: Theory and Algorithm [101.44676036551537]
One-way Partial AUC (OPAUC) and Two-way Partial AUC (TPAUC) measure the average performance of a binary classifier.
Most existing methods can only optimize PAUC approximately, leading to inevitable and uncontrollable biases.
We present a simpler reformulation of the PAUC problem via distributionally robust optimization.
arXiv Detail & Related papers (2022-10-08T08:26:22Z)
- The Power of Regularization in Solving Extensive-Form Games [22.557806157585834]
We propose a series of new algorithms based on regularizing the payoff functions of the game.
In particular, we first show that dilated optimistic mirror descent (DOMD) can achieve a fast $\tilde{O}(1/T)$ last-iterate convergence.
We also show that Reg-CFR, with a variant of the optimistic mirror descent algorithm as regret minimizer, can achieve $O(1/T^{1/4})$ best-iterate and $O(1/T^{3/4})$ average-iterate convergence rates.
arXiv Detail & Related papers (2022-06-19T22:10:38Z)
- Equivalence Analysis between Counterfactual Regret Minimization and Online Mirror Descent [67.60077332154853]
Counterfactual Regret Minimization (CFR) is a regret minimization algorithm that minimizes the total regret by minimizing the local counterfactual regrets.
Follow-the-Regularized-Leader (FTRL) and Online Mirror Descent (OMD) are regret minimization algorithms in Online Convex Optimization.
We provide a new way to analyze and extend CFRs by proving that CFR with Regret Matching and CFR with Regret Matching+ are special forms of FTRL and OMD, respectively.
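For context, the decomposition behind this claim comes from the original CFR analysis (Zinkevich et al., 2007) and is not restated in the summary above: the overall regret of player $i$ after $T$ iterations satisfies $R_i^T \le \sum_{I \in \mathcal{I}_i} \max(R_{i,\mathrm{imm}}^T(I), 0)$, where $\mathcal{I}_i$ is the set of player $i$'s information sets and $R_{i,\mathrm{imm}}^T(I)$ is the immediate counterfactual regret at $I$; keeping each local term sublinear (e.g., via Regret Matching at every information set) therefore drives the average overall regret, and the exploitability of the average strategy, to zero.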
arXiv Detail & Related papers (2021-10-11T02:12:25Z)
- LSDAT: Low-Rank and Sparse Decomposition for Decision-based Adversarial Attack [74.5144793386864]
LSDAT crafts perturbations in the low-dimensional subspace formed by the sparse component of the input sample and that of an adversarial sample.
LSD works directly in the image pixel domain to guarantee that non-$\ell_2$ constraints, such as sparsity, are satisfied.
arXiv Detail & Related papers (2021-03-19T13:10:47Z)
- Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy actor-critic algorithm (DR-Off-PAC) for discounted MDPs.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z)