RL-CFR: Improving Action Abstraction for Imperfect Information
Extensive-Form Games with Reinforcement Learning
- URL: http://arxiv.org/abs/2403.04344v1
- Date: Thu, 7 Mar 2024 09:12:23 GMT
- Title: RL-CFR: Improving Action Abstraction for Imperfect Information
Extensive-Form Games with Reinforcement Learning
- Authors: Boning Li, Zhixuan Fang and Longbo Huang
- Abstract summary: We introduce RL-CFR, a novel reinforcement learning (RL) approach for dynamic action abstraction.
RL-CFR builds upon our innovative Markov Decision Process (MDP) formulation, with states corresponding to public information and actions represented as feature vectors indicating specific action abstractions.
In experiments on Heads-up No-limit Texas Hold'em, RL-CFR outperforms ReBeL's replication and Slumbot, demonstrating significant win-rate margins of $64\pm 11$ and $84\pm 17$ mbb/hand, respectively.
- Score: 42.80561441946148
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Effective action abstraction is crucial in tackling challenges associated
with large action spaces in Imperfect Information Extensive-Form Games
(IIEFGs). However, due to the vast state space and computational complexity in
IIEFGs, existing methods often rely on fixed abstractions, resulting in
sub-optimal performance. In response, we introduce RL-CFR, a novel
reinforcement learning (RL) approach for dynamic action abstraction. RL-CFR
builds upon our innovative Markov Decision Process (MDP) formulation, with
states corresponding to public information and actions represented as feature
vectors indicating specific action abstractions. The reward is defined as the
expected payoff difference between the selected and default action
abstractions. RL-CFR constructs a game tree with RL-guided action abstractions
and utilizes counterfactual regret minimization (CFR) for strategy derivation.
Impressively, it can be trained from scratch, achieving higher expected payoff
without increased CFR solving time. In experiments on Heads-up No-limit Texas
Hold'em, RL-CFR outperforms ReBeL's replication and Slumbot, demonstrating
significant win-rate margins of $64\pm 11$ and $84\pm 17$ mbb/hand,
respectively.
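As a rough illustration of the MDP formulation sketched above, the hedged Python snippet below treats the public information as the RL state, lets an action feature vector select a bet-size abstraction, and computes the reward as the payoff gap between the selected and a default abstraction. The class and function names, the fixed bet-fraction menu, and the stubbed solve_with_cfr routine are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the authors' code) of the MDP described in the
# abstract: the state is public information, the action is a feature vector
# that picks an action abstraction, and the reward is the expected-payoff
# gap between the selected and the default abstraction.
from dataclasses import dataclass
from typing import Sequence

import numpy as np

# Hypothetical menu of bet sizes (fractions of the pot); the actual
# abstraction space used by RL-CFR is not specified here.
BET_FRACTIONS = np.array([0.33, 0.5, 0.75, 1.0, 2.0])


@dataclass
class PublicState:
    """Public information only: board cards, pot size, betting history."""
    board: Sequence[str]
    pot: float
    history: Sequence[str]


def feature_to_abstraction(action_feature: np.ndarray) -> list[float]:
    """Map an RL action (feature vector) to a concrete set of bet sizes.
    Here: keep the bet fractions whose feature weight is positive."""
    assert action_feature.shape == BET_FRACTIONS.shape
    chosen = BET_FRACTIONS[action_feature > 0.0]
    return chosen.tolist() or [1.0]  # fall back to a single pot-size bet


def solve_with_cfr(state: PublicState, bet_sizes: list[float]) -> float:
    """Placeholder for building the abstracted game tree and running CFR;
    returns a dummy expected payoff instead of a real CFR solution."""
    rng = np.random.default_rng(abs(hash(tuple(bet_sizes))) % (2**32))
    return float(rng.normal(loc=len(bet_sizes), scale=1.0))


def rl_reward(state: PublicState, action_feature: np.ndarray,
              default_sizes: tuple[float, ...] = (0.5, 1.0)) -> float:
    """Reward = expected payoff under the selected abstraction minus the
    expected payoff under the default abstraction (as in the abstract)."""
    selected = feature_to_abstraction(action_feature)
    return solve_with_cfr(state, selected) - solve_with_cfr(state, list(default_sizes))


if __name__ == "__main__":
    state = PublicState(board=["As", "Kd", "7h"], pot=20.0, history=["call", "raise"])
    feature = np.array([1.0, -0.2, 0.4, -0.1, 0.8])  # hypothetical policy output
    print("reward:", rl_reward(state, feature))
```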
Related papers
- GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training [62.536191233049614]
Reinforcement learning with verifiable outcome rewards (RLVR) has effectively scaled up chain-of-thought (CoT) reasoning in large language models (LLMs).
This work investigates this problem through extensive experiments on complex card games, such as 24 points, and embodied tasks from ALFWorld.
We find that when rewards are based solely on action outcomes, RL fails to incentivize CoT reasoning in VLMs, instead leading to a phenomenon we termed thought collapse.
arXiv Detail & Related papers (2025-03-11T15:17:02Z)
- Beyond Human Preferences: Exploring Reinforcement Learning Trajectory Evaluation and Improvement through LLMs [12.572869123617783]
Reinforcement learning (RL) faces challenges in evaluating policy trajectories within intricate game tasks.
Preference-based RL (PbRL) presents a pioneering framework that capitalizes on human preferences as pivotal reward signals.
We propose an LLM-enabled automatic preference generation framework named LLM4PG.
arXiv Detail & Related papers (2024-06-28T04:21:24Z)
- EventRL: Enhancing Event Extraction with Outcome Supervision for Large Language Models [48.136950450053476]
EventRL is a reinforcement learning approach developed to enhance event extraction for large language models (LLMs).
We evaluate EventRL against existing methods such as Few-Shot Prompting (FSP) and Supervised Fine-Tuning (SFT).
Our findings show that EventRL significantly outperforms these conventional approaches, improving performance in identifying and structuring events.
arXiv Detail & Related papers (2024-02-18T02:41:06Z)
- Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint [104.53687944498155]
Reinforcement learning (RL) has been widely used in training large language models (LLMs).
We propose a new RL method named RLMEC that incorporates a generative model as the reward model.
Based on the generative reward model, we design a token-level RL objective for training and an imitation-based regularization for stabilizing the RL process.
arXiv Detail & Related papers (2024-01-11T17:58:41Z)
- CODEX: A Cluster-Based Method for Explainable Reinforcement Learning [0.0]
We present a method that incorporates semantic clustering, which can effectively summarize RL agent behavior in the state-action space.
Experiments on the MiniGrid and StarCraft II gaming environments reveal that the semantic clusters retain temporal as well as entity information.
arXiv Detail & Related papers (2023-12-07T11:04:37Z)
- Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe for converting static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z)
- Data-Driven Evaluation of Training Action Space for Reinforcement Learning [1.370633147306388]
This paper proposes a Shapley-inspired methodology for training action space categorization and ranking.
To reduce exponential-time Shapley computations, the methodology includes a Monte Carlo simulation.
The proposed data-driven methodology generalizes across different domains, use cases, and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-08T04:53:43Z)
- A Simple Reward-free Approach to Constrained Reinforcement Learning [33.813302183231556]
This paper bridges reward-free RL and constrained RL. In particular, we propose a simple meta-algorithm such that, given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overhead in sample complexity.
arXiv Detail & Related papers (2021-07-12T06:27:30Z)
- Residual Reinforcement Learning from Demonstrations [51.56457466788513]
Residual reinforcement learning (RL) has been proposed as a way to solve challenging robotic tasks by adapting control actions from a conventional feedback controller to maximize a reward signal.
We extend the residual formulation to learn from visual inputs and sparse rewards using demonstrations.
Our experimental evaluation on simulated manipulation tasks on a 6-DoF UR5 arm and a 28-DoF dexterous hand demonstrates that residual RL from demonstrations is able to generalize to unseen environment conditions more flexibly than either behavioral cloning or RL fine-tuning.
arXiv Detail & Related papers (2021-06-15T11:16:49Z)
- RLCFR: Minimize Counterfactual Regret by Deep Reinforcement Learning [15.126468724917288]
We propose a framework, RLCFR, which aims at improving the generalization ability of the CFR method.
In RLCFR, the game strategy is solved by CFR within a reinforcement learning framework.
Our method then learns a policy that selects an appropriate regret-updating rule at each iteration.
arXiv Detail & Related papers (2020-09-10T14:20:33Z)
- Learning Abstract Models for Strategic Exploration and Fast Reward Transfer [85.19766065886422]
We learn an accurate Markov Decision Process (MDP) over abstract states to avoid compounding errors.
Our approach achieves strong results on three of the hardest Arcade Learning Environment games.
We can reuse the learned abstract MDP for new reward functions, achieving higher reward in 1000x fewer samples than model-free methods trained from scratch.
arXiv Detail & Related papers (2020-07-12T03:33:50Z)