Hierarchical Deep Counterfactual Regret Minimization
- URL: http://arxiv.org/abs/2305.17327v2
- Date: Tue, 26 Sep 2023 13:54:54 GMT
- Title: Hierarchical Deep Counterfactual Regret Minimization
- Authors: Jiayu Chen, Tian Lan, Vaneet Aggarwal
- Abstract summary: In this paper, we introduce the first hierarchical version of Deep CFR, an innovative method that boosts learning efficiency in tasks involving extensively large state spaces and deep game trees.
A notable advantage of HDCFR over previous works is its ability to facilitate learning with predefined (human) expertise and foster the acquisition of skills that can be transferred to similar tasks.
- Score: 53.86223883060367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Imperfect Information Games (IIGs) offer robust models for scenarios where
decision-makers face uncertainty or lack complete information. Counterfactual
Regret Minimization (CFR) has been one of the most successful families of
algorithms for tackling IIGs. The integration of skill-based strategy learning
with CFR could potentially mirror a more human-like decision-making process and
enhance the learning performance for complex IIGs. It enables the learning of a
hierarchical strategy, wherein low-level components represent skills for
solving subgames and the high-level component manages the transition between
skills. In this paper, we introduce the first hierarchical version of Deep CFR
(HDCFR), an innovative method that boosts learning efficiency in tasks
involving extensively large state spaces and deep game trees. A notable
advantage of HDCFR over previous works is its ability to facilitate learning
with predefined (human) expertise and foster the acquisition of skills that can
be transferred to similar tasks. To achieve this, we initially construct our
algorithm on a tabular setting, encompassing hierarchical CFR updating rules
and a variance-reduced Monte Carlo sampling extension. Notably, we offer
theoretical justifications, including the convergence rate of the proposed
updating rule, the unbiasedness of the Monte Carlo regret estimator, and ideal
criteria for effective variance reduction. Then, we employ neural networks as
function approximators and develop deep learning objectives to adapt our
proposed algorithms for large-scale tasks, while maintaining the theoretical
support.
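The tabular updating rules mentioned in the abstract build on regret matching, the standard per-information-set update at the heart of CFR and its variants. The following is a minimal illustrative sketch of plain regret matching, not code from the paper; the function and variable names are our own:

```python
# Minimal sketch of regret matching, the core update CFR applies at each
# information set: play each action in proportion to its positive
# cumulative counterfactual regret.

def regret_matching(cumulative_regrets):
    """Return a strategy proportional to positive cumulative regrets.

    Falls back to the uniform strategy when no action has positive regret.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    n = len(cumulative_regrets)
    return [1.0 / n] * n

# Example: action 0 has cumulative regret 3, action 1 has 1, action 2 is negative.
strategy = regret_matching([3.0, 1.0, -2.0])
# strategy == [0.75, 0.25, 0.0]
```

In a hierarchical scheme such as the one the paper describes, an update of this kind would be applied both to the high-level component that selects skills and to the low-level components that act within them.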
Related papers
- Super Level Sets and Exponential Decay: A Synergistic Approach to Stable Neural Network Training [0.0]
We develop a dynamic learning rate algorithm that integrates exponential decay and advanced anti-overfitting strategies.
We prove that the superlevel sets of the loss function, as influenced by our adaptive learning rate, are always connected.
arXiv Detail & Related papers (2024-09-25T09:27:17Z)
- An Efficient Learning-based Solver Comparable to Metaheuristics for the Capacitated Arc Routing Problem [67.92544792239086]
We introduce an NN-based solver to significantly narrow the gap with advanced metaheuristics.
First, we propose direction-aware facilitating attention model (DaAM) to incorporate directionality into the embedding process.
Second, we design a supervised reinforcement learning scheme that involves supervised pre-training to establish a robust initial policy.
arXiv Detail & Related papers (2024-03-11T02:17:42Z)
- Language-guided Skill Learning with Temporal Variational Inference [38.733622157088035]
We present an algorithm for skill discovery from expert demonstrations.
Our results demonstrate that agents equipped with our method are able to discover skills that help accelerate learning.
arXiv Detail & Related papers (2024-02-26T07:19:23Z)
- Stochastic Unrolled Federated Learning [85.6993263983062]
We introduce UnRolled Federated learning (SURF), a method that expands algorithm unrolling to federated learning.
Our proposed method tackles two challenges of this expansion, namely the need to feed whole datasets to the unrolled optimizers and the decentralized nature of federated learning.
arXiv Detail & Related papers (2023-05-24T17:26:22Z)
- Hierarchically Structured Task-Agnostic Continual Learning [0.0]
We take a task-agnostic view of continual learning and develop a hierarchical information-theoretic optimality principle.
We propose a neural network layer, called the Mixture-of-Variational-Experts layer, that alleviates forgetting by creating a set of information processing paths.
Our approach can operate in a task-agnostic way, i.e., it does not require task-specific knowledge, unlike many existing continual learning algorithms.
arXiv Detail & Related papers (2022-11-14T19:53:15Z)
- Neuroevolution is a Competitive Alternative to Reinforcement Learning for Skill Discovery [12.586875201983778]
Deep Reinforcement Learning (RL) has emerged as a powerful paradigm for training neural policies to solve complex control tasks.
We show that Quality Diversity (QD) methods are a competitive alternative to information-theory-augmented RL for skill discovery.
arXiv Detail & Related papers (2022-10-06T11:06:39Z)
- Improved Algorithms for Neural Active Learning [74.89097665112621]
We improve the theoretical and empirical performance of neural-network(NN)-based active learning algorithms for the non-parametric streaming setting.
We introduce two regret metrics, based on minimizing the population loss, that are more suitable for active learning than the one used in state-of-the-art (SOTA) related work.
arXiv Detail & Related papers (2022-10-02T05:03:38Z)
- Hierarchical Skills for Efficient Exploration [70.62309286348057]
In reinforcement learning, pre-trained low-level skills have the potential to greatly facilitate exploration.
Prior knowledge of the downstream task is required to strike the right balance between generality (fine-grained control) and specificity (faster learning) in skill design.
We propose a hierarchical skill learning framework that acquires skills of varying complexity in an unsupervised manner.
arXiv Detail & Related papers (2021-10-20T22:29:32Z)
- The Information Geometry of Unsupervised Reinforcement Learning [133.20816939521941]
Unsupervised skill discovery is a class of algorithms that learn a set of policies without access to a reward function.
We show that unsupervised skill discovery algorithms do not learn skills that are optimal for every possible reward function.
arXiv Detail & Related papers (2021-10-06T13:08:36Z)