Efficient Use of heuristics for accelerating XCS-based Policy Learning
in Markov Games
- URL: http://arxiv.org/abs/2005.12553v1
- Date: Tue, 26 May 2020 07:47:27 GMT
- Title: Efficient Use of heuristics for accelerating XCS-based Policy Learning
in Markov Games
- Authors: Hao Chen, Chang Wang, Jian Huang, Jianxing Gong
- Abstract summary: In games, playing against non-stationary opponents with learning ability is still challenging for reinforcement learning agents.
This paper proposes efficient use of rough heuristics to speed up policy learning when playing against concurrent learners.
- Score: 9.038065438586065
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In Markov games, playing against non-stationary opponents with learning
ability is still challenging for reinforcement learning (RL) agents, because
the opponents can evolve their policies concurrently. This increases the
complexity of the learning task and slows down the learning speed of the RL
agents. This paper proposes efficient use of rough heuristics to speed up
policy learning when playing against concurrent learners. Specifically, we
propose an algorithm that can efficiently learn explainable and generalized
action selection rules by taking advantage of the representation of
quantitative heuristics and an opponent model with an eXtended classifier
system (XCS) in zero-sum Markov games. A neural network is used to model the
opponent from their behaviors and the corresponding policy is inferred for
action selection and rule evolution. In cases of multiple heuristic policies,
we introduce the concept of Pareto optimality for action selection. Besides,
taking advantage of the condition representation and matching mechanism of
XCS, the heuristic policies and the opponent model can provide guidance for
situations with similar feature representation. Furthermore, we introduce an
accuracy-based eligibility trace mechanism to speed up rule evolution, i.e.,
classifiers that can match the historical traces are reinforced according to
their accuracy. We demonstrate the advantages of the proposed algorithm over
several benchmark algorithms in soccer and thief-and-hunter scenarios.
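The abstract mentions using Pareto optimality to select actions when several heuristic policies are available. As a rough illustration only, and not the authors' implementation, the sketch below keeps the actions whose heuristic score vectors are not dominated by any other action. The action names, the score values, and the functions `dominates` and `pareto_optimal_actions` are hypothetical, and it assumes each heuristic policy assigns every action a score where higher is better.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every heuristic and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_optimal_actions(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the actions whose score vectors no other action dominates."""
    return [
        action
        for action, vec in scores.items()
        if not any(
            dominates(other, vec)
            for name, other in scores.items()
            if name != action
        )
    ]


# Hypothetical scores given to each action by two heuristic policies.
heuristic_scores = {
    "shoot": (0.9, 0.2),
    "pass": (0.6, 0.7),
    "dribble": (0.5, 0.5),
    "hold": (0.1, 0.8),
}

front = pareto_optimal_actions(heuristic_scores)
print(front)  # ['shoot', 'pass', 'hold']
```

Here "dribble" is dropped because "pass" scores higher under both heuristics; the remaining actions each win under at least one heuristic, so a further tie-breaking rule (not shown) would be needed to pick a single action.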
Related papers
- A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z)
- Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z)
- Finding mixed-strategy equilibria of continuous-action games without
gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
- Learning Generative Deception Strategies in Combinatorial Masking Games [27.2744631811653]
One way deception can be employed is through obscuring, or masking, some of the information about how systems are configured.
We present a novel game-theoretic model of the resulting defender-attacker interaction, where the defender chooses a subset of attributes to mask, while the attacker responds by choosing an exploit to execute.
We present a novel highly scalable approach for approximately solving such games by representing the strategies of both players as neural networks.
arXiv Detail & Related papers (2021-09-23T20:42:44Z)
- Provably Efficient Algorithms for Multi-Objective Competitive RL [54.22598924633369]
We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector.
In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a target set.
We develop statistically and computationally efficient algorithms to approach the associated target set.
arXiv Detail & Related papers (2021-02-05T14:26:00Z)
- Efficient Competitive Self-Play Policy Optimization [20.023522000925094]
We propose a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games.
Our method trains several agents simultaneously, and the agents intelligently take each other as opponents based on simple adversarial rules.
We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games.
arXiv Detail & Related papers (2020-09-13T21:01:38Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Learning to Model Opponent Learning [11.61673411387596]
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment.
This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment.
We develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL).
arXiv Detail & Related papers (2020-06-06T17:19:04Z)
- Learning from Learners: Adapting Reinforcement Learning Agents to be
Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game.
We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each other's playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.