Related papers: Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games

URL: http://arxiv.org/abs/2005.12553v1
Date: Tue, 26 May 2020 07:47:27 GMT
Title: Efficient Use of heuristics for accelerating XCS-based Policy Learning in Markov Games
Authors: Hao Chen, Chang Wang, Jian Huang, Jianxing Gong
Abstract summary: In games, playing against non-stationary opponents with learning ability is still challenging for reinforcement learning agents. This paper proposes efficient use of rough heuristics to speed up policy learning when playing against concurrent learners.
Score: 9.038065438586065
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In Markov games, playing against non-stationary opponents with learning ability is still challenging for reinforcement learning (RL) agents, because the opponents can evolve their policies concurrently. This increases the complexity of the learning task and slows down the learning speed of the RL agents. This paper proposes efficient use of rough heuristics to speed up policy learning when playing against concurrent learners. Specifically, we propose an algorithm that can efficiently learn explainable and generalized action selection rules by taking advantage of the representation of quantitative heuristics and an opponent model with an eXtended classifier system (XCS) in zero-sum Markov games. A neural network is used to model the opponent from their behaviors and the corresponding policy is inferred for action selection and rule evolution. In cases of multiple heuristic policies, we introduce the concept of Pareto optimality for action selection. Besides, taking advantage of the condition representation and matching mechanism of XCS, the heuristic policies and the opponent model can provide guidance for situations with similar feature representation. Furthermore, we introduce an accuracy-based eligibility trace mechanism to speed up rule evolution, i.e., classifiers that can match the historical traces are reinforced according to their accuracy. We demonstrate the advantages of the proposed algorithm over several benchmark algorithms in a soccer scenario and a thief-and-hunter scenario.
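The abstract mentions Pareto optimality for choosing among actions when several heuristic policies disagree, but it does not say how the scores are computed or combined. The following is a minimal sketch of a generic non-dominated filter, not the paper's algorithm. The action names, the score values, and the function names `dominates` and `pareto_optimal_actions` are all hypothetical, and higher scores are assumed to be better.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if score vector a Pareto-dominates b (higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_optimal_actions(scores: Dict[str, Sequence[float]]) -> List[str]:
    """Return the actions whose score vectors no other action dominates."""
    return [
        action
        for action, vec in scores.items()
        if not any(
            dominates(other, vec)
            for name, other in scores.items()
            if name != action
        )
    ]


# Hypothetical scores: one entry per heuristic policy, for three actions.
example_scores = {
    "shoot": (0.9, 0.2),
    "pass": (0.6, 0.7),
    "hold": (0.5, 0.1),
}
front = pareto_optimal_actions(example_scores)
```

Here `hold` is dominated by `shoot` under both heuristics, so only `shoot` and `pass` remain as candidates. How a single action is then picked from that set is left open in this sketch.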

Related papers

Self-Evolving Curriculum for LLM Reasoning [108.23021254812258]
Self-Evolving Curriculum (SEC) is an automatic curriculum learning method that learns a curriculum policy concurrently with the RL fine-tuning process. Our experiments demonstrate that SEC significantly improves models' reasoning capabilities, enabling better generalization to harder, out-of-distribution test problems.
arXiv Detail & Related papers (2025-05-20T23:17:15Z)
Extend Adversarial Policy Against Neural Machine Translation via Unknown Token [66.40609413186122]
We propose the DexChar policy that introduces character perturbations for the existing mainstream adversarial policy based on token substitution. We also improve the self-supervised matching that provides feedback in RL to cater to the semantic constraints required during training adversaries.
arXiv Detail & Related papers (2025-01-21T14:43:04Z)
Preference-based opponent shaping in differentiable games [3.373994463906893]
We propose a novel Preference-based Opponent Shaping (PBOS) method to enhance the strategy learning process by shaping agents' preferences towards cooperation. We verify the performance of PBOS algorithm in a variety of differentiable games.
arXiv Detail & Related papers (2024-12-04T06:49:21Z)
A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback. Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training. We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z)
Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies. We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty. We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z)
Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients. We model players' strategies using artificial neural networks. This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z)
Learning Generative Deception Strategies in Combinatorial Masking Games [27.2744631811653]
One way deception can be employed is through obscuring, or masking, some of the information about how systems are configured. We present a novel game-theoretic model of the resulting defender-attacker interaction, where the defender chooses a subset of attributes to mask, while the attacker responds by choosing an exploit to execute. We present a novel highly scalable approach for approximately solving such games by representing the strategies of both players as neural networks.
arXiv Detail & Related papers (2021-09-23T20:42:44Z)
Provably Efficient Algorithms for Multi-Objective Competitive RL [54.22598924633369]
We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector. In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a target set. We develop statistically and computationally efficient algorithms to approach the associated target set.
arXiv Detail & Related papers (2021-02-05T14:26:00Z)
Efficient Competitive Self-Play Policy Optimization [20.023522000925094]
We propose a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games. Our method trains several agents simultaneously, and intelligently takes each other as opponent based on simple adversarial rules. We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games.
arXiv Detail & Related papers (2020-09-13T21:01:38Z)
Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action. We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents. Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
Learning to Model Opponent Learning [11.61673411387596]
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment. This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment. We develop a novel approach to modelling an opponent's learning dynamics which we term Learning to Model Opponent Learning (LeMOL).
arXiv Detail & Related papers (2020-06-06T17:19:04Z)
Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game. We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each other's playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)

This list is automatically generated from the titles and abstracts of the papers on this site.