Minimax Strikes Back
- URL: http://arxiv.org/abs/2012.10700v1
- Date: Sat, 19 Dec 2020 14:42:41 GMT
- Title: Minimax Strikes Back
- Authors: Quentin Cohen-Solal and Tristan Cazenave
- Abstract summary: Deep Reinforcement Learning reaches a superhuman level of play in many complete information games.
We take another approach to DRL, using a Minimax algorithm instead of MCTS and learning only the evaluation of states, not the policy.
We show that, for multiple games, it is competitive with state-of-the-art DRL in both learning performance and head-to-head play.
- Score: 10.485343576893865
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Reinforcement Learning (DRL) reaches a superhuman level of play in many
complete information games. The state of the art search algorithm used in
combination with DRL is Monte Carlo Tree Search (MCTS). We take another
approach to DRL, using a Minimax algorithm instead of MCTS and learning only
the evaluation of states, not the policy. We show that, for multiple games, it
is competitive with state-of-the-art DRL in both learning performance and
head-to-head play.
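As a sketch of the core idea, depth-limited Minimax can back up the values of a learned state evaluator in place of a policy network. The `game` interface and `evaluate` function below are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch: depth-limited Minimax over a learned state evaluation.
# `game` and `evaluate` are hypothetical stand-ins, not the paper's code.

def minimax(game, state, depth, maximizing, evaluate):
    """Minimax value of `state`; the learned evaluate(state) -> float
    scores the depth-limit and terminal leaves."""
    if depth == 0 or game.is_terminal(state):
        return evaluate(state)
    values = (minimax(game, game.play(state, m), depth - 1, not maximizing, evaluate)
              for m in game.legal_moves(state))
    return max(values) if maximizing else min(values)

def best_move(game, state, depth, evaluate):
    """Pick the move whose successor (opponent to move) has the best value."""
    return max(game.legal_moves(state),
               key=lambda m: minimax(game, game.play(state, m),
                                     depth - 1, False, evaluate))
```

In practice such a search would also use pruning and a stronger move-selection scheme; this snippet only shows how a learned evaluation replaces the policy head.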
Related papers
- Deep Reinforcement Learning for 5*5 Multiplayer Go [6.222520876209623]
We propose to use and analyze the latest algorithms that combine search and Deep Reinforcement Learning (DRL).
We show that using search and DRL we were able to improve the level of play, even though there are more than two players.
arXiv Detail & Related papers (2024-05-23T07:44:24Z)
- The Virtues of Pessimism in Inverse Reinforcement Learning [38.98656220917943]
Inverse Reinforcement Learning is a powerful framework for learning complex behaviors from expert demonstrations.
It is desirable to reduce the exploration burden by leveraging expert demonstrations in the inner-loop RL.
We consider an alternative approach to speeding up the RL in IRL: pessimism, i.e., staying close to the expert's data distribution, instantiated via the use of offline RL algorithms.
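As a rough illustration (the paper's instantiation is via offline RL algorithms, not this exact scheme), pessimism can be approximated by shaping the inner-loop reward with the expert data's log-density, penalizing state-action pairs far from the expert distribution:

```python
# Illustrative sketch of pessimism in the inner RL loop of IRL: shape the
# current IRL reward with the expert's log-density so the learner is
# discouraged from leaving the expert's data distribution. This is a
# simplified stand-in for the paper's offline-RL instantiation.

import numpy as np
from sklearn.neighbors import KernelDensity

def fit_expert_density(expert_state_actions, bandwidth=0.5):
    """Fit a density model on expert (state, action) rows."""
    return KernelDensity(bandwidth=bandwidth).fit(expert_state_actions)

def pessimistic_reward(irl_reward, state_action, density, alpha=0.1):
    """Add alpha * log p_expert(s, a): low expert likelihood => penalty."""
    log_p = density.score_samples(np.asarray([state_action]))[0]
    return irl_reward + alpha * log_p
```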
arXiv Detail & Related papers (2024-02-04T21:22:29Z)
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle explains the underlying deep neural networks (DNNs) more than it interprets the RL agents themselves.
We propose instead to make rewards, the essential objective of RL agents, the objective of interpretation as well.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
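One generic way to realize that idea, not necessarily the paper's exact algorithm, is to take feature attributions from a learned reward predictor rather than from the action logits:

```python
# Illustrative sketch: gradient saliency computed on a reward predictor
# instead of action logits, so attributions reflect what drives reward.
# `reward_model` is a hypothetical stand-in, not the paper's architecture.

import torch

def reward_saliency(reward_model, obs):
    """Per-feature attribution: |d(predicted reward) / d(observation)|."""
    obs = obs.clone().detach().requires_grad_(True)
    reward_model(obs).sum().backward()
    return obs.grad.abs()
```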
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- Multi-Agent Path Finding via Tree LSTM [17.938710696964662]
In the 2021 Flatland3 Challenge, a competition on MAPF, the best RL method scored only 27.9, far less than the best OR method.
This paper proposes a new RL solution to Flatland3 Challenge, which scores 125.3, several times higher than the best RL solution before.
arXiv Detail & Related papers (2022-10-24T03:22:20Z)
- Mastering the Game of No-Press Diplomacy via Human-Regularized Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
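To make the regularization concrete: for a single decision over discrete actions, maximizing expected value minus $\lambda$ times the KL divergence to a human anchor policy has the closed form $\pi(a) \propto \pi_{\text{human}}(a)\, e^{Q(a)/\lambda}$. A minimal sketch of that core step (DiL-piKL itself adds considerably more machinery):

```python
# Sketch of a KL-regularized best response in the spirit of piKL:
# argmax_pi E[Q] - lam * KL(pi || pi_human) over discrete actions yields
# pi(a) proportional to pi_human(a) * exp(Q(a) / lam). DiL-piKL builds on
# this idea; this snippet shows only the core step.

import numpy as np

def kl_regularized_policy(q_values, human_policy, lam):
    logits = np.log(human_policy) + np.asarray(q_values) / lam
    logits -= logits.max()          # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()
```

Small $\lambda$ recovers the greedy reward-maximizer; large $\lambda$ stays close to the human policy.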
arXiv Detail & Related papers (2022-10-11T14:47:35Z)
- All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL [0.5735035463793008]
Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function in RL upside down.
UDRL is based purely on supervised learning, and bypasses some prominent issues in RL: bootstrapping, off-policy corrections, and discount factors.
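Concretely, UDRL treats the desired return (and remaining horizon) as an input "command" and trains the policy with ordinary supervised learning on logged episodes. A minimal sketch with illustrative names and shapes:

```python
# Sketch of a UDRL command-conditioned policy: it maps (observation,
# desired return, remaining horizon) -> action logits and is trained with
# plain cross-entropy on past episodes. Names and shapes are illustrative,
# not taken from the paper.

import torch
import torch.nn as nn

class CommandPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, desired_return, horizon):
        command = torch.stack([desired_return, horizon], dim=-1)
        return self.net(torch.cat([obs, command], dim=-1))  # action logits
```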
arXiv Detail & Related papers (2022-02-24T08:44:11Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, and we also verify previous design choices for RL policies.
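The core DARTS mechanism behind such a supernet: each edge computes a softmax-weighted mixture of candidate operations, and the mixture weights (architecture parameters) are trained jointly with the policy. A minimal sketch; the candidate ops here are placeholders, not the paper's search space:

```python
# Sketch of a DARTS mixed operation: the output is a softmax-weighted sum
# of candidate ops, and `alpha` (the architecture parameters) is learned
# by gradient descent along with the rest of the network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```

After search, the highest-weight op on each edge is kept to form the final discrete architecture.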
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
- Gym-$\mu$RTS: Toward Affordable Full Game Real-time Strategy Games Research with Deep Reinforcement Learning [0.0]
We introduce Gym-$\mu$RTS as a fast-to-run RL environment for full-game RTS research.
We present a collection of techniques to scale DRL to play full-game $\mu$RTS.
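Interaction follows the standard Gym loop of that era; the environment id below is a hypothetical placeholder, since the summary does not give the actual registration names:

```python
# Standard Gym interaction loop with a random agent. The environment id is
# a placeholder assumption, not a confirmed Gym-microRTS registration.

import gym

env = gym.make("MicrortsEnv-v0")  # placeholder id
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()         # random policy
    obs, reward, done, info = env.step(action)
env.close()
```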
arXiv Detail & Related papers (2021-05-21T20:13:35Z)
- Maximum Entropy RL (Provably) Solves Some Robust RL Problems [94.80212602202518]
We prove theoretically that standard maximum entropy RL is robust to some disturbances in the dynamics and the reward function.
Our results suggest that MaxEnt RL by itself is robust to certain disturbances, without requiring any additional modifications.
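For reference, the standard maximum entropy objective augments the return with the policy's entropy at each step:

$$ J(\pi) \;=\; \mathbb{E}_{\pi}\Big[\sum_{t} r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big)\Big] $$

where $\alpha > 0$ trades off reward against entropy; per the abstract, maximizing this objective already confers robustness to certain perturbations of the reward and the dynamics.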
arXiv Detail & Related papers (2021-03-10T18:45:48Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
- The NetHack Learning Environment [79.06395964379107]
We present the NetHack Learning Environment (NLE), a procedurally generated rogue-like environment for Reinforcement Learning research.
We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL.
We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration.
arXiv Detail & Related papers (2020-06-24T14:12:56Z)