Agent57: Outperforming the Atari Human Benchmark
- URL: http://arxiv.org/abs/2003.13350v1
- Date: Mon, 30 Mar 2020 11:33:16 GMT
- Title: Agent57: Outperforming the Atari Human Benchmark
- Authors: Adrià Puigdomènech Badia, Bilal Piot, Steven Kapturowski, Pablo
Sprechmann, Alex Vitvitskyi, Daniel Guo, Charles Blundell
- Abstract summary: Atari games have been a long-standing benchmark in reinforcement learning.
We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games.
- Score: 15.75730239983062
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Atari games have been a long-standing benchmark in the reinforcement learning
(RL) community for the past decade. This benchmark was proposed to test general
competency of RL algorithms. Previous work has achieved good average
performance by doing outstandingly well on many games of the set, but very
poorly in several of the most challenging games. We propose Agent57, the first
deep RL agent that outperforms the standard human benchmark on all 57 Atari
games. To achieve this result, we train a neural network which parameterizes a
family of policies ranging from very exploratory to purely exploitative. We
propose an adaptive mechanism to choose which policy to prioritize throughout
the training process. Additionally, we utilize a novel parameterization of the
architecture that allows for more consistent and stable learning.
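
The abstract describes an adaptive mechanism that decides which member of the policy family (from very exploratory to purely exploitative) to prioritize during training, but does not spell out how. Below is a minimal sketch of one way such a mechanism could be realized, assuming a simple UCB-style bandit that treats each policy as an arm and its episode returns as payoffs; the class name, hyperparameters, and demo loop are illustrative and not taken from the paper.

```python
# Hypothetical sketch of an adaptive policy-selection mechanism: a UCB-style
# bandit that picks which member of a policy family (indexed from most
# exploratory to purely exploitative) to run next, based on the episode
# returns each policy has produced so far. Names, hyperparameters, and the
# demo loop are illustrative, not taken from the paper.
import math
import random


class PolicySelector:
    def __init__(self, num_policies: int, exploration_coef: float = 1.0):
        self.num_policies = num_policies            # size of the policy family
        self.exploration_coef = exploration_coef    # weight on the UCB bonus
        self.counts = [0] * num_policies            # episodes run with each policy
        self.mean_return = [0.0] * num_policies     # running mean return per policy

    def select(self) -> int:
        """Return the index of the policy to run for the next episode."""
        # Try every policy at least once before trusting the statistics.
        for j in range(self.num_policies):
            if self.counts[j] == 0:
                return j
        total = sum(self.counts)
        ucb = [
            self.mean_return[j]
            + self.exploration_coef * math.sqrt(math.log(total) / self.counts[j])
            for j in range(self.num_policies)
        ]
        return max(range(self.num_policies), key=lambda j: ucb[j])

    def update(self, j: int, episode_return: float) -> None:
        """Fold the return obtained by policy j into its running mean."""
        self.counts[j] += 1
        self.mean_return[j] += (episode_return - self.mean_return[j]) / self.counts[j]


if __name__ == "__main__":
    selector = PolicySelector(num_policies=8)
    for _ in range(200):
        j = selector.select()
        # Stand-in for running one episode with policy j in the environment.
        fake_return = random.gauss(mu=float(j), sigma=1.0)
        selector.update(j, fake_return)
    print("Most prioritized policy:", max(range(8), key=lambda j: selector.counts[j]))
```
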
Related papers
- Reinforcing Competitive Multi-Agents for Playing So Long Sucker [0.393259574660092]
This paper examines the use of classical deep reinforcement learning (DRL) algorithms, DQN, DDQN, and Dueling DQN, in the strategy game So Long Sucker.
The study's primary goal is to teach autonomous agents the game's rules and strategies using classical DRL methods.
arXiv Detail & Related papers (2024-11-17T12:38:13Z) - Scaling Offline Model-Based RL via Jointly-Optimized World-Action Model Pretraining [49.730897226510095]
We introduce JOWA: Jointly-Optimized World-Action model, an offline model-based RL agent pretrained on Atari games with 6 billion tokens of data.
Our largest agent, with 150 million parameters, achieves 78.9% human-level performance on pretrained games using only 10% subsampled offline data, outperforming existing state-of-the-art large-scale offline RL baselines by 31.6% on average.
arXiv Detail & Related papers (2024-10-01T10:25:03Z) - Minimax Exploiter: A Data Efficient Approach for Competitive Self-Play [12.754819077905061]
The Minimax Exploiter is a game-theoretic approach to exploiting Main Agents that leverages knowledge of its opponents.
We validate our approach in a diversity of settings, including simple turn-based games, the Arcade Learning Environment, and For Honor, a modern video game.
arXiv Detail & Related papers (2023-11-28T19:34:40Z) - Mastering the Game of No-Press Diplomacy via Human-Regularized
Reinforcement Learning and Planning [95.78031053296513]
No-press Diplomacy is a complex strategy game involving both cooperation and competition.
We introduce a planning algorithm we call DiL-piKL that regularizes a reward-maximizing policy toward a human imitation-learned policy.
We show that DiL-piKL can be extended into a self-play reinforcement learning algorithm we call RL-DiL-piKL.
arXiv Detail & Related papers (2022-10-11T14:47:35Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method that addresses this by using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Human-level Atari 200x faster [21.329004162570016]
Agent57 was the first agent to surpass the human benchmark on all 57 games, but this came at the cost of poor data-efficiency.
We employ a diverse set of strategies to achieve a 200-fold reduction of experience needed to outperform the human baseline.
We also demonstrate competitive performance with high-performing methods such as Muesli and MuZero.
arXiv Detail & Related papers (2022-09-15T18:08:48Z) - A Review for Deep Reinforcement Learning in Atari: Benchmarks,
Challenges, and Solutions [0.0]
The Arcade Learning Environment (ALE) is proposed as an evaluation platform for empirically assessing the generality of agents across Atari 2600 games.
From Deep Q-Networks (DQN) to Agent57, RL agents seem to achieve superhuman performance in ALE.
We propose a novel Atari benchmark based on human world records (HWR), which puts forward higher requirements for RL agents on both final performance and learning efficiency.
arXiv Detail & Related papers (2021-12-08T06:52:23Z) - Mastering Atari Games with Limited Data [73.6189496825209]
We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero.
Our method achieves 190.4% mean human performance on the Atari 100k benchmark with only two hours of real-time game experience.
This is the first time an algorithm achieves super-human performance on Atari games with so little data.
arXiv Detail & Related papers (2021-10-30T09:13:39Z) - Provably Efficient Algorithms for Multi-Objective Competitive RL [54.22598924633369]
We study multi-objective reinforcement learning (RL) where an agent's reward is represented as a vector.
In settings where an agent competes against opponents, its performance is measured by the distance of its average return vector to a target set.
We develop statistically and computationally efficient algorithms to approach the associated target set.
arXiv Detail & Related papers (2021-02-05T14:26:00Z) - Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest-neighbor lookup, biasing the novelty signal towards what the agent can control.
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
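
The Never Give Up entry above describes an episodic intrinsic reward built from k-nearest neighbors over embeddings of the agent's recent experience. Below is a minimal sketch of that idea, assuming the embeddings come from some learned encoder (not shown) and using a simplified inverse-distance kernel; the class name, constants, and demo values are illustrative rather than the paper's.

```python
# Illustrative sketch of an episodic novelty bonus computed with k-nearest
# neighbors in an embedding space, in the spirit of the Never Give Up entry
# above. The encoder that produces embeddings is assumed to exist elsewhere;
# the kernel and constants below are simplified placeholders.
import numpy as np


class EpisodicNoveltyBonus:
    def __init__(self, k: int = 10, eps: float = 1e-3):
        self.k = k            # number of nearest neighbors to compare against
        self.eps = eps        # kernel width: larger eps -> smoother similarity
        self.memory = []      # embeddings observed so far in the current episode

    def reset(self) -> None:
        """Clear the episodic memory at the start of each episode."""
        self.memory.clear()

    def __call__(self, embedding: np.ndarray) -> float:
        """Return an intrinsic reward that is large for unfamiliar embeddings."""
        if not self.memory:
            self.memory.append(embedding)
            return float(1.0 / np.sqrt(self.eps))  # nothing stored yet: maximally novel
        dists = np.array([np.sum((embedding - m) ** 2) for m in self.memory])
        knn = np.sort(dists)[: self.k]
        # Inverse-distance kernel: close neighbors contribute ~1, far ones ~0,
        # so states similar to many stored embeddings accumulate similarity
        # and receive a small bonus.
        similarity = np.sum(self.eps / (knn + self.eps))
        self.memory.append(embedding)
        return float(1.0 / np.sqrt(similarity + 1e-8))


if __name__ == "__main__":
    bonus = EpisodicNoveltyBonus(k=5)
    rng = np.random.default_rng(0)
    bonus.reset()
    # Feed near-duplicate embeddings: the bonus decays as the memory fills
    # with states the agent has already seen this episode.
    for t in range(5):
        obs_embedding = rng.normal(size=8) * 0.001  # stand-in for a learned embedding
        print(f"step {t}: intrinsic reward = {bonus(obs_embedding):.3f}")
```
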