Human-level Atari 200x faster
- URL: http://arxiv.org/abs/2209.07550v1
- Date: Thu, 15 Sep 2022 18:08:48 GMT
- Title: Human-level Atari 200x faster
- Authors: Steven Kapturowski, Víctor Campos, Ray Jiang, Nemanja Rakićević,
  Hado van Hasselt, Charles Blundell, Adrià Puigdomènech Badia
- Abstract summary: Agent57 was the first agent to surpass the human benchmark on all 57 games, but this came at the cost of poor data-efficiency.
We employ a diverse set of strategies to achieve a 200-fold reduction of experience needed to outperform the human baseline.
We also demonstrate competitive performance with high-performing methods such as Muesli and MuZero.
- Score: 21.329004162570016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The task of building general agents that perform well over a wide range of
tasks has been an important goal in reinforcement learning since its inception.
The problem has been the subject of a large body of research, with
performance frequently measured by observing scores over the wide range of
environments contained in the Atari 57 benchmark. Agent57 was the first agent
to surpass the human benchmark on all 57 games, but this came at the cost of
poor data-efficiency, requiring nearly 80 billion frames of experience to
achieve. Taking Agent57 as a starting point, we employ a diverse set of
strategies to achieve a 200-fold reduction of experience needed to outperform
the human baseline. We investigate a range of instabilities and bottlenecks we
encountered while reducing the data regime, and propose effective solutions to
build a more robust and efficient agent. We also demonstrate competitive
performance with high-performing methods such as Muesli and MuZero. The four
key components of our approach are (1) an approximate trust region method
which enables stable bootstrapping from the online network, (2) a
normalisation scheme for the loss and priorities which improves robustness
when learning a set of value functions with a wide range of scales, (3) an
improved architecture employing techniques from NFNets in order to leverage
deeper networks without the need for normalization layers, and (4) a policy
distillation method which serves to smooth out the instantaneous greedy policy
over time.
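As a rough illustration of component (1), the sketch below masks the TD loss for transitions whose online prediction has drifted too far from the target network and whose update would push it even further away, so that bootstrapping from the online network stays stable. The tolerance `alpha`, the use of the batch standard deviation as the scale, and the exact masking rule are illustrative assumptions, not the paper's precise criterion.

```python
import numpy as np

def masked_td_loss(q_online, q_target, td_targets, alpha=1.0, eps=1e-8):
    """Sketch of an approximate trust region for online-network bootstrapping.

    A transition contributes to the loss only if the online prediction is
    within alpha standard deviations of the target network's prediction, or
    if the TD update would move it back toward the target network.
    (Illustrative rule; the paper defines its own criterion.)
    """
    deviation = q_online - q_target      # drift of online net from target net
    sigma = np.std(deviation) + eps      # scale of that drift in this batch
    td_error = td_targets - q_online     # direction a TD update would move q

    inside = np.abs(deviation) <= alpha * sigma
    moving_back = np.sign(td_error) != np.sign(deviation)
    mask = (inside | moving_back).astype(q_online.dtype)

    # Masked transitions contribute zero loss, so the agent never chases
    # its own out-of-region bootstrap targets.
    return 0.5 * np.mean(mask * td_error ** 2)
```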
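Component (2) can be sketched in the same spirit: TD errors are divided by a running scale estimate before being used for both the loss and the replay priorities, so that members of a value-function family with very different reward scales train at comparable effective learning rates. The exponential-moving-average form, decay, and epsilon floor below are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

class TDErrorNormalizer:
    """Hypothetical sketch of a loss-and-priority normalisation scheme.

    TD errors are rescaled by a running magnitude estimate so that value
    functions with widely different reward scales contribute comparably
    to both the loss and the prioritised-replay priorities.
    """

    def __init__(self, decay=0.99, eps=1e-3):
        self.decay = decay
        self.eps = eps
        self.scale = 1.0  # running estimate of TD-error magnitude

    def __call__(self, td_errors):
        batch_scale = np.sqrt(np.mean(td_errors ** 2))
        self.scale = self.decay * self.scale + (1.0 - self.decay) * batch_scale
        normalized = td_errors / max(self.scale, self.eps)
        loss = 0.5 * np.mean(normalized ** 2)
        priorities = np.abs(normalized)  # replay priorities on the same scale
        return loss, priorities
```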
Related papers
- FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning [74.25049012472502]
FLaRe is a large-scale Reinforcement Learning framework that integrates robust pre-trained representations, large-scale training, and gradient stabilization techniques.
Our method aligns pre-trained policies towards task completion, achieving state-of-the-art (SoTA) performance both on previously demonstrated tasks and on entirely novel tasks and embodiments.
arXiv Detail & Related papers (2024-09-25T03:15:17Z)
- Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance [0.6560901506023631]
In-memory computation architectures, like memristors, offer promise but face challenges due to hardware non-idealities.
Layer ensemble averaging is a technique to map pre-trained neural network solutions from software to defective hardware crossbars.
Results show that layer ensemble averaging can reliably boost defective memristive network performance up to the software baseline.
arXiv Detail & Related papers (2024-04-24T03:19:31Z)
- Enhancing Infrared Small Target Detection Robustness with Bi-Level Adversarial Framework [61.34862133870934]
We propose a bi-level adversarial framework to promote the robustness of detection in the presence of distinct corruptions.
Our scheme improves IoU by a remarkable 21.96% across a wide array of corruptions and by 4.97% on the general benchmark.
arXiv Detail & Related papers (2023-09-03T06:35:07Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method to address this, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Building Robust Ensembles via Margin Boosting [98.56381714748096]
In adversarial robustness, a single model does not usually have enough power to defend against all possible adversarial attacks.
We develop an algorithm for learning an ensemble with maximum margin.
We show that our algorithm not only outperforms existing ensembling techniques, but also large models trained in an end-to-end fashion.
arXiv Detail & Related papers (2022-06-07T14:55:58Z)
- Generalized Data Distribution Iteration [0.0]
We tackle data richness and exploration-exploitation trade-off simultaneously in deep reinforcement learning.
We introduce operator-based versions of well-known RL methods from DQN to Agent57.
Our algorithm has achieved 9620.33% mean human normalized score (HNS), 1146.39% median HNS and surpassed 22 human world records using only 200M training frames.
arXiv Detail & Related papers (2022-06-07T11:27:40Z)
- Robust Reinforcement Learning via Genetic Curriculum [5.421464476555662]
Genetic curriculum is an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum.
Our empirical studies show improvement in robustness over existing state-of-the-art algorithms, providing training curricula that result in agents being 2-8x less likely to fail.
arXiv Detail & Related papers (2022-02-17T01:14:20Z)
- Unsupervised Domain-adaptive Hash for Networks [81.49184987430333]
Domain-adaptive hash learning has enjoyed considerable success in the computer vision community.
We develop an unsupervised domain-adaptive hash learning method for networks, dubbed UDAH.
arXiv Detail & Related papers (2021-08-20T12:09:38Z)
- Age of Information Aware VNF Scheduling in Industrial IoT Using Deep Reinforcement Learning [9.780232937571599]
Deep reinforcement learning (DRL) has emerged as a viable way to solve such scheduling problems.
In this paper, we first use a low-complexity, single-agent, compound-action actor-critic RL method to cover both discrete and continuous actions.
We then extend our solution to a multi-agent DRL scheme in which agents collaborate with each other.
arXiv Detail & Related papers (2021-05-10T09:04:49Z)
- Agent57: Outperforming the Atari Human Benchmark [15.75730239983062]
Atari games have been a long-standing benchmark in reinforcement learning.
We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games.
arXiv Detail & Related papers (2020-03-30T11:33:16Z)
- Never Give Up: Learning Directed Exploration Strategies [63.19616370038824]
We propose a reinforcement learning agent to solve hard exploration games by learning a range of directed exploratory policies.
We construct an episodic memory-based intrinsic reward using k-nearest neighbors over the agent's recent experience to train the directed exploratory policies.
A self-supervised inverse dynamics model is used to train the embeddings of the nearest-neighbour lookup, biasing the novelty signal towards what the agent can control (see the sketch after this list).
arXiv Detail & Related papers (2020-02-14T13:57:22Z)
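As referenced in the Never Give Up entry above, here is a minimal sketch of a k-nearest-neighbour episodic novelty bonus over learned embeddings. The inverse-kernel form, the normalisation by the mean neighbour distance, and all constants are illustrative assumptions; the embeddings would come from a self-supervised inverse-dynamics model, which is not shown.

```python
import numpy as np

def episodic_intrinsic_reward(embedding, episode_memory, k=10, eps=1e-3):
    """Sketch of a k-NN episodic novelty bonus in the spirit of Never Give Up.

    The bonus is large when the current embedding is far from its k nearest
    neighbours among embeddings already visited this episode, and small when
    the state has effectively been seen before.
    """
    if len(episode_memory) == 0:
        return 1.0  # first state of the episode is maximally novel
    memory = np.asarray(episode_memory)
    dists_sq = np.sum((memory - embedding) ** 2, axis=1)
    nearest = np.sort(dists_sq)[:k]
    # Normalise by the mean neighbour distance for rough scale invariance.
    nearest = nearest / (nearest.mean() + eps)
    kernel = eps / (nearest + eps)          # close neighbours -> large values
    return float(1.0 / np.sqrt(kernel.sum() + eps))
```

In use, the agent would append each step's embedding to `episode_memory`, clear the memory at episode boundaries, and add this bonus (suitably weighted) to the extrinsic reward when training the directed exploratory policies.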
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.