Mastering Atari Games with Limited Data
- URL: http://arxiv.org/abs/2111.00210v1
- Date: Sat, 30 Oct 2021 09:13:39 GMT
- Title: Mastering Atari Games with Limited Data
- Authors: Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
- Abstract summary: We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero.
Our method achieves 190.4% mean human performance on the Atari 100k benchmark with only two hours of real-time game experience.
This is the first time an algorithm has achieved super-human performance on Atari games with so little data.
- Score: 73.6189496825209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning has achieved great success in many applications.
However, sample efficiency remains a key challenge, with prominent methods
requiring millions (or even billions) of environment steps to train. Recently,
there has been significant progress in sample efficient image-based RL
algorithms; however, consistent human-level performance on the Atari game
benchmark remains an elusive goal. We propose a sample efficient model-based
visual RL algorithm built on MuZero, which we name EfficientZero. Our method
achieves 190.4% mean human performance and 116.0% median performance on the
Atari 100k benchmark with only two hours of real-time game experience and
outperforms state-based SAC in some tasks on the DMControl 100k benchmark. This
is the first time an algorithm has achieved super-human performance on Atari
games with so little data. EfficientZero's performance is also close to DQN's
performance at 200 million frames while consuming 500 times less data.
EfficientZero's low sample complexity and high performance can bring RL closer
to real-world applicability. We implement our algorithm in an
easy-to-understand manner, and it is available at
https://github.com/YeWR/EfficientZero. We hope it will accelerate research
on MCTS-based RL algorithms in the wider community.
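For reference, the headline percentages are human-normalized scores: each game's raw score is rescaled so that a random policy maps to 0% and the human reference to 100%, then the per-game scores are aggregated by mean or median. A minimal sketch, with illustrative raw scores rather than the paper's actual per-game results:

```python
import numpy as np

def human_normalized_score(agent, random, human):
    """Standard Atari metric: 1.0 (i.e. 100%) means human-level play."""
    return (agent - random) / (human - random)

# Illustrative (agent, random, human) raw scores -- not the paper's results.
games = {
    "Pong":     (  19.0, -20.7,    14.6),
    "Breakout": ( 400.0,   1.7,    30.5),
    "Seaquest": (1000.0,  68.4, 42054.7),
}
scores = [human_normalized_score(a, r, h) for a, r, h in games.values()]
print(f"mean:   {np.mean(scores):.1%}")    # inflated by outlier games
print(f"median: {np.median(scores):.1%}")  # more robust summary
```

The gap between the paper's mean (190.4%) and median (116.0%) reflects this effect: a few games far above human level pull the mean up while the median stays near 100%.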
Related papers
- ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze [5.671696366787522]
We propose a general approach named ReZero to boost tree search operations for Monte Carlo Tree Search algorithms.
Specifically, we reanalyze training samples through a backward-view reuse technique that obtains the value estimate of a given child node in advance.
Experiments conducted on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency.
arXiv Detail & Related papers (2024-04-25T07:02:07Z)
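One way to read the backward-view reuse technique: when a trajectory is reanalyzed from its end toward its start, the root value just computed for step t+1 can be cached and handed to the search at step t as a ready-made estimate of the corresponding child node, saving a model evaluation per step. A loose sketch under that reading; the `mcts_search` interface with a child-value cache is hypothetical, not the authors' code:

```python
def reanalyze_backward(trajectory, model, mcts_search):
    """Reanalyze a trajectory of (obs, action) pairs in reverse order,
    reusing each search's root value at the preceding step."""
    cached_value = None                     # no cache for the final step
    targets = []
    for obs, action in reversed(trajectory):
        # The hypothetical search accepts a pre-computed value for the
        # child reached by `action` (None means "evaluate as usual").
        root_value, policy = mcts_search(
            model, obs, child_value_cache={action: cached_value}
        )
        targets.append((obs, policy, root_value))
        cached_value = root_value           # reuse at step t-1
    targets.reverse()                       # back to chronological order
    return targets
```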
- EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data [22.621203162457018]
We introduce EfficientZero V2, a framework designed for sample-efficient Reinforcement Learning (RL) algorithms.
With a series of improvements, EfficientZero V2 outperforms the current state-of-the-art (SOTA) by a significant margin in diverse tasks.
EfficientZero V2 also notably outperforms the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks.
arXiv Detail & Related papers (2024-03-01T14:42:25Z)
- MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games [9.339645051415115]
MiniZero is a zero-knowledge learning framework that supports four state-of-the-art algorithms.
We evaluate the performance of each algorithm in two board games, 9x9 Go and 8x8 Othello, as well as 57 Atari games.
arXiv Detail & Related papers (2023-10-17T14:29:25Z)
- Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate (EI), a planning-based imitation learning method that achieves both simultaneously.
Experimental results show that EI attains state-of-the-art performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z)
- Efficient Offline Policy Optimization with a Learned Model [83.64779942889916]
MuZero Unplugged presents a promising approach for offline policy learning from logged data.
It conducts Monte-Carlo Tree Search (MCTS) with a learned model and leverages Reanalyze algorithm to learn purely from offline data.
This paper investigates a few hypotheses about why MuZero Unplugged may not work well in offline settings.
arXiv Detail & Related papers (2022-10-12T07:41:04Z)
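The Reanalyze algorithm mentioned here refreshes the training targets of logged transitions by re-running the tree search with the current model, so learning can proceed from a fixed buffer alone. A minimal sketch of such an offline loop; the function names and buffer interface are assumptions for illustration, not the MuZero Unplugged implementation:

```python
def reanalyze_offline(buffer, model, mcts_search, batch_size=256):
    """Produce fresh policy/value targets from logged data only."""
    batch = buffer.sample(batch_size)    # fixed dataset, no new env steps
    targets = []
    for obs, action, reward in batch:
        # Re-run MCTS with the *current* model to get up-to-date targets.
        root_value, policy = mcts_search(model, obs)
        targets.append((obs, action, reward, policy, root_value))
    return targets                       # used as supervised targets
```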
- Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike in prior work, LTH strategies applied naively can underperform the zero-shot TAS-B dense retriever by up to 14% nDCG@10 on average.
arXiv Detail & Related papers (2022-05-23T17:53:44Z)
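For context, learning-to-hash (LTH) compresses dense embeddings into short binary codes so retrieval can run on cheap Hamming distance; applied naively, it is close to plain sign-binarization, and the resulting quantization error is what shows up as the measured nDCG@10 drop. A tiny sketch of that naive baseline, with made-up embeddings:

```python
import numpy as np

def binarize(x):
    """Naive hashing: keep only the sign of each dimension (1 bit/dim)."""
    return (x > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

docs = np.random.randn(1000, 64)   # made-up dense document embeddings
query = np.random.randn(64)
codes = binarize(docs)             # 64 bits per document instead of floats
q = binarize(query)
ranked = np.argsort([hamming(q, c) for c in codes])
print(ranked[:10])                 # top-10 documents by Hamming distance
```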
- Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation [90.78178803486746]
We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete action, pixel-based environments.
We empirically evaluate NAIT on both the 26 and 57 game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with greater than 100x speedup in wall-time.
arXiv Detail & Related papers (2022-03-07T00:31:31Z)
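The title's non-parametric value approximation suggests estimating values by looking up stored experience instead of training a value network. A minimal sketch of one such estimator, k-nearest-neighbour averaging of stored returns; the buffer layout and plain Euclidean distance are illustrative assumptions, not NAIT's actual design:

```python
import numpy as np

class KNNValue:
    """Value estimate = average return of the k nearest stored embeddings."""
    def __init__(self, k=2):
        self.k = k
        self.keys, self.returns = [], []

    def add(self, embedding, ret):
        self.keys.append(embedding)
        self.returns.append(ret)

    def value(self, embedding):
        keys = np.stack(self.keys)
        dists = np.linalg.norm(keys - embedding, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.returns)[nearest]))

buf = KNNValue(k=2)
buf.add(np.array([0.0, 1.0]), 1.0)
buf.add(np.array([0.1, 0.9]), 0.5)
buf.add(np.array([5.0, 5.0]), -1.0)
print(buf.value(np.array([0.0, 1.0])))  # 0.75: mean of the two nearby returns
```

Since the estimate needs no gradient updates, this is one plausible source of the reported wall-time speedup.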
- Taming GANs with Lookahead-Minmax [63.90038365274479]
Experimental results on MNIST, SVHN, CIFAR-10, and ImageNet demonstrate a clear advantage of combining Lookahead-minmax with Adam or extragradient.
Using 30-fold fewer parameters and 16-fold smaller minibatches, we outperform the reported performance of the class-dependent BigGAN on CIFAR-10, obtaining an FID of 12.19 without using class labels.
arXiv Detail & Related papers (2020-06-25T17:13:23Z)
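Lookahead (Zhang et al., 2019) keeps slow weights that are periodically pulled toward the fast weights of a base optimizer; Lookahead-minmax applies that interpolation to both players of the game. A minimal sketch on a toy bilinear problem; the base optimizer here is plain alternating gradient descent-ascent, whereas the paper combines Lookahead with Adam or extragradient:

```python
import numpy as np

def lookahead_step(slow, fast, alpha=0.5):
    """Standard Lookahead update: move the slow weights toward the fast
    weights, then restart the fast weights from the new slow point."""
    slow = slow + alpha * (fast - slow)
    return slow, slow.copy()

# Toy bilinear minmax game: min_x max_y x*y, equilibrium at (0, 0).
x_slow = x = np.array([1.0])
y_slow = y = np.array([1.0])
lr, k = 0.1, 5
for step in range(1, 1001):
    x = x - lr * y                   # descent step for the min player
    y = y + lr * x                   # ascent step for the max player
    if step % k == 0:                # every k fast steps: lookahead
        x_slow, x = lookahead_step(x_slow, x)
        y_slow, y = lookahead_step(y_slow, y)
print(x_slow, y_slow)                # slow weights shrink toward (0, 0)
```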
- Agent57: Outperforming the Atari Human Benchmark [15.75730239983062]
Atari games have been a long-standing benchmark in reinforcement learning.
We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games.
arXiv Detail & Related papers (2020-03-30T11:33:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.