Mastering Atari Games with Limited Data
- URL: http://arxiv.org/abs/2111.00210v1
- Date: Sat, 30 Oct 2021 09:13:39 GMT
- Title: Mastering Atari Games with Limited Data
- Authors: Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao
- Abstract summary: We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero.
Our method achieves 190.4% mean human performance on the Atari 100k benchmark with only two hours of real-time game experience.
This is the first time an algorithm has achieved super-human performance on Atari games with so little data.
- Score: 73.6189496825209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning has achieved great success in many applications.
However, sample efficiency remains a key challenge, with prominent methods
requiring millions (or even billions) of environment steps to train. Recently,
there has been significant progress in sample efficient image-based RL
algorithms; however, consistent human-level performance on the Atari game
benchmark remains an elusive goal. We propose a sample efficient model-based
visual RL algorithm built on MuZero, which we name EfficientZero. Our method
achieves 190.4% mean human performance and 116.0% median performance on the
Atari 100k benchmark with only two hours of real-time game experience and
outperforms state-based SAC in some tasks on the DMControl 100k benchmark. This
is the first time an algorithm has achieved super-human performance on Atari
games with so little data. EfficientZero's performance is also close to DQN's
performance at 200 million frames while consuming 500 times less data.
EfficientZero's low sample complexity and high performance can bring RL closer
to real-world applicability. We implement our algorithm in an
easy-to-understand manner, and it is available at
https://github.com/YeWR/EfficientZero. We hope it will accelerate research
on MCTS-based RL algorithms in the wider community.
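For reference, the headline percentages are human-normalized scores: each game's raw score is rescaled so that a random policy maps to 0% and the human reference to 100%, then the per-game scores are aggregated by mean or median. A minimal sketch, with illustrative raw scores rather than the paper's actual per-game results:

```python
import numpy as np

def human_normalized_score(agent, random, human):
    """Standard Atari metric: 1.0 (i.e. 100%) means human-level play."""
    return (agent - random) / (human - random)

# Illustrative (agent, random, human) raw scores -- not the paper's results.
games = {
    "Pong":     (  19.0, -20.7,    14.6),
    "Breakout": ( 400.0,   1.7,    30.5),
    "Seaquest": (1000.0,  68.4, 42054.7),
}
scores = [human_normalized_score(a, r, h) for a, r, h in games.values()]
print(f"mean:   {np.mean(scores):.1%}")    # inflated by outlier games
print(f"median: {np.median(scores):.1%}")  # more robust summary
```

The gap between the paper's mean (190.4%) and median (116.0%) reflects this effect: a few games far above human level pull the mean up while the median stays near 100%.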
Related papers
- ReZero: Boosting MCTS-based Algorithms by Backward-view and Entire-buffer Reanalyze [5.671696366787522]
We propose a general approach named ReZero to boost tree search operations for Monte Carlo Tree Search algorithms.
Specifically, we reanalyze training samples through a backward-view reuse technique that obtains the value estimate of a given child node in advance.
Experiments conducted on Atari environments and board games demonstrate that ReZero substantially improves training speed while maintaining high sample efficiency.
arXiv Detail & Related papers (2024-04-25T07:02:07Z)
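One way to read the backward-view reuse technique: when a trajectory is reanalyzed from its end toward its start, the root value just computed for step t+1 can be cached and handed to the search at step t as a ready-made estimate of the corresponding child node, saving a model evaluation per step. A loose sketch under that reading; the `mcts_search` interface with a child-value cache is hypothetical, not the authors' code:

```python
def reanalyze_backward(trajectory, model, mcts_search):
    """Reanalyze a trajectory of (obs, action) pairs in reverse order,
    reusing each search's root value at the preceding step."""
    cached_value = None                     # no cache for the final step
    targets = []
    for obs, action in reversed(trajectory):
        # The hypothetical search accepts a pre-computed value for the
        # child reached by `action` (None means "evaluate as usual").
        root_value, policy = mcts_search(
            model, obs, child_value_cache={action: cached_value}
        )
        targets.append((obs, policy, root_value))
        cached_value = root_value           # reuse at step t-1
    targets.reverse()                       # back to chronological order
    return targets
```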
- EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data [22.621203162457018]
We introduce EfficientZero V2, a framework designed for sample-efficient Reinforcement Learning (RL) algorithms.
With a series of improvements, EfficientZero V2 outperforms the current state-of-the-art (SOTA) by a significant margin in diverse tasks.
EfficientZero V2 also notably outperforms the prevailing general algorithm, DreamerV3, achieving superior outcomes in 50 of 66 evaluated tasks.
arXiv Detail & Related papers (2024-03-01T14:42:25Z)
- MiniZero: Comparative Analysis of AlphaZero and MuZero on Go, Othello, and Atari Games [9.339645051415115]
MiniZero is a zero-knowledge learning framework that supports four state-of-the-art algorithms.
We evaluate the performance of each algorithm in two board games, 9x9 Go and 8x8 Othello, as well as 57 Atari games.
arXiv Detail & Related papers (2023-10-17T14:29:25Z)
- Planning for Sample Efficient Imitation Learning [52.44953015011569]
Current imitation algorithms struggle to achieve high performance and high in-environment sample efficiency simultaneously.
We propose EfficientImitate (EI), a planning-based imitation learning method that achieves both simultaneously.
Experimental results show that EI attains state-of-the-art performance and sample efficiency.
arXiv Detail & Related papers (2022-10-18T05:19:26Z)
- Efficient Offline Policy Optimization with a Learned Model [83.64779942889916]
MuZero Unplugged presents a promising approach for offline policy learning from logged data.
It conducts Monte-Carlo Tree Search (MCTS) with a learned model and leverages Reanalyze algorithm to learn purely from offline data.
This paper investigates a few hypotheses about why MuZero Unplugged may not work well in offline settings.
arXiv Detail & Related papers (2022-10-12T07:41:04Z)
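The Reanalyze algorithm mentioned here refreshes the training targets of logged transitions by re-running the tree search with the current model, so learning can proceed from a fixed buffer alone. A minimal sketch of such an offline loop; the function names and buffer interface are assumptions for illustration, not the MuZero Unplugged implementation:

```python
def reanalyze_offline(buffer, model, mcts_search, batch_size=256):
    """Produce fresh policy/value targets from logged data only."""
    batch = buffer.sample(batch_size)    # fixed dataset, no new env steps
    targets = []
    for obs, action, reward in batch:
        # Re-run MCTS with the *current* model to get up-to-date targets.
        root_value, policy = mcts_search(model, obs)
        targets.append((obs, action, reward, policy, root_value))
    return targets                       # used as supervised targets
```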
- Injecting Domain Adaptation with Learning-to-hash for Effective and Efficient Zero-shot Dense Retrieval [49.98615945702959]
We evaluate LTH and vector compression techniques for improving the downstream zero-shot retrieval accuracy of the TAS-B dense retriever.
Our results demonstrate that, unlike in prior work, LTH strategies applied naively can underperform the zero-shot TAS-B dense retriever by up to 14% nDCG@10 on average.
arXiv Detail & Related papers (2022-05-23T17:53:44Z)
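For context, learning-to-hash (LTH) compresses dense embeddings into short binary codes so retrieval can run on cheap Hamming distance; applied naively, it is close to plain sign-binarization, and the resulting quantization error is what shows up as the measured nDCG@10 drop. A tiny sketch of that naive baseline, with made-up embeddings:

```python
import numpy as np

def binarize(x):
    """Naive hashing: keep only the sign of each dimension (1 bit/dim)."""
    return (x > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

docs = np.random.randn(1000, 64)   # made-up dense document embeddings
query = np.random.randn(64)
codes = binarize(docs)             # 64 bits per document instead of floats
q = binarize(query)
ranked = np.argsort([hamming(q, c) for c in codes])
print(ranked[:10])                 # top-10 documents by Hamming distance
```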
- Fast and Data Efficient Reinforcement Learning from Pixels via Non-Parametric Value Approximation [90.78178803486746]
We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete action, pixel-based environments.
We empirically evaluate NAIT on both the 26 and 57 game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with greater than 100x speedup in wall-time.
arXiv Detail & Related papers (2022-03-07T00:31:31Z)
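The title's non-parametric value approximation suggests estimating values by looking up stored experience instead of training a value network. A minimal sketch of one such estimator, k-nearest-neighbour averaging of stored returns; the buffer layout and plain Euclidean distance are illustrative assumptions, not NAIT's actual design:

```python
import numpy as np

class KNNValue:
    """Value estimate = average return of the k nearest stored embeddings."""
    def __init__(self, k=2):
        self.k = k
        self.keys, self.returns = [], []

    def add(self, embedding, ret):
        self.keys.append(embedding)
        self.returns.append(ret)

    def value(self, embedding):
        keys = np.stack(self.keys)
        dists = np.linalg.norm(keys - embedding, axis=1)
        nearest = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.returns)[nearest]))

buf = KNNValue(k=2)
buf.add(np.array([0.0, 1.0]), 1.0)
buf.add(np.array([0.1, 0.9]), 0.5)
buf.add(np.array([5.0, 5.0]), -1.0)
print(buf.value(np.array([0.0, 1.0])))  # 0.75: mean of the two nearby returns
```

Since the estimate needs no gradient updates, this is one plausible source of the reported wall-time speedup.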
- Taming GANs with Lookahead-Minmax [63.90038365274479]
Experimental results on MNIST, SVHN, CIFAR-10, and ImageNet demonstrate a clear advantage of combining Lookahead-minmax with Adam or extragradient.
Using 30-fold fewer parameters and 16-fold smaller minibatches, we outperform the reported performance of the class-dependent BigGAN on CIFAR-10, obtaining an FID of 12.19 without using class labels.
arXiv Detail & Related papers (2020-06-25T17:13:23Z)
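Lookahead (Zhang et al., 2019) keeps slow weights that are periodically pulled toward the fast weights of a base optimizer; Lookahead-minmax applies that interpolation to both players of the game. A minimal sketch on a toy bilinear problem; the base optimizer here is plain alternating gradient descent-ascent, whereas the paper combines Lookahead with Adam or extragradient:

```python
import numpy as np

def lookahead_step(slow, fast, alpha=0.5):
    """Standard Lookahead update: move the slow weights toward the fast
    weights, then restart the fast weights from the new slow point."""
    slow = slow + alpha * (fast - slow)
    return slow, slow.copy()

# Toy bilinear minmax game: min_x max_y x*y, equilibrium at (0, 0).
x_slow = x = np.array([1.0])
y_slow = y = np.array([1.0])
lr, k = 0.1, 5
for step in range(1, 1001):
    x = x - lr * y                   # descent step for the min player
    y = y + lr * x                   # ascent step for the max player
    if step % k == 0:                # every k fast steps: lookahead
        x_slow, x = lookahead_step(x_slow, x)
        y_slow, y = lookahead_step(y_slow, y)
print(x_slow, y_slow)                # slow weights shrink toward (0, 0)
```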
- Agent57: Outperforming the Atari Human Benchmark [15.75730239983062]
Atari games have been a long-standing benchmark in reinforcement learning.
We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games.
arXiv Detail & Related papers (2020-03-30T11:33:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.