A Review for Deep Reinforcement Learning in Atari: Benchmarks, Challenges, and Solutions
- URL: http://arxiv.org/abs/2112.04145v2
- Date: Fri, 10 Dec 2021 14:48:34 GMT
- Title: A Review for Deep Reinforcement Learning in Atari: Benchmarks, Challenges, and Solutions
- Authors: Jiajun Fan
- Abstract summary: Arcade Learning Environment (ALE) is proposed as an evaluation platform for empirically assessing the generality of agents across Atari 2600 games.
From Deep Q-Networks (DQN) to Agent57, RL agents seem to achieve superhuman performance in ALE.
We propose a novel Atari benchmark based on human world records (HWR), which puts forward higher requirements for RL agents on both final performance and learning efficiency.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Arcade Learning Environment (ALE) is proposed as an evaluation platform
for empirically assessing the generality of agents across dozens of Atari 2600
games. ALE offers various challenging problems and has drawn significant
attention from the deep reinforcement learning (RL) community. From Deep
Q-Networks (DQN) to Agent57, RL agents seem to achieve superhuman performance
in ALE. However, is this the case? In this paper, to explore this problem, we
first review the current evaluation metrics in the Atari benchmarks and then
reveal that the current evaluation criteria for achieving superhuman performance
are inappropriate, as they underestimate human performance relative to what is
possible. To handle those problems and promote the development of RL
research, we propose a novel Atari benchmark based on human world records
(HWR), which puts forward higher requirements for RL agents on both final
performance and learning efficiency. Furthermore, we summarize the
state-of-the-art (SOTA) methods in Atari benchmarks and provide benchmark
results over new evaluation metrics based on human world records. From those
new benchmark results, we conclude that at least four open challenges hinder RL
agents from achieving superhuman performance. Finally, we also discuss some
promising ways to handle those problems.
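The abstract contrasts the usual human-average baseline with a stricter baseline built on human world records (HWR). A minimal sketch of the idea, assuming the standard random-baseline normalization used in Atari benchmarks; the per-game numbers below are illustrative placeholders, not figures from the paper:

```python
def normalized_score(agent_score: float, reference_score: float,
                     random_score: float) -> float:
    """Normalize a raw game score against a reference baseline,
    with a random policy's score pinned to 0 and the reference to 1."""
    return (agent_score - random_score) / (reference_score - random_score)

# Hypothetical per-game numbers for illustration only.
random_score = 200.0       # average score of a random policy
avg_human_score = 7000.0   # average human tester score (the usual baseline)
world_record = 50000.0     # human world record (the stricter baseline)
agent_score = 9000.0

human_norm = normalized_score(agent_score, avg_human_score, random_score)
hwr_norm = normalized_score(agent_score, world_record, random_score)
```

With these placeholder values the agent looks "superhuman" under the usual metric (`human_norm` exceeds 1.0) while scoring far below 1.0 under the HWR baseline, which is exactly the gap the paper's new benchmark is designed to expose.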
Related papers
- Leveraging Reward Consistency for Interpretable Feature Discovery in
Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action matching principle is more like an explanation of deep neural networks (DNNs) than the interpretation of RL agents.
We propose to consider rewards, the essential objective of RL agents, as the basis for interpreting RL agents.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z) - ARB: Advanced Reasoning Benchmark for Large Language Models [94.37521840642141]
We introduce ARB, a novel benchmark composed of advanced reasoning problems in multiple fields.
As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge.
We evaluate recent models such as GPT-4 and Claude on ARB and demonstrate that current models score well below 50% on more demanding tasks.
arXiv Detail & Related papers (2023-07-25T17:55:19Z) - Int-HRL: Towards Intention-based Hierarchical Reinforcement Learning [23.062590084580542]
Int-HRL is a hierarchical RL method with intention-based sub-goals inferred from human eye gaze.
Our evaluations show that replacing hand-crafted sub-goals with automatically extracted intentions leads to an HRL agent that is significantly more sample efficient than previous methods.
arXiv Detail & Related papers (2023-06-20T12:12:16Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Mask Atari for Deep Reinforcement Learning as POMDP Benchmarks [3.549772411359722]
Mask Atari is a new benchmark to help solve partially observable Markov decision process (POMDP) problems.
It is constructed based on Atari 2600 games with controllable, movable, and learnable masks as the observation area.
We describe the challenges and features of our benchmark and evaluate several baselines with Mask Atari.
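The core idea behind Mask Atari, as summarized above, is that the agent only observes pixels inside a controllable mask window, turning the fully observable frame into a POMDP. A minimal sketch of that masking step, assuming a plain 2D list stands in for an Atari frame; the function name and window shape are illustrative, not the benchmark's actual API:

```python
def apply_mask(frame, center_row, center_col, half_size, fill=0):
    """Return a copy of `frame` where pixels outside the square window
    centered at (center_row, center_col) are replaced by `fill`."""
    rows, cols = len(frame), len(frame[0])
    masked = [[fill] * cols for _ in range(rows)]  # start fully hidden
    for r in range(max(0, center_row - half_size),
                   min(rows, center_row + half_size + 1)):
        for c in range(max(0, center_col - half_size),
                       min(cols, center_col + half_size + 1)):
            masked[r][c] = frame[r][c]  # reveal pixels inside the window
    return masked

# A 4x4 "frame"; only the 3x3 window around (1, 1) stays visible.
frame = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
visible = apply_mask(frame, 1, 1, 1)
```

Because the mask position is itself controllable, the agent must learn where to "look" as well as how to act, which is what makes the benchmark a POMDP rather than a standard Atari task.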
arXiv Detail & Related papers (2022-03-31T03:34:02Z) - Mastering Atari Games with Limited Data [73.6189496825209]
We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero.
Our method achieves 190.4% mean human performance on the Atari 100k benchmark with only two hours of real-time game experience.
This is the first time an algorithm has achieved superhuman performance on Atari games with so little data.
arXiv Detail & Related papers (2021-10-30T09:13:39Z) - Robust Deep Reinforcement Learning through Adversarial Loss [74.20501663956604]
Recent studies have shown that deep reinforcement learning agents are vulnerable to small adversarial perturbations on the agent's inputs.
We propose RADIAL-RL, a principled framework to train reinforcement learning agents with improved robustness against adversarial attacks.
arXiv Detail & Related papers (2020-08-05T07:49:42Z) - RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z) - Agent57: Outperforming the Atari Human Benchmark [15.75730239983062]
Atari games have been a long-standing benchmark in reinforcement learning.
We propose Agent57, the first deep RL agent that outperforms the standard human benchmark on all 57 Atari games.
arXiv Detail & Related papers (2020-03-30T11:33:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.