GDI: Rethinking What Makes Reinforcement Learning Different From
Supervised Learning
- URL: http://arxiv.org/abs/2106.06232v2
- Date: Tue, 15 Jun 2021 04:44:24 GMT
- Title: GDI: Rethinking What Makes Reinforcement Learning Different From
Supervised Learning
- Authors: Jiajun Fan, Changnan Xiao, Yue Huang
- Abstract summary: We extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a more generalized version called Generalized Data Distribution Iteration (GDI).
Our algorithm achieves a 9620.98% mean human normalized score (HNS), a 1146.39% median HNS and 22 human world record breakthroughs (HWRB) using only 200M training frames.
- Score: 8.755783981297396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Q Network (DQN) opened the door to deep reinforcement learning
(DRL) by combining deep learning (DL) with reinforcement learning (RL), and it
observed that the distribution of the acquired data changes during the training
process. DQN found that this property can cause instability in training, and it
proposed effective methods to handle its downsides. Instead of focusing on the
unfavourable aspects, we find it critical for RL to narrow the gap between the
estimated data distribution and the ground-truth data distribution, something
supervised learning (SL) fails to do. From this new perspective, we extend the
basic paradigm of RL, Generalized Policy Iteration (GPI), into a more
generalized version called Generalized Data Distribution Iteration (GDI). We
show that numerous RL algorithms and techniques can be unified under the GDI
paradigm, each being a special case of GDI. We provide theoretical proof of why
GDI is better than GPI and how it works. Several practical algorithms based on
GDI are proposed to verify its effectiveness and generality. Empirical
experiments demonstrate our state-of-the-art (SOTA) performance on the Arcade
Learning Environment (ALE), where our algorithm achieves a 9620.98% mean human
normalized score (HNS), a 1146.39% median HNS and 22 human world record
breakthroughs (HWRB) using only 200M training frames. Our work aims to lead RL
research toward conquering human world records and to seek truly superhuman
agents in both performance and efficiency.
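The abstract contrasts GPI's two-step loop (policy evaluation and policy improvement) with GDI, which additionally iterates the data distribution the agent learns from. The sketch below is a hypothetical, minimal illustration of that idea, not the authors' released algorithm: a UCB bandit (the data-distribution iteration) chooses which epsilon-greedy behaviour policy collects the next episode on a toy chain MDP, while ordinary Q-learning supplies the GPI-style evaluation and improvement. For reference, the human normalized score quoted above is conventionally HNS = (score_agent - score_random) / (score_human - score_random) x 100%.

```python
import numpy as np

N_STATES, N_ACTIONS = 10, 2        # tiny chain MDP: action 1 moves right, action 0 resets
EPSILONS = [0.4, 0.2, 0.05]        # candidate behaviour policies = candidate data distributions
GAMMA, ALPHA = 0.95, 0.1

def env_step(state, action):
    """Reward 1 only when the right end of the chain is reached; action 0 resets to the start."""
    if action == 1:
        nxt = state + 1
        if nxt == N_STATES - 1:
            return 0, 1.0, True
        return nxt, 0.0, False
    return 0, 0.0, False

def run_episode(q, epsilon, rng, max_steps=50):
    """One episode with an epsilon-greedy behaviour policy plus Q-learning updates
    (policy evaluation and policy improvement, as in GPI)."""
    state, ret = 0, 0.0
    for _ in range(max_steps):
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(q[state]))
        nxt, reward, done = env_step(state, action)
        target = reward if done else reward + GAMMA * np.max(q[nxt])
        q[state, action] += ALPHA * (target - q[state, action])
        ret += reward
        state = nxt
        if done:
            break
    return ret

def gdi_style_training(episodes=500, seed=0):
    """On top of the GPI-style inner updates, iterate the data distribution itself:
    a UCB bandit picks which behaviour policy generates the next episode of data."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    counts = np.ones(len(EPSILONS))          # start at 1 to avoid division by zero
    means = np.zeros(len(EPSILONS))          # running average return per behaviour policy
    for t in range(episodes):
        ucb = means + np.sqrt(2.0 * np.log(t + 2) / counts)
        k = int(np.argmax(ucb))
        ret = run_episode(q, EPSILONS[k], rng)
        counts[k] += 1
        means[k] += (ret - means[k]) / counts[k]
    return q, means

if __name__ == "__main__":
    q, means = gdi_style_training()
    print("average return achieved by each behaviour policy:", np.round(means, 3))
```

A full GDI agent would use a much richer space of candidate data distributions, but the structure of the outer loop is the point of the sketch.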
Related papers
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z) - Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases [60.30995339585003]
Deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable accomplishments.
DRL faces certain limitations, including low sample efficiency and poor generalization.
We present how to leverage generative AI (GAI) to address these issues and enhance the performance of DRL algorithms.
arXiv Detail & Related papers (2024-05-31T01:25:40Z) - Knowledge Graph Reasoning with Self-supervised Reinforcement Learning [30.359557545737747]
We propose a self-supervised pre-training method to warm up the policy network before the RL training stage.
In our supervised learning stage, the agent selects actions based on the policy network and learns from generated labels.
We show that our SSRL model meets or exceeds current state-of-the-art results on all Hits@k and mean reciprocal rank (MRR) metrics (a brief sketch of these ranking metrics follows this list).
arXiv Detail & Related papers (2024-05-22T13:39:33Z) - A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z) - Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interactions between the agent and the environment.
We propose a new method to address this, using unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resilience to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z) - Generalized Data Distribution Iteration [0.0]
We tackle data richness and the exploration-exploitation trade-off simultaneously in deep reinforcement learning.
We introduce operator-based versions of well-known RL methods from DQN to Agent57.
Our algorithm has achieved 9620.33% mean human normalized score (HNS), 1146.39% median HNS and surpassed 22 human world records using only 200M training frames.
arXiv Detail & Related papers (2022-06-07T11:27:40Z) - Federated Deep Reinforcement Learning for the Distributed Control of
NextG Wireless Networks [16.12495409295754]
Next Generation (NextG) networks are expected to support demanding Tactile Internet applications such as augmented reality and connected autonomous vehicles.
Data-driven approaches can improve the ability of the network to adapt to the current operating conditions.
Deep RL (DRL) has been shown to achieve good performance even in complex environments.
arXiv Detail & Related papers (2021-12-07T03:13:20Z) - GRI: General Reinforced Imitation and its Application to Vision-Based
Autonomous Driving [9.030769176986057]
General Reinforced Imitation (GRI) is a novel method that combines the benefits of exploration and expert data.
We show that our approach enables major improvements on vision-based autonomous driving in urban environments.
arXiv Detail & Related papers (2021-11-16T15:52:54Z) - RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute, and is compatible with both off-policy and on-policy RL algorithms, needing only minor changes to pre-existing code (the core DARTS mixed operation is sketched after this list).
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies while also validating previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z) - RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
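The Knowledge Graph Reasoning entry above reports Hits@k and mean reciprocal rank (MRR). As a quick reference, the snippet below computes these standard ranking metrics from the rank assigned to each query's correct answer; the example ranks are illustrative only.

```python
import numpy as np

def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """ranks[i] is the 1-based rank of the correct entity for query i."""
    ranks = np.asarray(ranks, dtype=float)
    mrr = float(np.mean(1.0 / ranks))                             # mean reciprocal rank
    hits = {f"Hits@{k}": float(np.mean(ranks <= k)) for k in ks}  # fraction ranked in top k
    return mrr, hits

# Example: the correct entities were ranked 1st, 4th and 12th by the model.
print(mrr_and_hits([1, 4, 12]))   # MRR ~= 0.444, Hits@1 ~= 0.33, Hits@3 ~= 0.33, Hits@10 ~= 0.67
```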
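The RL-DARTS entry above replaces a policy's image encoder with a DARTS supernet. The sketch below shows the core DARTS building block, a mixed operation whose softmax over learnable architecture weights keeps the choice of operation differentiable; the candidate operations and tensor sizes are simplified assumptions for illustration, not the paper's actual search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """DARTS-style mixed edge: a softmax over learnable architecture weights blends
    several candidate operations, keeping the operation choice differentiable."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2, bias=False),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        # Weighted sum of all candidate ops; after the search, only the
        # highest-weight operation on each edge would be kept.
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Such mixed edges would stand in for the fixed CNN encoder of an RL policy network.
x = torch.randn(1, 16, 84, 84)          # e.g. a stack of Atari-like feature maps
print(MixedOp(16)(x).shape)             # torch.Size([1, 16, 84, 84])
```

After the search, discretizing each edge to its top-weight operation yields a fixed encoder architecture for the RL policy.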
This list is automatically generated from the titles and abstracts of the papers on this site.