GDI: Rethinking What Makes Reinforcement Learning Different From
Supervised Learning
- URL: http://arxiv.org/abs/2106.06232v2
- Date: Tue, 15 Jun 2021 04:44:24 GMT
- Title: GDI: Rethinking What Makes Reinforcement Learning Different From
Supervised Learning
- Authors: Jiajun Fan, Changnan Xiao, Yue Huang
- Abstract summary: We extend the basic RL paradigm, Generalized Policy Iteration (GPI), into a more general version called Generalized Data Distribution Iteration (GDI).
Our algorithm has achieved 9620.98% mean human normalized score (HNS), 1146.39% median HNS and 22 human world record breakthroughs (HWRB) using only 200M training frames.
- Score: 8.755783981297396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Q Network (DQN) first opened the door to deep reinforcement
learning (DRL) by combining deep learning (DL) with reinforcement learning (RL),
and observed that the distribution of the acquired data changes during
training. DQN treated this property as a source of training instability and
proposed effective methods to mitigate its downsides. Instead of focusing on
the unfavourable aspects, we find it critical for RL to narrow the gap between
the estimated data distribution and the ground-truth data distribution,
something supervised learning (SL) is unable to do. From this new perspective,
we extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a
more general version called Generalized Data Distribution Iteration (GDI). We
show that a large body of RL algorithms and techniques can be unified under the
GDI paradigm, each viewed as a special case of GDI. We provide theoretical
proof of why GDI is better than GPI and how it works. Several practical
algorithms based on GDI have been proposed to verify its effectiveness and
generality. Empirical experiments demonstrate state-of-the-art (SOTA)
performance on the Arcade Learning Environment (ALE), where our algorithm
achieves 9620.98% mean human normalized score (HNS), 1146.39% median HNS and 22
human world record breakthroughs (HWRB) using only 200M training frames. Our
work aims to guide RL research toward conquering human world records and
pursuing agents that are truly superhuman in both performance and efficiency.
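As a rough, runnable illustration of the GPI-versus-GDI distinction (a sketch under assumptions, not the paper's algorithm), the example below contrasts a GPI-style loop, whose data-collection rule keeps its own parameters fixed, with a GDI-style loop that additionally iterates the parameters of the distribution used to collect data. The toy bandit environment, the softmax behavior policy, and the temperature-annealing rule are placeholder choices made for this sketch, not details taken from the paper.

```python
"""Minimal sketch contrasting a GPI-style loop with a GDI-style loop on a toy
multi-armed bandit. All design choices here (bandit, softmax behavior policy,
temperature annealing) are illustrative assumptions, not the paper's method."""
import numpy as np

rng = np.random.default_rng(0)
TRUE_MEANS = np.array([0.2, 0.5, 0.8])   # toy 3-armed bandit
N_ARMS = len(TRUE_MEANS)

def pull(arm):
    """Sample a noisy reward for one arm."""
    return TRUE_MEANS[arm] + 0.1 * rng.standard_normal()

def softmax(values, temperature):
    z = (values - values.max()) / max(temperature, 1e-8)
    p = np.exp(z)
    return p / p.sum()

def run(iters=200, batch=32, iterate_data_distribution=True):
    """GPI-style: evaluation + improvement, collection parameters fixed.
    GDI-style: additionally iterate the data-collection distribution."""
    q = np.zeros(N_ARMS)      # value estimates (policy evaluation target)
    counts = np.zeros(N_ARMS)
    temp = 1.0                # parameter of the data-collection distribution
    for _ in range(iters):
        behavior = softmax(q, temp)                 # current data distribution
        arms = rng.choice(N_ARMS, size=batch, p=behavior)
        rewards = np.array([pull(a) for a in arms])
        for a, r in zip(arms, rewards):             # policy evaluation
            counts[a] += 1
            q[a] += (r - q[a]) / counts[a]
        # policy improvement is implicit: softmax(q, temp) sharpens toward argmax(q)
        if iterate_data_distribution:
            # data-distribution iteration (illustrative rule): anneal the
            # behavior distribution toward the current greedy policy
            temp = max(0.05, 0.98 * temp)
    return softmax(q, temp)

print("GPI-like policy:", np.round(run(iterate_data_distribution=False), 3))
print("GDI-like policy:", np.round(run(iterate_data_distribution=True), 3))
```

For reference, the human normalized score quoted in the results above is conventionally computed per game as HNS = (score_agent - score_random) / (score_human - score_random), with mean and median HNS aggregated over games.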
Related papers
- Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data [8.583014846046886]
A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards.
We develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data.
In four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-01-13T14:11:12Z)
- RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning [53.8293458872774]
We propose Reinforcement Learning Distilled Generalists (RLDG) to generate high-quality training data for finetuning generalist policies.
We demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations.
Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems.
arXiv Detail & Related papers (2024-12-13T04:57:55Z)
- Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases [60.30995339585003]
Deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable results.
DRL faces certain limitations, including low sample efficiency and poor generalization.
We present how to leverage generative AI (GAI) to address these issues and enhance the performance of DRL algorithms.
arXiv Detail & Related papers (2024-05-31T01:25:40Z)
- Knowledge Graph Reasoning with Self-supervised Reinforcement Learning [30.359557545737747]
We propose a self-supervised pre-training method to warm up the policy network before the RL training stage.
In our supervised learning stage, the agent selects actions based on the policy network and learns from generated labels.
We show that our SSRL model meets or exceeds current state-of-the-art results on all Hits@k and mean reciprocal rank (MRR) metrics.
arXiv Detail & Related papers (2024-05-22T13:39:33Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Generalized Data Distribution Iteration [0.0]
We tackle data richness and the exploration-exploitation trade-off simultaneously in deep reinforcement learning.
We introduce operator-based versions of well-known RL methods from DQN to Agent57.
Our algorithm has achieved 9620.33% mean human normalized score (HNS), 1146.39% median HNS and surpassed 22 human world records using only 200M training frames.
arXiv Detail & Related papers (2022-06-07T11:27:40Z)
- GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving [9.030769176986057]
General Reinforced Imitation (GRI) is a novel method that combines the benefits of exploration and expert data.
We show that our approach enables major improvements on vision-based autonomous driving in urban environments.
arXiv Detail & Related papers (2021-11-16T15:52:54Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, while also validating previous design choices for RL policies (a generic sketch of such a mixed-operation cell appears after this list).
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
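As referenced in the RL-DARTS entry above, the sketch below shows a generic DARTS-style mixed operation in PyTorch: the building block of a supernet, in which a softmax-weighted mixture over candidate operations is trained jointly with the rest of the network, and the learned weights later select a single architecture. The candidate operation set, channel sizes, and how such an encoder would be wired into any particular RL algorithm are assumptions made for this illustration, not details from the paper.

```python
"""Generic DARTS-style mixed operation (illustrative sketch, not RL-DARTS's
actual search space or its integration with an RL algorithm)."""
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Softmax-weighted sum over candidate ops; the architecture logits
    (alpha) are trained alongside the network weights."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # 3x3 conv
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),  # 5x5 conv
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),         # pooling
            nn.Identity(),                                            # skip
        ])
        # one architecture logit per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Usage sketch: stack MixedOps in place of a fixed image encoder.
encoder = nn.Sequential(MixedOp(16), nn.ReLU(), MixedOp(16))
dummy = torch.randn(1, 16, 84, 84)   # placeholder feature map
print(encoder(dummy).shape)          # torch.Size([1, 16, 84, 84])
```

After search, one would typically keep only the highest-weighted operation in each MixedOp and retrain the resulting fixed architecture; the entry above additionally reports that the search works with both on-policy and off-policy RL training.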