GDI: Rethinking What Makes Reinforcement Learning Different From
Supervised Learning
- URL: http://arxiv.org/abs/2106.06232v2
- Date: Tue, 15 Jun 2021 04:44:24 GMT
- Title: GDI: Rethinking What Makes Reinforcement Learning Different From
Supervised Learning
- Authors: Jiajun Fan, Changnan Xiao, Yue Huang
- Abstract summary: We extend the basic RL paradigm, Generalized Policy Iteration (GPI), into a more general version called Generalized Data Distribution Iteration (GDI).
Our algorithm has achieved 9620.98% mean human normalized score (HNS), 1146.39% median HNS and 22 human world record breakthroughs (HWRB) using only 200M training frames.
- Score: 8.755783981297396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Q Network (DQN) first opened the door to deep reinforcement
learning (DRL) by combining deep learning (DL) with reinforcement learning (RL),
and observed that the distribution of the acquired data changes during
training. DQN treated this property as a source of training instability and
proposed effective methods to mitigate its downsides. Instead of focusing on
the unfavourable aspects, we find it critical for RL to narrow the gap between
the estimated data distribution and the ground-truth data distribution,
something supervised learning (SL) is unable to do. From this new perspective,
we extend the basic paradigm of RL, Generalized Policy Iteration (GPI), into a
more general version called Generalized Data Distribution Iteration (GDI). We
show that a large body of RL algorithms and techniques can be unified under the
GDI paradigm, each viewed as a special case of GDI. We provide theoretical
proof of why GDI is better than GPI and how it works. Several practical
algorithms based on GDI have been proposed to verify its effectiveness and
generality. Empirical experiments demonstrate state-of-the-art (SOTA)
performance on the Arcade Learning Environment (ALE), where our algorithm
achieves 9620.98% mean human normalized score (HNS), 1146.39% median HNS and 22
human world record breakthroughs (HWRB) using only 200M training frames. Our
work aims to guide RL research toward conquering human world records and
pursuing agents that are truly superhuman in both performance and efficiency.
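As a rough, runnable illustration of the GPI-versus-GDI distinction (a sketch under assumptions, not the paper's algorithm), the example below contrasts a GPI-style loop, whose data-collection rule keeps its own parameters fixed, with a GDI-style loop that additionally iterates the parameters of the distribution used to collect data. The toy bandit environment, the softmax behavior policy, and the temperature-annealing rule are placeholder choices made for this sketch, not details taken from the paper.

```python
"""Minimal sketch contrasting a GPI-style loop with a GDI-style loop on a toy
multi-armed bandit. All design choices here (bandit, softmax behavior policy,
temperature annealing) are illustrative assumptions, not the paper's method."""
import numpy as np

rng = np.random.default_rng(0)
TRUE_MEANS = np.array([0.2, 0.5, 0.8])   # toy 3-armed bandit
N_ARMS = len(TRUE_MEANS)

def pull(arm):
    """Sample a noisy reward for one arm."""
    return TRUE_MEANS[arm] + 0.1 * rng.standard_normal()

def softmax(values, temperature):
    z = (values - values.max()) / max(temperature, 1e-8)
    p = np.exp(z)
    return p / p.sum()

def run(iters=200, batch=32, iterate_data_distribution=True):
    """GPI-style: evaluation + improvement, collection parameters fixed.
    GDI-style: additionally iterate the data-collection distribution."""
    q = np.zeros(N_ARMS)      # value estimates (policy evaluation target)
    counts = np.zeros(N_ARMS)
    temp = 1.0                # parameter of the data-collection distribution
    for _ in range(iters):
        behavior = softmax(q, temp)                 # current data distribution
        arms = rng.choice(N_ARMS, size=batch, p=behavior)
        rewards = np.array([pull(a) for a in arms])
        for a, r in zip(arms, rewards):             # policy evaluation
            counts[a] += 1
            q[a] += (r - q[a]) / counts[a]
        # policy improvement is implicit: softmax(q, temp) sharpens toward argmax(q)
        if iterate_data_distribution:
            # data-distribution iteration (illustrative rule): anneal the
            # behavior distribution toward the current greedy policy
            temp = max(0.05, 0.98 * temp)
    return softmax(q, temp)

print("GPI-like policy:", np.round(run(iterate_data_distribution=False), 3))
print("GDI-like policy:", np.round(run(iterate_data_distribution=True), 3))
```

For reference, the human normalized score quoted in the results above is conventionally computed per game as HNS = (score_agent - score_random) / (score_human - score_random), with mean and median HNS aggregated over games.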
Related papers
- Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data [8.583014846046886]
A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards.
We develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data.
In four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.
arXiv Detail & Related papers (2025-01-13T14:11:12Z)
- RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning [53.8293458872774]
We propose Reinforcement Learning Distilled Generalists (RLDG) to generate high-quality training data for finetuning generalist policies.
We demonstrate that generalist policies trained with RL-generated data consistently outperform those trained with human demonstrations.
Our results suggest that combining task-specific RL with generalist policy distillation offers a promising approach for developing more capable and efficient robotic manipulation systems.
arXiv Detail & Related papers (2024-12-13T04:57:55Z)
- Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases [60.30995339585003]
Deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable results.
DRL faces certain limitations, including low sample efficiency and poor generalization.
We present how to leverage generative AI (GAI) to address these issues and enhance the performance of DRL algorithms.
arXiv Detail & Related papers (2024-05-31T01:25:40Z)
- Knowledge Graph Reasoning with Self-supervised Reinforcement Learning [30.359557545737747]
We propose a self-supervised pre-training method to warm up the policy network before the RL training stage.
In our supervised learning stage, the agent selects actions based on the policy network and learns from generated labels.
We show that our SSRL model meets or exceeds current state-of-the-art results on all Hits@k and mean reciprocal rank (MRR) metrics.
arXiv Detail & Related papers (2024-05-22T13:39:33Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels [112.63440666617494]
Reinforcement learning algorithms can succeed but require large amounts of interaction between the agent and the environment.
We propose a new method that uses unsupervised model-based RL to pre-train the agent.
We show robust performance on the Real-World RL benchmark, hinting at resiliency to environment perturbations during adaptation.
arXiv Detail & Related papers (2022-09-24T14:22:29Z)
- Generalized Data Distribution Iteration [0.0]
We tackle data richness and the exploration-exploitation trade-off simultaneously in deep reinforcement learning.
We introduce operator-based versions of well-known RL methods from DQN to Agent57.
Our algorithm has achieved 9620.33% mean human normalized score (HNS), 1146.39% median HNS and surpassed 22 human world records using only 200M training frames.
arXiv Detail & Related papers (2022-06-07T11:27:40Z)
- GRI: General Reinforced Imitation and its Application to Vision-Based Autonomous Driving [9.030769176986057]
General Reinforced Imitation (GRI) is a novel method that combines the benefits of exploration and expert data.
We show that our approach enables major improvements on vision-based autonomous driving in urban environments.
arXiv Detail & Related papers (2021-11-16T15:52:54Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, while also validating previous design choices for RL policies (a generic sketch of such a mixed-operation cell appears after this list).
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
- RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z)
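As referenced in the RL-DARTS entry above, the sketch below shows a generic DARTS-style mixed operation in PyTorch: the building block of a supernet, in which a softmax-weighted mixture over candidate operations is trained jointly with the rest of the network, and the learned weights later select a single architecture. The candidate operation set, channel sizes, and how such an encoder would be wired into any particular RL algorithm are assumptions made for this illustration, not details from the paper.

```python
"""Generic DARTS-style mixed operation (illustrative sketch, not RL-DARTS's
actual search space or its integration with an RL algorithm)."""
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Softmax-weighted sum over candidate ops; the architecture logits
    (alpha) are trained alongside the network weights."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),  # 3x3 conv
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),  # 5x5 conv
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),         # pooling
            nn.Identity(),                                            # skip
        ])
        # one architecture logit per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# Usage sketch: stack MixedOps in place of a fixed image encoder.
encoder = nn.Sequential(MixedOp(16), nn.ReLU(), MixedOp(16))
dummy = torch.randn(1, 16, 84, 84)   # placeholder feature map
print(encoder(dummy).shape)          # torch.Size([1, 16, 84, 84])
```

After search, one would typically keep only the highest-weighted operation in each MixedOp and retrain the resulting fixed architecture; the entry above additionally reports that the search works with both on-policy and off-policy RL training.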