Measuring Progress in Deep Reinforcement Learning Sample Efficiency
- URL: http://arxiv.org/abs/2102.04881v1
- Date: Tue, 9 Feb 2021 15:27:47 GMT
- Title: Measuring Progress in Deep Reinforcement Learning Sample Efficiency
- Authors: Florian E. Dorner
- Abstract summary: Current benchmarks allow for the cheap and easy generation of large amounts of samples.
As simulating real world processes is often prohibitively hard and collecting real world experience is costly, sample efficiency is an important indicator for economically relevant applications of DRL.
We investigate progress in sample efficiency on Atari games and continuous control tasks by comparing the number of samples that a variety of algorithms need to reach a given performance level.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampled environment transitions are a critical input to deep reinforcement
learning (DRL) algorithms. Current DRL benchmarks often allow for the cheap and
easy generation of large amounts of samples such that perceived progress in DRL
does not necessarily correspond to improved sample efficiency. As simulating
real world processes is often prohibitively hard and collecting real world
experience is costly, sample efficiency is an important indicator for
economically relevant applications of DRL. We investigate progress in sample
efficiency on Atari games and continuous control tasks by comparing the number
of samples that a variety of algorithms need to reach a given performance level
according to training curves in the corresponding publications. We find
exponential progress in sample efficiency with estimated doubling times of
around 10 to 18 months on Atari, 5 to 24 months on state-based continuous
control and of around 4 to 9 months on pixel-based continuous control depending
on the specific task and performance level.
Related papers
- Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network [23.481553466650453]
We propose Auto-Regressive Soft Q-learning (ARSQ), a value-based RL algorithm that models Q-values in a coarse-to-fine, auto-regressive manner.
ARSQ decomposes the continuous action space into discrete spaces in a coarse-to-fine hierarchy, enhancing sample efficiency for fine-grained continuous control tasks.
It auto-regressively predicts dimensional action advantages within each decision step, enabling more effective decision-making in continuous control tasks.
arXiv Detail & Related papers (2025-02-01T03:04:53Z) - Adaptive Data Exploitation in Deep Reinforcement Learning [50.53705050673944]
We introduce ADEPT, a powerful framework to enhance the **data efficiency** and **generalization** in deep reinforcement learning (RL)
Specifically, ADEPT adaptively manages the use of sampled data across different learning stages via multi-armed bandit (MAB) algorithms.
We test ADEPT on benchmarks including Procgen, MiniGrid, and PyBullet.
arXiv Detail & Related papers (2025-01-22T04:01:17Z) - RecFlow: An Industrial Full Flow Recommendation Dataset [66.06445386541122]
Industrial recommendation systems rely on the multi-stage pipeline to balance effectiveness and efficiency when delivering items to users.
We introduce RecFlow, an industrial full flow recommendation dataset designed to bridge the gap between offline RS benchmarks and the real online environment.
Our dataset comprises 38M interactions from 42K users across nearly 9M items with additional 1.9B stage samples collected from 9.3M online requests over 37 days and spanning 6 stages.
arXiv Detail & Related papers (2024-10-28T09:36:03Z) - MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL [20.22674077197914]
Recent work has explored updating neural networks with large numbers of gradient steps for every new sample.
High update-to-data ratios introduce instability to the training process.
Our method, Model-Augmented Data for Temporal Difference learning (MAD-TD), uses small amounts of generated data to stabilize high UTD training.
arXiv Detail & Related papers (2024-10-11T15:13:17Z) - D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z) - Continuous Control with Coarse-to-fine Reinforcement Learning [15.585706638252441]
We present a framework that trains RL agents to zoom-into a continuous action space in a coarse-to-fine manner.
We introduce a concrete, value-based algorithm within the framework called Coarse-to-fine Q-Network (CQN)
CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.
arXiv Detail & Related papers (2024-07-10T16:04:08Z) - Learning Better with Less: Effective Augmentation for Sample-Efficient
Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z) - One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning [35.0157090322113]
Large-scale machine learning systems are often continuously trained with enormous data from production environments.
The sheer volume of streaming data poses a significant challenge to real-time training subsystems and ad-hoc sampling is the standard practice.
We propose to record a constant amount of information per instance from these forward passes. The extra information measurably improves the selection of which data instances should participate in forward and backward passes.
arXiv Detail & Related papers (2021-04-27T11:29:02Z) - Continuous Transition: Improving Sample Efficiency for Continuous
Control Problems via MixUp [119.69304125647785]
This paper introduces a concise yet powerful method to construct Continuous Transition.
Specifically, we propose to synthesize new transitions for training by linearly interpolating the consecutive transitions.
To keep the constructed transitions authentic, we also develop a discriminator to guide the construction process automatically.
arXiv Detail & Related papers (2020-11-30T01:20:23Z) - Learning to Sample with Local and Global Contexts in Experience Replay
Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of transition.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
arXiv Detail & Related papers (2020-07-14T21:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.