Measuring Progress in Deep Reinforcement Learning Sample Efficiency
- URL: http://arxiv.org/abs/2102.04881v1
- Date: Tue, 9 Feb 2021 15:27:47 GMT
- Title: Measuring Progress in Deep Reinforcement Learning Sample Efficiency
- Authors: Florian E. Dorner
- Abstract summary: Current benchmarks allow for the cheap and easy generation of large amounts of samples.
As simulating real world processes is often prohibitively hard and collecting real world experience is costly, sample efficiency is an important indicator for economically relevant applications of DRL.
We investigate progress in sample efficiency on Atari games and continuous control tasks by comparing the number of samples that a variety of algorithms need to reach a given performance level.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampled environment transitions are a critical input to deep reinforcement
learning (DRL) algorithms. Current DRL benchmarks often allow for the cheap and
easy generation of large amounts of samples such that perceived progress in DRL
does not necessarily correspond to improved sample efficiency. As simulating
real world processes is often prohibitively hard and collecting real world
experience is costly, sample efficiency is an important indicator for
economically relevant applications of DRL. We investigate progress in sample
efficiency on Atari games and continuous control tasks by comparing the number
of samples that a variety of algorithms need to reach a given performance level
according to training curves in the corresponding publications. We find
exponential progress in sample efficiency with estimated doubling times of
around 10 to 18 months on Atari, 5 to 24 months on state-based continuous
control and of around 4 to 9 months on pixel-based continuous control depending
on the specific task and performance level.
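The doubling-time estimates quoted above can be reproduced in spirit with a simple log-linear fit: for each algorithm, take its publication date and the number of samples it needs to reach a fixed performance level, regress log(samples) on time, and convert the slope into a halving time for the required samples (equivalently, a doubling time for sample efficiency). The sketch below uses made-up placeholder data points, not values from the paper.

```python
import numpy as np

# Hypothetical (publication date, samples-to-threshold) pairs; in the paper
# these are read off the training curves of the compared algorithms.
years = np.array([2015.2, 2016.5, 2017.8, 2019.1, 2020.3])   # publication dates
samples = np.array([200e6, 80e6, 30e6, 10e6, 3e6])            # samples to reach threshold

# Fit log2(samples) = a + b * year by ordinary least squares.
A = np.vstack([np.ones_like(years), years]).T
coef, *_ = np.linalg.lstsq(A, np.log2(samples), rcond=None)
slope = coef[1]                      # change in log2(samples) per year

# A negative slope means the required sample count halves every -1/slope years.
halving_time_months = -12.0 / slope
print(f"Estimated halving time: {halving_time_months:.1f} months")
```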
Related papers
- MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL [20.22674077197914]
Recent work has explored updating neural networks with large numbers of gradient steps for every new sample.
High update-to-data ratios introduce instability to the training process.
Our method, Model-Augmented Data for Temporal Difference learning (MAD-TD), uses small amounts of generated data to stabilize high UTD training.
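As a rough illustration of the idea summarized above (not the authors' implementation), high update-to-data training can mix a small fraction of model-generated transitions into each TD batch; the `replay`, `world_model`, and `agent` interfaces and the mixing ratio below are assumptions.

```python
import random

def high_utd_updates(replay, world_model, agent, utd_ratio=16,
                     batch_size=256, generated_fraction=0.05):
    """One environment step's worth of TD updates, padded with synthetic data.

    Assumed interfaces: replay.sample(n) -> list of real transitions,
    world_model.rollout(n) -> list of generated transitions,
    agent.td_update(batch) -> performs one gradient step.
    """
    n_generated = int(batch_size * generated_fraction)
    for _ in range(utd_ratio):
        batch = replay.sample(batch_size - n_generated)
        batch += world_model.rollout(n_generated)   # small amount of model data
        random.shuffle(batch)
        agent.td_update(batch)
```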
arXiv Detail & Related papers (2024-10-11T15:13:17Z)
- SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning [11.304750795377657]
We propose SHIRE, a framework for encoding human intuition using Probabilistic Graphical Models (PGMs).
SHIRE achieves 25-78% sample efficiency gains across the environments we evaluate at negligible overhead cost.
arXiv Detail & Related papers (2024-09-16T04:46:22Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Continuous Control with Coarse-to-fine Reinforcement Learning [15.585706638252441]
We present a framework that trains RL agents to zoom into a continuous action space in a coarse-to-fine manner.
We introduce a concrete, value-based algorithm within the framework called Coarse-to-fine Q-Network (CQN).
CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.
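A hedged sketch of coarse-to-fine action selection on a one-dimensional action interval: each level splits the current interval into a few bins, picks the bin with the highest Q-value, and zooms into it. The `q_value` callable and the bin/level counts are illustrative assumptions, not the CQN architecture.

```python
def coarse_to_fine_action(q_value, state, low=-1.0, high=1.0,
                          bins=5, levels=3):
    """Select a continuous action by repeatedly discretizing and zooming in.

    `q_value(state, action, level)` is an assumed callable returning a scalar
    Q-estimate for that action at that refinement level.
    """
    for level in range(levels):
        width = (high - low) / bins
        centers = [low + (i + 0.5) * width for i in range(bins)]
        best = max(centers, key=lambda a: q_value(state, a, level))
        # Zoom into the winning bin and rediscretize at the next level.
        low, high = best - width / 2, best + width / 2
    return (low + high) / 2
```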
arXiv Detail & Related papers (2024-07-10T16:04:08Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms.
We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
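A minimal sketch of the kind of diagnostic described above: compute the TD error of the current Q-function on a held-out set of transitions and use it to compare runs or hyperparameters. The function names and the (s, a, r, s', done) tuple layout are assumptions.

```python
import numpy as np

def validation_td_error(q_fn, target_q_fn, held_out, gamma=0.99):
    """Mean squared TD error on a held-out set of transitions.

    `q_fn(s, a)` and `target_q_fn(s_next)` (returning max_a' Q_target(s', a'))
    are assumed callables; `held_out` is an iterable of (s, a, r, s_next, done).
    """
    errors = []
    for s, a, r, s_next, done in held_out:
        target = r + gamma * (0.0 if done else target_q_fn(s_next))
        errors.append((q_fn(s, a) - target) ** 2)
    return float(np.mean(errors))

# Model selection: among candidate runs or checkpoints, prefer the one with
# the lowest validation TD error rather than the lowest training loss.
```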
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
- Data Sampling Affects the Complexity of Online SGD over Dependent Data [54.92366535993012]
We show how different data sampling schemes affect the sample complexity of online gradient descent over highly dependent data.
Even subsampling a subset of data samples can accelerate the convergence of online SGD over highly dependent data.
arXiv Detail & Related papers (2022-03-31T07:48:30Z)
- One Backward from Ten Forward, Subsampling for Large-Scale Deep Learning [35.0157090322113]
Large-scale machine learning systems are often continuously trained with enormous data from production environments.
The sheer volume of streaming data poses a significant challenge to real-time training subsystems and ad-hoc sampling is the standard practice.
We propose to record a constant amount of information per instance from these forward passes. The extra information measurably improves the selection of which data instances should participate in forward and backward passes.
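A rough sketch in that spirit: record one scalar per example at forward time and run the backward pass only for the highest-loss fraction of the batch. The loss-based threshold rule and the model interface are illustrative assumptions, not the paper's exact selection criterion.

```python
def train_step(model, batch, keep_fraction=0.1):
    """Forward on the full batch, backward only on the most informative part.

    Assumed interfaces: `model.forward_losses(batch)` returns one loss value
    per example; `model.backward_on(indices, batch)` runs the gradient step
    on the selected subset.
    """
    per_example_loss = model.forward_losses(batch)       # cheap: forward only
    ranked = sorted(range(len(batch)),
                    key=lambda i: per_example_loss[i], reverse=True)
    selected = ranked[: max(1, int(len(batch) * keep_fraction))]
    model.backward_on(selected, batch)                    # expensive: backward
```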
arXiv Detail & Related papers (2021-04-27T11:29:02Z)
- Continuous Transition: Improving Sample Efficiency for Continuous Control Problems via MixUp [119.69304125647785]
This paper introduces a concise yet powerful method to construct Continuous Transition.
Specifically, we propose to synthesize new transitions for training by linearly interpolating the consecutive transitions.
To keep the constructed transitions authentic, we also develop a discriminator to guide the construction process automatically.
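A minimal sketch of the interpolation step described above, without the discriminator: two consecutive transitions are blended with a mixup coefficient drawn from a Beta distribution. Representing transitions as dicts of arrays and the Beta parameter are assumptions.

```python
import numpy as np

def continuous_transition(t1, t2, alpha=0.8):
    """Linearly interpolate two consecutive transitions (s, a, r, s_next).

    `t1` and `t2` are dicts of numpy arrays/floats for consecutive steps;
    the Beta(alpha, alpha) mixing coefficient follows standard mixup practice.
    """
    lam = np.random.beta(alpha, alpha)
    return {key: lam * t1[key] + (1.0 - lam) * t2[key] for key in t1}

# Synthetic transitions can be mixed into the replay batch alongside real ones;
# the paper additionally trains a discriminator to keep them realistic.
```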
arXiv Detail & Related papers (2020-11-30T01:20:23Z)
- Learning to Sample with Local and Global Contexts in Experience Replay Buffer [135.94190624087355]
We propose a new learning-based sampling method that can compute the relative importance of transitions.
We show that our framework can significantly improve the performance of various off-policy reinforcement learning methods.
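For intuition, a hedged sketch of importance-weighted replay sampling: a learned scorer assigns each stored transition a weight and the batch is drawn from the resulting softmax distribution. The scorer interface is an assumption, and the paper's method conditions on local and global context rather than scoring transitions independently.

```python
import numpy as np

def sample_batch(buffer, scorer, batch_size=256, temperature=1.0):
    """Draw a replay batch according to learned per-transition scores.

    `buffer` is a list of transitions and `scorer(transition)` an assumed
    callable returning a scalar importance estimate.
    """
    scores = np.array([scorer(t) for t in buffer]) / temperature
    probs = np.exp(scores - scores.max())      # stable softmax over the buffer
    probs /= probs.sum()
    idx = np.random.choice(len(buffer), size=batch_size, p=probs, replace=True)
    return [buffer[i] for i in idx]
```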
arXiv Detail & Related papers (2020-07-14T21:12:56Z)