Importance of using appropriate baselines for evaluation of
data-efficiency in deep reinforcement learning for Atari
- URL: http://arxiv.org/abs/2003.10181v2
- Date: Tue, 31 Mar 2020 17:00:42 GMT
- Title: Importance of using appropriate baselines for evaluation of
data-efficiency in deep reinforcement learning for Atari
- Authors: Kacper Kielak
- Abstract summary: We show that the actual improvement in efficiency came from allowing the algorithm to perform more training updates per data sample.
We argue that an agent similar to the modified DQN presented in this paper should be used as a baseline for any future work aimed at improving the sample efficiency of deep reinforcement learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has seen great advancements in the past few
years. Nevertheless, the consensus among the RL community is that currently
used methods, despite all their benefits, suffer from extreme data
inefficiency, especially in the rich visual domains like Atari. To circumvent
this problem, novel approaches were introduced that often claim to be much more
efficient than popular variations of the state-of-the-art DQN algorithm. In
this paper, however, we demonstrate that the newly proposed techniques simply
used unfair baselines in their experiments. Namely, we show that the actual
improvement in efficiency came from allowing the algorithm to perform more
training updates per data sample, not from the new methods themselves.
By allowing DQN to execute network updates more frequently, we reach
similar or better results than the recently proposed advancements, often at a
fraction of the complexity and computational cost. Furthermore, based on the
outcomes of the study, we argue that an agent similar to the modified DQN
presented in this paper should be used as a baseline for any future work
aimed at improving the sample efficiency of deep reinforcement learning.
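The change the abstract describes boils down to a single hyperparameter: the number of gradient updates performed per collected environment transition (the update-to-data ratio). The sketch below illustrates that knob in a generic DQN-style training loop. It is a minimal, hypothetical example, not the paper's code; `env`, `select_action`, `gradient_update`, and the default values are assumed placeholders.

```python
# Minimal sketch, not the paper's implementation: the only departure from a
# standard DQN loop is `updates_per_step`, the number of gradient updates
# performed per collected transition. `env`, `select_action`, and
# `gradient_update` stand in for the usual DQN components.
import random
from collections import deque

def train_dqn(env, select_action, gradient_update,
              total_steps=100_000,
              updates_per_step=4,   # classic DQN trains roughly once every 4 env
                                    # steps (ratio ~0.25); raising the ratio is the
                                    # "more updates per data sample" change
              batch_size=32,
              buffer_size=100_000):
    replay = deque(maxlen=buffer_size)
    obs = env.reset()
    for _step in range(total_steps):
        action = select_action(obs)
        next_obs, reward, done, _ = env.step(action)
        replay.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

        # Data-efficiency knob: reuse each stored transition in more gradient
        # updates instead of changing the underlying algorithm.
        if len(replay) >= batch_size:
            for _ in range(updates_per_step):
                batch = random.sample(replay, batch_size)
                gradient_update(batch)
```

The abstract's argument is that a DQN baseline with this ratio raised appropriately, rather than one using the default update schedule, should be the point of comparison for future sample-efficiency claims.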
Related papers
- Knowledge Editing in Language Models via Adapted Direct Preference Optimization [50.616875565173274]
Large Language Models (LLMs) can become outdated over time.
Knowledge Editing aims to overcome this challenge using weight updates that do not require expensive retraining.
arXiv Detail & Related papers (2024-06-14T11:02:21Z) - Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z) - Revisiting Data Augmentation in Deep Reinforcement Learning [3.660182910533372]
Various data augmentation techniques have recently been proposed for image-based deep reinforcement learning (DRL).
We analyze existing methods to better understand them and to uncover how they are connected.
This analysis suggests recommendations on how to exploit data augmentation in a more principled way.
arXiv Detail & Related papers (2024-02-19T14:42:10Z) - Learning Diverse Policies with Soft Self-Generated Guidance [2.9602904918952695]
Reinforcement learning with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained.
This paper develops an approach that uses diverse past trajectories for faster and more efficient online RL.
arXiv Detail & Related papers (2024-02-07T02:53:50Z) - Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in
Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z) - An Expert's Guide to Training Physics-informed Neural Networks [5.198985210238479]
Physics-informed neural networks (PINNs) have been popularized as a deep learning framework.
PINNs can seamlessly synthesize observational data and partial differential equation (PDE) constraints.
We present a series of best practices that can significantly improve the training efficiency and overall accuracy of PINNs.
arXiv Detail & Related papers (2023-08-16T16:19:25Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for
Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - Statistically Efficient Advantage Learning for Offline Reinforcement
Learning in Infinite Horizons [16.635744815056906]
We consider reinforcement learning methods in offline domains without additional online data collection, such as mobile health applications.
The proposed method takes an optimal Q-estimator computed by any existing state-of-the-art RL algorithms as input, and outputs a new policy whose value is guaranteed to converge at a faster rate than the policy derived based on the initial Q-estimator.
arXiv Detail & Related papers (2022-02-26T15:29:46Z) - Variance-Optimal Augmentation Logging for Counterfactual Evaluation in
Contextual Bandits [25.153656462604268]
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems.
The counterfactual estimators that are commonly used in these methods can have large bias and large variance when the logging policy is very different from the target policy being evaluated.
This paper introduces Minimum Variance Augmentation Logging (MVAL), a method for constructing logging policies that minimize the variance of the downstream evaluation or learning problem.
arXiv Detail & Related papers (2022-02-03T17:37:11Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network (see the sketch following this list).
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.