Importance of using appropriate baselines for evaluation of
data-efficiency in deep reinforcement learning for Atari
- URL: http://arxiv.org/abs/2003.10181v2
- Date: Tue, 31 Mar 2020 17:00:42 GMT
- Title: Importance of using appropriate baselines for evaluation of
data-efficiency in deep reinforcement learning for Atari
- Authors: Kacper Kielak
- Abstract summary: We show that the actual improvement in efficiency came from allowing the algorithm to perform more training updates per data sample.
We argue that an agent similar to the modified DQN presented in this paper should be used as a baseline for any future work aimed at improving the sample efficiency of deep reinforcement learning.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) has seen great advancements in the past few
years. Nevertheless, the consensus among the RL community is that currently
used methods, despite all their benefits, suffer from extreme data
inefficiency, especially in the rich visual domains like Atari. To circumvent
this problem, novel approaches were introduced that often claim to be much more
efficient than popular variations of the state-of-the-art DQN algorithm. In
this paper, however, we demonstrate that the newly proposed techniques simply
used unfair baselines in their experiments. Namely, we show that the actual
improvement in efficiency came from allowing the algorithm to perform more
training updates per data sample, not from the new methods themselves.
By allowing DQN to execute network updates more frequently, we reach
similar or better results than the recently proposed advancements, often at a
fraction of the complexity and computational cost. Furthermore, based on the
outcomes of the study, we argue that an agent similar to the modified DQN
presented in this paper should be used as a baseline for any future work
aimed at improving the sample efficiency of deep reinforcement learning.
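The change the abstract describes boils down to a single hyperparameter: the number of gradient updates performed per collected environment transition (the update-to-data ratio). The sketch below illustrates that knob in a generic DQN-style training loop. It is a minimal, hypothetical example, not the paper's code; `env`, `select_action`, `gradient_update`, and the default values are assumed placeholders.

```python
# Minimal sketch, not the paper's implementation: the only departure from a
# standard DQN loop is `updates_per_step`, the number of gradient updates
# performed per collected transition. `env`, `select_action`, and
# `gradient_update` stand in for the usual DQN components.
import random
from collections import deque

def train_dqn(env, select_action, gradient_update,
              total_steps=100_000,
              updates_per_step=4,   # classic DQN trains roughly once every 4 env
                                    # steps (ratio ~0.25); raising the ratio is the
                                    # "more updates per data sample" change
              batch_size=32,
              buffer_size=100_000):
    replay = deque(maxlen=buffer_size)
    obs = env.reset()
    for _step in range(total_steps):
        action = select_action(obs)
        next_obs, reward, done, _ = env.step(action)
        replay.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

        # Data-efficiency knob: reuse each stored transition in more gradient
        # updates instead of changing the underlying algorithm.
        if len(replay) >= batch_size:
            for _ in range(updates_per_step):
                batch = random.sample(replay, batch_size)
                gradient_update(batch)
```

The abstract's argument is that a DQN baseline with this ratio raised appropriately, rather than one using the default update schedule, should be the point of comparison for future sample-efficiency claims.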
Related papers
- Knowledge Editing in Language Models via Adapted Direct Preference Optimization [50.616875565173274]
Large Language Models (LLMs) can become outdated over time.
Knowledge Editing aims to overcome this challenge using weight updates that do not require expensive retraining.
arXiv Detail & Related papers (2024-06-14T11:02:21Z) - Simple Ingredients for Offline Reinforcement Learning [86.1988266277766]
Offline reinforcement learning algorithms have proven effective on datasets highly connected to the target downstream task.
We show that existing methods struggle with diverse data: their performance considerably deteriorates as data collected for related but different tasks is simply added to the offline buffer.
We show that scale, more than algorithmic considerations, is the key factor influencing performance.
arXiv Detail & Related papers (2024-03-19T18:57:53Z) - Revisiting Data Augmentation in Deep Reinforcement Learning [3.660182910533372]
Various data augmentation techniques have recently been proposed for image-based deep reinforcement learning (DRL).
We analyze existing methods to better understand them and to uncover how they are connected.
This analysis suggests recommendations on how to exploit data augmentation in a more principled way.
arXiv Detail & Related papers (2024-02-19T14:42:10Z) - Learning Diverse Policies with Soft Self-Generated Guidance [2.9602904918952695]
Reinforcement learning with sparse and deceptive rewards is challenging because non-zero rewards are rarely obtained.
This paper develops an approach that uses diverse past trajectories for faster and more efficient online RL.
arXiv Detail & Related papers (2024-02-07T02:53:50Z) - Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in
Dense Encoders [63.28408887247742]
We study whether training procedures can be improved to yield better generalization capabilities in the resulting models.
We recommend a simple recipe for training dense encoders: Train on MSMARCO with parameter-efficient methods, such as LoRA, and opt for using in-batch negatives unless given well-constructed hard negatives.
arXiv Detail & Related papers (2023-11-16T10:42:58Z) - An Expert's Guide to Training Physics-informed Neural Networks [5.198985210238479]
Physics-informed neural networks (PINNs) have been popularized as a deep learning framework.
PINNs can seamlessly synthesize observational data and partial differential equation (PDE) constraints.
We present a series of best practices that can significantly improve the training efficiency and overall accuracy of PINNs.
arXiv Detail & Related papers (2023-08-16T16:19:25Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - SURF: Semi-supervised Reward Learning with Data Augmentation for
Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z) - Statistically Efficient Advantage Learning for Offline Reinforcement
Learning in Infinite Horizons [16.635744815056906]
We consider reinforcement learning methods in offline domains without additional online data collection, such as mobile health applications.
The proposed method takes an optimal Q-estimator computed by any existing state-of-the-art RL algorithms as input, and outputs a new policy whose value is guaranteed to converge at a faster rate than the policy derived based on the initial Q-estimator.
arXiv Detail & Related papers (2022-02-26T15:29:46Z) - Variance-Optimal Augmentation Logging for Counterfactual Evaluation in
Contextual Bandits [25.153656462604268]
Methods for offline A/B testing and counterfactual learning are seeing rapid adoption in search and recommender systems.
The counterfactual estimators that are commonly used in these methods can have large bias and large variance when the logging policy is very different from the target policy being evaluated.
This paper introduces Minimum Variance Augmentation Logging (MVAL), a method for constructing logging policies that minimize the variance of the downstream evaluation or learning problem.
arXiv Detail & Related papers (2022-02-03T17:37:11Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network (see the sketch following this list).
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.