Related papers: The Role of Deep Learning Regularizations on Actors in Offline RL

URL: http://arxiv.org/abs/2409.07606v2
Date: Mon, 16 Sep 2024 12:45:07 GMT
Title: The Role of Deep Learning Regularizations on Actors in Offline RL
Authors: Denis Tarasov, Anja Surina, Caglar Gulcehre
Abstract summary: Regularization techniques, such as dropout, layer normalization, or weight decay, are widely adopted in the construction of modern artificial neural networks. In the domain of Reinforcement Learning (RL), the application of these techniques has been limited. We show that applying standard regularization techniques to actor networks in offline RL actor-critic algorithms yields improvements of 6% on average.
Score: 1.2744523252873352
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep learning regularization techniques, such as dropout, layer normalization, or weight decay, are widely adopted in the construction of modern artificial neural networks, often resulting in more robust training processes and improved generalization capabilities. However, in the domain of Reinforcement Learning (RL), the application of these techniques has been limited, usually applied to value function estimators, and may result in detrimental effects. This issue is even more pronounced in offline RL settings, which bear greater similarity to supervised learning but have received less attention. Recent work in continuous offline RL has demonstrated that while we can build sufficiently powerful critic networks, the generalization of actor networks remains a bottleneck. In this study, we empirically show that applying standard regularization techniques to actor networks in offline RL actor-critic algorithms yields improvements of 6% on average across two algorithms and three different continuous D4RL domains.
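The three regularizers named in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names, the layer norm -> ReLU -> dropout ordering, and the example hyperparameters are all assumptions chosen for illustration.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize one feature vector to zero mean and (roughly) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def dropout(x, p, rng, training=True):
    """Inverted dropout: zero each unit with probability p, rescale survivors."""
    if not training or p == 0.0:
        return list(x)
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in x]


def sgd_step_with_weight_decay(weights, grads, lr, weight_decay):
    """One SGD step where L2 weight decay is added to the gradient."""
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, grads)]


def actor_hidden_layer(pre_activations, p, rng, training=True):
    """Hypothetical actor hidden block (assumed order): layer norm, ReLU, dropout."""
    h = layer_norm(pre_activations)
    h = [max(0.0, v) for v in h]
    return dropout(h, p, rng, training)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the dropout mask is reproducible
    hidden = actor_hidden_layer([1.0, 2.0, 3.0, 4.0], p=0.1, rng=rng)
    print(hidden)
    print(sgd_step_with_weight_decay([1.0, -2.0], [0.0, 0.0], lr=0.1, weight_decay=0.5))
```

At evaluation time one would typically call `actor_hidden_layer(..., training=False)` so that dropout is disabled and the actor's output is deterministic.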

Related papers

Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL. We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
Exploiting Estimation Bias in Clipped Double Q-Learning for Continous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks. We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent. Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism, without hindering performance or computational complexity.
arXiv Detail & Related papers (2024-02-14T10:44:03Z)
Efficient Deep Reinforcement Learning Requires Regulating Overfitting [91.88004732618381]
We show that high temporal-difference (TD) error on the validation set of transitions is the main culprit that severely affects the performance of deep RL algorithms. We show that a simple online model selection method that targets the validation TD error is effective across state-based DMC and Gym tasks.
arXiv Detail & Related papers (2023-04-20T17:11:05Z)
Entropy Regularized Reinforcement Learning with Cascading Networks [9.973226671536041]
Deep RL uses neural networks as function approximators. One of the major difficulties of RL is the absence of i.i.d. data. In this work, we challenge the common practices of the (un)supervised learning community of using a fixed neural architecture.
arXiv Detail & Related papers (2022-10-16T10:28:59Z)
Single-Shot Pruning for Offline Reinforcement Learning [47.886329599997474]
Deep Reinforcement Learning (RL) is a powerful framework for solving complex real-world problems. One way to tackle this problem is to prune neural networks, leaving only the necessary parameters. We close the gap between RL and single-shot pruning techniques and present a general pruning approach to offline RL.
arXiv Detail & Related papers (2021-12-31T18:10:02Z)
DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization [125.5448293005647]
We discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in offline deep RL. Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions. We propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer.
arXiv Detail & Related papers (2021-12-09T06:01:01Z)
How to Make Deep RL Work in Practice [15.740760669623876]
Reported results of state-of-the-art algorithms are often difficult to reproduce. We suggest which of those techniques to use by default and highlight areas that could benefit from a solution specifically tailored to RL.
arXiv Detail & Related papers (2020-10-25T10:37:54Z)
Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning [67.34810824996887]
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments. We propose Iterated Relearning (ITER) to improve generalisation of deep RL agents.
arXiv Detail & Related papers (2020-06-10T13:26:31Z)

This list is automatically generated from the titles and abstracts of the papers on this site.