An Empirical Study of Implicit Regularization in Deep Offline RL
- URL: http://arxiv.org/abs/2207.02099v2
- Date: Thu, 7 Jul 2022 11:03:23 GMT
- Title: An Empirical Study of Implicit Regularization in Deep Offline RL
- Authors: Caglar Gulcehre, Srivatsan Srinivasan, Jakub Sygnowski, Georg
Ostrovski, Mehrdad Farajtabar, Matt Hoffman, Razvan Pascanu, Arnaud Doucet
- Abstract summary: We study the relation between effective rank and performance on three offline RL datasets.
We identify three phases of learning that explain the impact of implicit regularization on the learning dynamics.
- Score: 44.62587507925864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep neural networks are the most commonly used function approximators in
offline reinforcement learning. Prior works have shown that neural nets trained
with TD-learning and gradient descent can exhibit implicit regularization that
can be characterized by under-parameterization of these networks. Specifically,
the rank of the penultimate feature layer, also called effective rank,
has been observed to drastically collapse during the training. In turn, this
collapse has been argued to reduce the model's ability to further adapt in
later stages of learning, leading to diminished final performance. Such an
association between the effective rank and performance makes effective rank
compelling for offline RL, primarily for offline policy evaluation. In this
work, we conduct a careful empirical study on the relation between effective
rank and performance on three offline RL datasets: bsuite, Atari, and DeepMind
Lab. We observe that a direct association exists only in restricted settings
and disappears in more extensive hyperparameter sweeps. Also, we
empirically identify three phases of learning that explain the impact of
implicit regularization on the learning dynamics and find that bootstrapping
alone is insufficient to explain the collapse of the effective rank. Further,
we show that several other factors could confound the relationship between
effective rank and performance and conclude that studying this association
under simplistic assumptions could be highly misleading.
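For reference, the effective rank used in this line of work (including the implicit under-parameterization paper listed below) is typically computed from the singular values of the penultimate-layer feature matrix: the smallest number of singular values whose sum covers all but a small fraction delta of the total spectrum. A minimal NumPy sketch under that assumption (delta = 0.01 is a commonly reported choice):

```python
import numpy as np

def effective_rank(features: np.ndarray, delta: float = 0.01) -> int:
    """Smallest k such that the top-k singular values of the feature matrix
    account for at least a (1 - delta) fraction of the total spectrum.

    `features` is a (batch_size, feature_dim) matrix of penultimate-layer
    activations collected on a batch of inputs.
    """
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / singular_values.sum()
    # searchsorted returns the first index reaching the threshold; +1 turns
    # the 0-based index into a count of singular values.
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

# A nearly rank-1 feature matrix collapses to a small effective rank.
rng = np.random.default_rng(0)
features = rng.normal(size=(256, 1)) @ rng.normal(size=(1, 64))
features += 1e-3 * rng.normal(size=(256, 64))
print(effective_rank(features))  # small value, despite 64 feature dimensions
```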
Related papers
- Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? [58.942118128503104]
Causal confusion is a phenomenon where an agent learns a policy that reflects imperfect spurious correlations in the data.
This phenomenon is particularly pronounced in domains such as robotics.
In this paper, we study causal confusion in offline reinforcement learning.
arXiv Detail & Related papers (2023-12-28T17:54:56Z)
- Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
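The exact construction of SEEM is given in the linked paper; at its core it is a spectral quantity, the dominant eigenvalue of a matrix derived from the Q-network during training. As a loosely related illustration only (not the paper's definition), such a quantity can be tracked cheaply with power iteration:

```python
import numpy as np

def dominant_eigenvalue(matrix: np.ndarray, iters: int = 100, seed: int = 0) -> float:
    """Largest-magnitude eigenvalue of a symmetric matrix via power iteration.

    Illustrative only: a measure like SEEM would apply this kind of spectral
    estimate to a matrix built from the Q-network during training (see the
    linked paper for the actual construction).
    """
    rng = np.random.default_rng(seed)
    v = rng.normal(size=matrix.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = matrix @ v
        v /= np.linalg.norm(v)
    # Rayleigh quotient of the converged direction.
    return float(v @ matrix @ v)
```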
arXiv Detail & Related papers (2023-10-06T17:57:44Z)
- Recurrent Hypernetworks are Surprisingly Strong in Meta-RL [37.80510757630612]
Deep reinforcement learning (RL) is notoriously impractical to deploy due to sample inefficiency.
Meta-RL directly addresses this sample inefficiency by learning to perform few-shot learning when a distribution of related tasks is available for meta-training.
Recent work suggests end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline.
arXiv Detail & Related papers (2023-09-26T14:42:28Z)
- DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization [125.5448293005647]
We discuss how the implicit regularization effect of SGD seen in supervised learning could in fact be harmful in offline deep RL.
Our theoretical analysis shows that when existing models of implicit regularization are applied to temporal difference learning, the resulting derived regularizer favors degenerate solutions.
We propose a simple and effective explicit regularizer, called DR3, that counteracts the undesirable effects of this implicit regularizer.
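As described above, DR3 penalizes co-adaptation between the penultimate-layer features of a transition's state-action pair and those of the bootstrapped next pair. A minimal PyTorch-style sketch, assuming the feature tensors are already extracted (how they are obtained, and the weighting coefficient, follow the paper and its hyperparameters):

```python
import torch

def dr3_penalty(phi_sa: torch.Tensor, phi_next_sa: torch.Tensor) -> torch.Tensor:
    """Mean dot product between penultimate-layer features of (s, a) and of
    the bootstrapped (s', a') pairs; both tensors are (batch, feature_dim)."""
    return (phi_sa * phi_next_sa).sum(dim=-1).mean()

# Sketch of use inside a TD update; `c` is a tuned coefficient and the
# feature extraction is whatever the Q-network exposes (names illustrative):
# loss = td_loss + c * dr3_penalty(phi_sa, phi_next_sa)
```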
arXiv Detail & Related papers (2021-12-09T06:01:01Z)
- Offline Reinforcement Learning with Value-based Episodic Memory [19.12430651038357]
Offline reinforcement learning (RL) shows promise for applying RL to real-world problems.
We propose Expectile V-Learning (EVL), which smoothly interpolates between optimal value learning and behavior cloning.
We present a new offline method called Value-based Episodic Memory (VEM).
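The interpolation EVL relies on comes from expectile regression: an asymmetric squared loss whose parameter tau moves the learned value between the mean of in-dataset targets (tau = 0.5, behavior-cloning-like) and their maximum (tau -> 1, optimal-value-like). A minimal sketch of that loss (the surrounding algorithm, including VEM's episodic memory, is in the paper):

```python
import torch

def expectile_loss(td_error: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    """Asymmetric squared loss on td_error = target - V(s).

    tau = 0.5 recovers the ordinary squared loss (mean, behavior-like values);
    larger tau up-weights positive errors, pushing V(s) toward higher
    in-dataset targets (optimal-value-like).
    """
    weight = torch.abs(tau - (td_error < 0).float())
    return (weight * td_error.pow(2)).mean()
```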
arXiv Detail & Related papers (2021-10-19T08:20:11Z)
- The Impact of Activation Sparsity on Overfitting in Convolutional Neural Networks [1.9424280683610138]
Overfitting is one of the fundamental challenges when training convolutional neural networks.
In this study we introduce a perplexity-based sparsity definition to derive and visualise layer-wise activation measures.
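One plausible reading of a perplexity-based sparsity measure (the paper's exact definition may differ) treats a layer's absolute activations as a probability distribution over units and reports the exponential of its entropy, so low perplexity means a few units dominate:

```python
import numpy as np

def activation_perplexity(activations: np.ndarray, eps: float = 1e-12) -> float:
    """Perplexity (exp of Shannon entropy) of a layer's absolute activation
    mass treated as a probability distribution over units.

    Values near 1 mean a few units carry almost all activation (sparse);
    values near the number of units mean activation is spread evenly.
    """
    mass = np.abs(activations).ravel()
    p = mass / (mass.sum() + eps)
    entropy = -np.sum(p * np.log(p + eps))
    return float(np.exp(entropy))
```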
arXiv Detail & Related papers (2021-04-13T12:55:37Z)
- Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning [97.28695683236981]
More gradient updates decrease the expressivity of the current value network.
We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
arXiv Detail & Related papers (2020-10-27T17:55:16Z) - OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement
Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.