Implicit Under-Parameterization Inhibits Data-Efficient Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2010.14498v2
- Date: Mon, 25 Oct 2021 03:10:12 GMT
- Title: Implicit Under-Parameterization Inhibits Data-Efficient Deep
Reinforcement Learning
- Authors: Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine
- Abstract summary: More gradient updates decrease the expressivity of the current value network.
We demonstrate this phenomenon on Atari and Gym benchmarks, in both offline and online RL settings.
- Score: 97.28695683236981
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We identify an implicit under-parameterization phenomenon in value-based deep
RL methods that use bootstrapping: when value functions, approximated using
deep neural networks, are trained with gradient descent using iterated
regression onto target values generated by previous instances of the value
network, more gradient updates decrease the expressivity of the current value
network. We characterize this loss of expressivity via a drop in the rank of
the learned value network features, and show that this typically corresponds to
a performance drop. We demonstrate this phenomenon on Atari and Gym benchmarks,
in both offline and online RL settings. We formally analyze this phenomenon and
show that it results from a pathological interaction between bootstrapping and
gradient-based optimization. We further show that mitigating implicit
under-parameterization by controlling rank collapse can improve performance.
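The rank measure discussed in the abstract can be made concrete. Below is a minimal NumPy sketch of one common "effective rank" definition: the smallest number of singular values that capture a (1 − δ) fraction of the total singular-value mass. The exact threshold form and δ = 0.01 are illustrative assumptions, not necessarily the paper's precise recipe.

```python
import numpy as np

def effective_rank(features, delta=0.01):
    # Effective rank: the smallest k such that the top-k singular
    # values capture a (1 - delta) fraction of the total
    # singular-value mass. The threshold form and delta=0.01 are
    # assumptions for illustration.
    singular_values = np.linalg.svd(features, compute_uv=False)
    cumulative = np.cumsum(singular_values) / np.sum(singular_values)
    return int(np.searchsorted(cumulative, 1.0 - delta) + 1)

rng = np.random.default_rng(0)

# A random Gaussian feature matrix is well-spread and high-rank...
full_rank = rng.standard_normal((256, 64))

# ...while a nearly rank-1 matrix (rank collapse) is not.
collapsed = np.outer(rng.standard_normal(256), rng.standard_normal(64))
collapsed += 1e-6 * rng.standard_normal((256, 64))

print(effective_rank(full_rank), effective_rank(collapsed))
```

Tracking this quantity over training is how a drop in feature expressivity becomes measurable: a collapsing value network shows a shrinking effective rank even while its nominal parameter count stays fixed.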
Related papers
- Learning Point Spread Function Invertibility Assessment for Image Deconvolution [14.062542012968313]
We propose a metric that employs a non-linear approach to learn the invertibility of an arbitrary PSF using a neural network.
A lower discrepancy between the mapped PSF and a unit impulse indicates a higher likelihood of successful inversion by a DL network.
arXiv Detail & Related papers (2024-05-25T20:00:27Z)
- Stop Regressing: Training Value Functions via Classification for Scalable Deep RL [109.44370201929246]
We show that training value functions with categorical cross-entropy improves performance and scalability in a variety of domains.
These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers.
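One common way to cast value regression as classification is to project the scalar target onto a fixed categorical support as a "two-hot" distribution and train with cross-entropy. The sketch below illustrates that idea; the support range, bin count, and two-hot projection are assumptions for illustration, not necessarily the exact recipe used in the paper.

```python
import numpy as np

def two_hot(value, support):
    # Place all probability mass on the two support atoms that
    # bracket the scalar target, split by linear interpolation.
    value = np.clip(value, support[0], support[-1])
    upper = np.searchsorted(support, value)  # first atom >= value
    probs = np.zeros(len(support))
    if upper == 0:
        probs[0] = 1.0
        return probs
    lower = upper - 1
    w = (value - support[lower]) / (support[upper] - support[lower])
    probs[lower] = 1.0 - w
    probs[upper] = w
    return probs

def cross_entropy(target_probs, logits):
    # Categorical cross-entropy against the projected target.
    log_softmax = logits - np.log(np.sum(np.exp(logits)))
    return -np.sum(target_probs * log_softmax)

support = np.linspace(-10.0, 10.0, 51)  # hypothetical value support
target = two_hot(3.7, support)
print(np.dot(target, support))  # expectation recovers ~3.7
print(cross_entropy(target, np.zeros(len(support))))
```

Because the two-hot projection preserves the target's expectation, the classification loss still trains a usable scalar value estimate, read off as the expected value of the predicted distribution.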
arXiv Detail & Related papers (2024-03-06T18:55:47Z)
- Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z)
- Improving Deep Policy Gradients with Value Function Search [21.18135854494779]
This paper focuses on improving value approximation and analyzing the effects on Deep PG primitives.
We introduce a Value Function Search that employs a population of perturbed value networks to search for a better approximation.
Our framework does not require additional environment interactions, gradient computations, or ensembles.
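As a toy illustration of gradient-free search over perturbed value functions: the sketch below hill-climbs over Gaussian perturbations of a linear "value network" on a fixed, previously collected batch, so no extra environment interactions or gradient computations are needed. All names and the linear stand-in are hypothetical, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a value network: a linear map on state features.
def value_fn(weights, states):
    return states @ weights

# A previously collected batch of states and regression targets,
# so the search needs no further environment interaction.
states = rng.standard_normal((128, 8))
true_weights = rng.standard_normal(8)
targets = value_fn(true_weights, states)

current = np.zeros(8)  # current, poor value approximation

def approximation_error(weights):
    return float(np.mean((value_fn(weights, states) - targets) ** 2))

# Gradient-free search: repeatedly perturb the best candidate and
# keep any copy with lower approximation error on the fixed batch.
best = current
for _ in range(200):
    candidate = best + 0.1 * rng.standard_normal(8)
    if approximation_error(candidate) < approximation_error(best):
        best = candidate

print(approximation_error(current), approximation_error(best))
```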
arXiv Detail & Related papers (2023-02-20T18:23:47Z)
- Regression as Classification: Influence of Task Formulation on Neural Network Features [16.239708754973865]
Neural networks can be trained to solve regression problems by using gradient-based methods to minimize the square loss.
However, practitioners often prefer to reformulate regression as a classification problem, observing that training with the cross-entropy loss yields better performance.
By focusing on two-layer ReLU networks, we explore how the implicit bias induced by gradient-based optimization could partly explain the phenomenon.
arXiv Detail & Related papers (2022-11-10T15:13:23Z)
- An Empirical Study of Implicit Regularization in Deep Offline RL [44.62587507925864]
We study the relation between effective rank and performance on three offline RL datasets.
We identify three phases of learning that explain the impact of implicit regularization on the learning dynamics.
arXiv Detail & Related papers (2022-07-05T15:07:31Z)
- Robust Learning via Persistency of Excitation [4.674053902991301]
We show that network training using gradient descent is equivalent to a dynamical system parameter estimation problem.
We provide an efficient technique for estimating the corresponding Lipschitz constant using extreme value theory.
Our approach also universally increases adversarial accuracy by 0.1 to 0.3 percentage points across various state-of-the-art adversarially trained models.
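A crude empirical cousin of such Lipschitz estimation is to sample random input pairs and take the largest observed slope; the paper refines this into a proper estimator via extreme value theory, which the sketch below omits. The box bounds and sample count are illustrative assumptions.

```python
import numpy as np

def empirical_lipschitz(f, low, high, dim, n_pairs=10_000, seed=0):
    # Lower-bound the Lipschitz constant of f over a box by sampling
    # random input pairs and taking the largest observed slope.
    rng = np.random.default_rng(seed)
    x = rng.uniform(low, high, size=(n_pairs, dim))
    y = rng.uniform(low, high, size=(n_pairs, dim))
    slopes = np.abs(f(x) - f(y)) / np.linalg.norm(x - y, axis=1)
    return float(np.max(slopes))

# For a linear map f(x) = w . x the true constant is ||w|| = 5.
w = np.array([3.0, 4.0])
estimate = empirical_lipschitz(lambda x: x @ w, -1.0, 1.0, dim=2)
print(estimate)  # approaches 5 from below
```

The raw maximum is only a lower bound on the true constant; fitting the tail of the slope distribution (as extreme value theory does) is what turns such samples into a usable estimate.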
arXiv Detail & Related papers (2021-06-03T18:49:05Z)
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
arXiv Detail & Related papers (2020-06-10T08:22:41Z)
- Scaling Equilibrium Propagation to Deep ConvNets by Drastically Reducing its Gradient Estimator Bias [65.13042449121411]
In practice, training a network with the gradient estimates provided by Equilibrium Propagation (EP) does not scale to visual tasks harder than MNIST.
We show that a bias in the gradient estimate of EP, inherent in the use of finite nudging, is responsible for this phenomenon.
We apply these techniques to train an architecture with asymmetric forward and backward connections, yielding a 13.2% test error.
arXiv Detail & Related papers (2020-06-06T09:36:07Z)
- The Break-Even Point on Optimization Trajectories of Deep Neural Networks [64.7563588124004]
We argue for the existence of the "break-even" point on this trajectory.
We show that using a large learning rate in the initial phase of training reduces the variance of the gradient.
We also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers.
arXiv Detail & Related papers (2020-02-21T22:55:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.