A Cramér Distance perspective on Non-crossing Quantile Regression in
Distributional Reinforcement Learning
- URL: http://arxiv.org/abs/2110.00535v1
- Date: Fri, 1 Oct 2021 17:00:25 GMT
- Title: A Cramér Distance perspective on Non-crossing Quantile Regression in
Distributional Reinforcement Learning
- Authors: Alix Lhéritier and Nicolas Bondoux
- Abstract summary: Quantile-based methods like QR-DQN project arbitrary distributions onto a parametric subset of staircase distributions.
Monotonicity constraints on the quantiles have been shown to improve the performance of QR-DQN for uncertainty-based exploration strategies.
We propose a novel non-crossing neural architecture that allows good training performance, using a novel algorithm to compute the Cramér distance.
- Score: 2.28438857884398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributional reinforcement learning (DRL) extends the value-based approach
by using a deep convolutional network to approximate the full distribution over
future returns instead of the mean only, providing a richer signal that leads
to improved performance. Quantile-based methods like QR-DQN project arbitrary
distributions onto a parametric subset of staircase distributions by minimizing
the 1-Wasserstein distance; however, due to biases in the gradients, the
quantile regression loss is used instead for training, guaranteeing the same
minimizer and enjoying unbiased gradients. Recently, monotonicity constraints
on the quantiles have been shown to improve the performance of QR-DQN for
uncertainty-based exploration strategies. The contribution of this work is in
the setting of fixed quantile levels and is twofold. First, we prove that the
Cramér distance yields a projection that coincides with the 1-Wasserstein one
and that, under monotonicity constraints, the squared Cramér and the quantile
regression losses yield collinear gradients, shedding light on the connection
between these important elements of DRL. Second, we propose a novel
non-crossing neural architecture that allows a good training performance using
a novel algorithm to compute the Cramér distance, yielding significant
improvements over QR-DQN in a number of games of the standard Atari 2600
benchmark.
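The abstract ties together three objects: a staircase distribution parameterized by quantile values at fixed levels, the quantile regression loss used to train it, and the Cramér (squared L2-between-CDFs) distance. The Python sketch below only illustrates these objects under simplifying assumptions: a cumulative-softplus parameterization to keep the quantiles non-crossing and a brute-force Cramér computation on the merged grid of atoms. It is not the paper's architecture or its Cramér algorithm, and all names are made up for the example.

```python
import numpy as np

def non_crossing_quantiles(base, deltas):
    """Monotone (non-crossing) quantile values from unconstrained parameters.

    Generic cumulative-softplus parameterization, used here only to illustrate
    how monotonicity can be enforced by construction; not the paper's network.
    """
    increments = np.log1p(np.exp(deltas))          # softplus -> strictly positive steps
    return base + np.concatenate(([0.0], np.cumsum(increments)))

def quantile_regression_loss(theta, targets):
    """Plain (non-Huber) quantile regression loss, as used by QR-DQN-style methods.

    theta:   (N,) quantile estimates at fixed midpoint levels tau_i = (2i-1)/(2N)
    targets: (M,) samples from the target return distribution
    """
    N = len(theta)
    taus = (2 * np.arange(N) + 1) / (2 * N)
    u = targets[None, :] - theta[:, None]          # (N, M) residuals
    loss = u * (taus[:, None] - (u < 0))           # rho_tau(u) = u * (tau - 1{u < 0})
    return float(loss.mean())

def cramer_distance_sq(theta, targets):
    """Squared Cramér distance between the staircase distribution on `theta`
    (equal weights 1/N) and the empirical distribution of `targets`.

    Computed by integrating the squared difference of the two step CDFs over
    the merged grid of atoms; a generic routine, not the paper's algorithm.
    """
    xs = np.sort(np.concatenate([theta, targets]))
    def cdf(atoms, x):
        return np.searchsorted(np.sort(atoms), x, side="right") / len(atoms)
    diffs = cdf(theta, xs[:-1]) - cdf(targets, xs[:-1])   # CDF gap on each interval
    widths = np.diff(xs)
    return float(np.sum(diffs ** 2 * widths))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = non_crossing_quantiles(base=-1.0, deltas=rng.normal(size=31))  # 32 quantiles
    z = rng.normal(loc=0.5, scale=2.0, size=128)                           # target samples
    print("QR loss:       ", quantile_regression_loss(theta, z))
    print("Cramér^2 dist.:", cramer_distance_sq(theta, z))
```

The monotone parameterization matters because, with unconstrained outputs, independently trained quantile estimates can cross (a lower quantile level ending up above a higher one), which invalidates the estimated distribution; the paper's result on collinear gradients is stated under exactly this kind of monotonicity constraint.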
Related papers
- A Stein Gradient Descent Approach for Doubly Intractable Distributions [5.63014864822787]
We propose a novel Monte Carlo Stein variational gradient descent (MC-SVGD) approach for inference for doubly intractable distributions.
The proposed method achieves substantial computational gains over existing algorithms, while providing comparable inferential performance for the posterior distributions.
arXiv Detail & Related papers (2024-10-28T13:42:27Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Robust Stochastic Optimization via Gradient Quantile Clipping [6.2844649973308835]
We introduce a quantile clipping strategy for Stochastic Gradient Descent (SGD).
We use quantiles of the gradient norm as clipping thresholds.
We propose an implementation of the algorithm using Huber quantiles.
(A hedged sketch of this quantile clipping rule is given after the related-papers list.)
arXiv Detail & Related papers (2023-09-29T15:24:48Z)
- Scaling Forward Gradient With Local Losses [117.22685584919756]
Forward learning is a biologically plausible alternative to backprop for learning deep neural networks.
We show that it is possible to substantially reduce the variance of the forward gradient by applying perturbations to activations rather than weights.
Our approach matches backprop on MNIST and CIFAR-10 and significantly outperforms previously proposed backprop-free algorithms on ImageNet.
arXiv Detail & Related papers (2022-10-07T03:52:27Z)
- Federated Optimization Algorithms with Random Reshuffling and Gradient Compression [2.7554288121906296]
We provide the first analysis of methods with gradient compression and without-replacement sampling.
We show how to reduce the variance coming from gradient quantization through the use of control iterates.
We outline several settings in which they improve upon existing algorithms.
arXiv Detail & Related papers (2022-06-14T17:36:47Z)
- Error-Correcting Neural Networks for Two-Dimensional Curvature Computation in the Level-Set Method [0.0]
We present an error-neural-modeling-based strategy for approximating two-dimensional curvature in the level-set method.
Our main contribution is a redesigned hybrid solver that relies on numerical schemes to enable machine-learning operations on demand.
arXiv Detail & Related papers (2022-01-22T05:14:40Z)
- Communication-Efficient Federated Learning via Quantized Compressed Sensing [82.10695943017907]
The presented framework consists of gradient compression for wireless devices and gradient reconstruction for a parameter server.
Thanks to gradient sparsification and quantization, our strategy can achieve a higher compression ratio than one-bit gradient compression.
We demonstrate that the framework achieves almost identical performance with the case that performs no compression.
arXiv Detail & Related papers (2021-11-30T02:13:54Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Probabilistic partition of unity networks: clustering based deep approximation [0.0]
Partition of unity networks (POU-Nets) have been shown capable of realizing algebraic convergence rates for regression and solution of PDEs.
We enrich POU-Nets with a Gaussian noise model to obtain a probabilistic generalization amenable to gradient-based minimization of a maximum likelihood loss.
We provide benchmarks quantifying performance in high/low-dimensions, demonstrating that convergence rates depend only on the latent dimension of data within high-dimensional space.
arXiv Detail & Related papers (2021-07-07T08:02:00Z)
- Cogradient Descent for Dependable Learning [64.02052988844301]
We propose a dependable learning method based on the Cogradient Descent (CoGD) algorithm to address the bilinear optimization problem.
CoGD is introduced to solve bilinear problems when one variable is subject to a sparsity constraint.
It can also be used to decompose the association of features and weights, which further generalizes our method to better train convolutional neural networks (CNNs).
arXiv Detail & Related papers (2021-06-20T04:28:20Z)
- Variance Reduction for Deep Q-Learning using Stochastic Recursive Gradient [51.880464915253924]
Deep Q-learning algorithms often suffer from poor gradient estimations with an excessive variance.
This paper introduces the framework for updating the gradient estimates in deep Q-learning, achieving a novel algorithm called SRG-DQN.
arXiv Detail & Related papers (2020-07-25T00:54:20Z)
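For the "Robust Stochastic Optimization via Gradient Quantile Clipping" entry above, the following is a minimal sketch of the general idea, assuming a simple rolling-window quantile of recent gradient norms as the clipping threshold. The estimator, hyperparameters, and all names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def quantile_clipped_sgd_step(params, grad, lr, norm_history, q=0.9, window=100):
    """One SGD step with the gradient norm clipped at a running quantile.

    Gradients whose norm exceeds the q-quantile of recently observed norms are
    rescaled to that threshold; this only sketches the clipping rule, not the
    paper's online quantile estimator.
    """
    g_norm = float(np.linalg.norm(grad))
    norm_history.append(g_norm)
    if len(norm_history) > window:
        norm_history.pop(0)
    threshold = float(np.quantile(norm_history, q))   # clipping level from recent norms
    if g_norm > threshold > 0.0:
        grad = grad * (threshold / g_norm)             # rescale, keep the direction
    return params - lr * grad

# toy usage: least squares with heavy-tailed gradient noise
rng = np.random.default_rng(1)
w = np.zeros(3)
history = []
for t in range(200):
    x = rng.normal(size=3)
    y = x @ np.array([1.0, -2.0, 0.5]) + rng.standard_t(df=1.5)  # heavy-tailed noise
    grad = 2 * (w @ x - y) * x
    w = quantile_clipped_sgd_step(w, grad, lr=0.05, norm_history=history, q=0.9)
print("estimated weights:", w)
```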