Stable Deep Reinforcement Learning via Isotropic Gaussian Representations
- URL: http://arxiv.org/abs/2602.19373v1
- Date: Sun, 22 Feb 2026 22:55:27 GMT
- Title: Stable Deep Reinforcement Learning via Isotropic Gaussian Representations
- Authors: Ali Saheb, Johan Obando-Ceron, Aaron Courville, Pouya Bashivan, Pablo Samuel Castro,
- Abstract summary: We show that isotropic Gaussian embeddings are provably advantageous under non-stationary targets.<n>We propose the use of Sketched Isotropic Gaussian Regularization for shaping representations toward an isotropic Gaussian distribution.
- Score: 19.912439771541568
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep reinforcement learning systems often suffer from unstable training dynamics due to non-stationarity, where learning objectives and data distributions evolve over time. We show that under non-stationary targets, isotropic Gaussian embeddings are provably advantageous. In particular, they induce stable tracking of time-varying targets for linear readouts, achieve maximal entropy under a fixed variance budget, and encourage a balanced use of all representational dimensions--all of which enable agents to be more adaptive and stable. Building on this insight, we propose the use of Sketched Isotropic Gaussian Regularization for shaping representations toward an isotropic Gaussian distribution during training. We demonstrate empirically, over a variety of domains, that this simple and computationally inexpensive method improves performance under non-stationarity while reducing representation collapse, neuron dormancy, and training instability.
Related papers
- Degradation of Feature Space in Continual Learning [2.322400467239964]
We investigate whether promoting feature-space isotropy can enhance representation quality in continual learning.<n>We find that isotropic regularization fails to improve, and can in fact degrade, model accuracy in continual settings.
arXiv Detail & Related papers (2026-02-06T10:26:34Z) - Stability as a Liability:Systematic Breakdown of Linguistic Structure in LLMs [5.96875296117642]
We show that stable parameter trajectories lead stationary solutions to minimize the forward KL divergence to the empirical distribution.<n>We empirically validate this effect using a controlled feedback-based training framework.<n>It indicates that optimization stability and generative expressivity are not inherently aligned, and that stability alone is an insufficient indicator of generative quality.
arXiv Detail & Related papers (2026-01-26T15:34:50Z) - Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations [57.179679246370114]
We identify the distribution of random perturbations that minimizes the estimator's variance as the perturbation stepsize tends to zero.<n>Our findings reveal that such desired perturbations can align directionally with the true gradient, instead of maintaining a fixed length.
arXiv Detail & Related papers (2025-10-22T19:06:39Z) - Towards Unraveling and Improving Generalization in World Models [29.54936027897926]
This work aims to obtain a deep understanding of robustness and generalization capabilities of world models.<n>We characterize the impact of latent representation errors on robustness and generalization.<n>We propose a Jacobian regularization scheme to mitigate the compounding error propagation effects of non-zero drift.
arXiv Detail & Related papers (2024-12-31T00:15:43Z) - Regularization for Adversarial Robust Learning [18.46110328123008]
We develop a novel approach to adversarial training that integrates $phi$-divergence regularization into the distributionally robust risk function.
This regularization brings a notable improvement in computation compared with the original formulation.
We validate our proposed method in supervised learning, reinforcement learning, and contextual learning and showcase its state-of-the-art performance against various adversarial attacks.
arXiv Detail & Related papers (2024-08-19T03:15:41Z) - Machine learning in and out of equilibrium [58.88325379746631]
Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels.
We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium.
We propose a new variation of Langevin dynamics (SGLD) that harnesses without replacement minibatching.
arXiv Detail & Related papers (2023-06-06T09:12:49Z) - Stochastic Training is Not Necessary for Generalization [57.04880404584737]
It is widely believed that the implicit regularization of gradient descent (SGD) is fundamental to the impressive generalization behavior we observe in neural networks.
In this work, we demonstrate that non-stochastic full-batch training can achieve strong performance on CIFAR-10 that is on-par with SGD.
arXiv Detail & Related papers (2021-09-29T00:50:00Z) - Eccentric Regularization: Minimizing Hyperspherical Energy without
explicit projection [0.913755431537592]
We introduce a novel regularizing loss function which simulates a pairwise repulsive force between items.
We show that minimizing this loss function in isolation achieves a hyperspherical distribution.
We apply this method of Eccentric Regularization to an autoencoder, and demonstrate its effectiveness in image generation, representation learning and downstream classification tasks.
arXiv Detail & Related papers (2021-04-23T13:55:17Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Training Generative Adversarial Networks by Solving Ordinary
Differential Equations [54.23691425062034]
We study the continuous-time dynamics induced by GAN training.
From this perspective, we hypothesise that instabilities in training GANs arise from the integration error.
We experimentally verify that well-known ODE solvers (such as Runge-Kutta) can stabilise training.
arXiv Detail & Related papers (2020-10-28T15:23:49Z) - Heteroskedastic and Imbalanced Deep Learning with Adaptive
Regularization [55.278153228758434]
Real-world datasets are heteroskedastic and imbalanced.
Addressing heteroskedasticity and imbalance simultaneously is under-explored.
We propose a data-dependent regularization technique for heteroskedastic datasets.
arXiv Detail & Related papers (2020-06-29T01:09:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.