On The Presence of Double-Descent in Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2511.06895v1
- Date: Mon, 10 Nov 2025 09:45:03 GMT
- Title: On The Presence of Double-Descent in Deep Reinforcement Learning
- Authors: Viktor Veselý, Aleksandar Todorov, Matthia Sabatelli
- Abstract summary: The double descent (DD) paradox remains largely unexplored in the non-stationary domain of Deep Reinforcement Learning (DRL). We present preliminary evidence that DD exists in model-free DRL, investigating it systematically across varying model capacity using the Actor-Critic framework. These findings establish DD as a factor in DRL and provide an information-based mechanism for designing agents that are more general, transferable, and robust.
- Score: 43.22339935902436
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The double descent (DD) paradox, where over-parameterized models see generalization improve past the interpolation point, remains largely unexplored in the non-stationary domain of Deep Reinforcement Learning (DRL). We present preliminary evidence that DD exists in model-free DRL, investigating it systematically across varying model capacity using the Actor-Critic framework. We rely on an information-theoretic metric, Policy Entropy, to measure policy uncertainty throughout training. Preliminary results show a clear epoch-wise DD curve; the policy's entrance into the second descent region correlates with a sustained, significant reduction in Policy Entropy. This entropic decay suggests that over-parameterization acts as an implicit regularizer, guiding the policy towards robust, flatter minima in the loss landscape. These findings establish DD as a factor in DRL and provide an information-based mechanism for designing agents that are more general, transferable, and robust.
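The abstract names Policy Entropy as the metric tracked during training but does not say how it is computed. As a rough illustration only, the sketch below computes the Shannon entropy of a discrete (categorical) policy from its action logits and averages it over a batch of states. The function names, the assumption of a discrete action space, and the example logits are all hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw action logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(logits):
    """Shannon entropy H(pi(.|s)) = -sum_a pi(a|s) * log pi(a|s), in nats.

    Assumes a discrete action space; `logits` holds one value per action.
    """
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_policy_entropy(batch_logits):
    """Average entropy over a batch of states (one logit vector per state)."""
    return sum(policy_entropy(l) for l in batch_logits) / len(batch_logits)


# Hypothetical logits: a uniform policy has maximal entropy, log(num_actions),
# while a policy concentrated on one action has entropy close to zero.
uniform = policy_entropy([0.0, 0.0, 0.0, 0.0])
peaked = policy_entropy([10.0, 0.0, 0.0, 0.0])
batch_mean = mean_policy_entropy([[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]])
```

Under this reading, a sustained drop in the batch-averaged value over training epochs would correspond to the entropy reduction the abstract associates with the second descent region.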
Related papers
- Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning [88.42566960813438]
CalibRL is a hybrid-policy RLVR framework that supports controllable exploration with expert guidance. CalibRL increases policy entropy in a guided manner and clarifies the target distribution. Experiments across eight benchmarks, including both in-domain and out-of-domain settings, demonstrate consistent improvements.
arXiv Detail & Related papers (2026-02-22T07:23:36Z) - Analyzing and Improving Diffusion Models for Time-Series Data Imputation: A Proximal Recursion Perspective [45.713195454899875]
Diffusion models (DMs) have shown promise for Time-Series Data Imputation. DMs' performance remains inconsistent in complex scenarios. We propose a novel framework called SPIRIT (Semi-Proximal Transport Regularized time-series Imputation).
arXiv Detail & Related papers (2026-02-01T12:11:57Z) - Learning Causality for Longitudinal Data [1.2691047660244335]
This thesis develops methods for causal inference and causal representation learning in high-dimensional, time-varying data. The first contribution introduces the Causal Dynamic Variational Autoencoder (CDVAE), a model for estimating Individual Treatment Effects (ITEs). The second contribution proposes an efficient framework for long-term counterfactual regression based on RNNs enhanced with Contrastive Predictive Coding (CPC) and InfoMax. The third contribution advances CRL by addressing how latent causes manifest in observed variables.
arXiv Detail & Related papers (2025-12-04T16:51:49Z) - Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning [55.59724323303857]
We propose a framework that balances exploration and exploitation via three components: difficulty-aware coefficient allocation, initial-anchored target entropy, and dynamic global coefficient adjustment. Experiments on multiple mathematical reasoning benchmarks show that AER consistently outperforms baselines, improving both reasoning accuracy and exploration capability.
arXiv Detail & Related papers (2025-10-13T03:10:26Z) - ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z) - Control-Augmented Autoregressive Diffusion for Data Assimilation [17.305296093966803]
We introduce an amortized framework that augments pretrained ARDMs with a lightweight controller. We evaluate this framework in the context of data assimilation (DA) for chaotic partial differential equations (PDEs). Our approach reduces DA inference to a single forward rollout with on-the-fly corrections, avoiding expensive adjoint computations and/or optimizations during inference.
arXiv Detail & Related papers (2025-10-08T04:37:32Z) - TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning [63.73629127832652]
We introduce TD-JEPA, which leverages TD-based latent-predictive representations into unsupervised RL. TD-JEPA trains explicit state and task encoders, a policy-conditioned multi-step predictor, and a set of parameterized policies directly in latent space. Empirically, TD-JEPA matches or outperforms state-of-the-art baselines on locomotion, navigation, and manipulation tasks across 13 datasets.
arXiv Detail & Related papers (2025-10-01T10:21:18Z) - STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation [18.55356623615343]
Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy. Existing OPE methods are ineffective for high-dimensional, long-horizon problems. We propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE.
arXiv Detail & Related papers (2025-05-27T06:39:26Z) - datadriftR: An R Package for Concept Drift Detection in Predictive Models [0.0]
This paper introduces drifter, an R package designed to detect concept drift. It proposes a novel method called Profile Drift Detection (PDD) that enables both drift detection and an enhanced understanding of the cause behind the drift.
arXiv Detail & Related papers (2024-12-15T20:59:49Z) - Analyzing Generalization in Policy Networks: A Case Study with the Double-Integrator System [13.012569626941062]
This paper uses a novel analysis technique known as state division to uncover the underlying factors contributing to performance deterioration.
We show that expanding the state space drives the $\tanh$ activation function into saturation, which transforms the state division boundary from nonlinear to linear.
arXiv Detail & Related papers (2023-12-16T15:06:29Z) - Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
arXiv Detail & Related papers (2020-01-24T03:44:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.