Related papers: On The Presence of Double-Descent in Deep Reinforcement Learning

URL: http://arxiv.org/abs/2511.06895v1
Date: Mon, 10 Nov 2025 09:45:03 GMT
Title: On The Presence of Double-Descent in Deep Reinforcement Learning
Authors: Viktor Veselý, Aleksandar Todorov, Matthia Sabatelli
Abstract summary: The double descent (DD) paradox remains largely unexplored in the non-stationary domain of Deep Reinforcement Learning (DRL). We present preliminary evidence that DD exists in model-free DRL, investigating it systematically across varying model capacity using the Actor-Critic framework. These findings establish DD as a factor in DRL and provide an information-based mechanism for designing agents that are more general, transferable, and robust.
Score: 43.22339935902436
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The double descent (DD) paradox, where over-parameterized models see generalization improve past the interpolation point, remains largely unexplored in the non-stationary domain of Deep Reinforcement Learning (DRL). We present preliminary evidence that DD exists in model-free DRL, investigating it systematically across varying model capacity using the Actor-Critic framework. We rely on an information-theoretic metric, Policy Entropy, to measure policy uncertainty throughout training. Preliminary results show a clear epoch-wise DD curve; the policy's entrance into the second descent region correlates with a sustained, significant reduction in Policy Entropy. This entropic decay suggests that over-parameterization acts as an implicit regularizer, guiding the policy towards robust, flatter minima in the loss landscape. These findings establish DD as a factor in DRL and provide an information-based mechanism for designing agents that are more general, transferable, and robust.
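The abstract names Policy Entropy as its metric but does not say how it is computed. As a minimal sketch, assuming a discrete action space and the standard Shannon entropy of the policy's action distribution (the function names below are illustrative, not taken from the paper), the quantity could be tracked like this:

```python
import math


def softmax(logits):
    """Convert raw action logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_entropy(logits):
    """Shannon entropy (in nats) of the categorical policy given by logits.

    Assumption: a discrete action space; the paper may use another estimator.
    """
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_policy_entropy(batch_logits):
    """Average entropy over a batch of states, e.g. logged once per epoch."""
    return sum(policy_entropy(l) for l in batch_logits) / len(batch_logits)


# A uniform policy over 4 actions has the maximum entropy, log(4).
uniform = policy_entropy([0.0, 0.0, 0.0, 0.0])
# A policy that strongly prefers one action has entropy close to zero.
peaked = policy_entropy([10.0, 0.0, 0.0, 0.0])
batch_mean = mean_policy_entropy([[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]])
```

Under this reading, the "sustained, significant reduction in Policy Entropy" described in the abstract would appear as the per-epoch mean moving from values near the uniform case toward values near the peaked case.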

Related papers

Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning [88.42566960813438]
CalibRL is a hybrid-policy RLVR framework that supports controllable exploration with expert guidance. CalibRL increases policy entropy in a guided manner and clarifies the target distribution. Experiments across eight benchmarks, including both in-domain and out-of-domain settings, demonstrate consistent improvements.
arXiv Detail & Related papers (2026-02-22T07:23:36Z)
Analyzing and Improving Diffusion Models for Time-Series Data Imputation: A Proximal Recursion Perspective [45.713195454899875]
Diffusion models (DMs) have shown promise for Time-Series Data Imputation. DMs' performance remains inconsistent in complex scenarios. We propose a novel framework called SPIRIT (Semi-Proximal Transport Regularized time-series Imputation).
arXiv Detail & Related papers (2026-02-01T12:11:57Z)
Learning Causality for Longitudinal Data [1.2691047660244335]
This thesis develops methods for causal inference and causal representation learning in high-dimensional, time-varying data. The first contribution introduces the Causal Dynamic Variational Autoencoder (CDVAE), a model for estimating Individual Treatment Effects (ITEs). The second contribution proposes an efficient framework for long-term counterfactual regression based on RNNs enhanced with Contrastive Predictive Coding (CPC) and InfoMax. The third contribution advances CRL by addressing how latent causes manifest in observed variables.
arXiv Detail & Related papers (2025-12-04T16:51:49Z)
Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning [55.59724323303857]
We propose a framework that balances exploration and exploitation via three components: difficulty-aware coefficient allocation, initial-anchored target entropy, and dynamic global coefficient adjustment. Experiments on multiple mathematical reasoning benchmarks show that AER consistently outperforms baselines, improving both reasoning accuracy and exploration capability.
arXiv Detail & Related papers (2025-10-13T03:10:26Z)
ResAD: Normalized Residual Trajectory Modeling for End-to-End Autonomous Driving [64.42138266293202]
ResAD is a Normalized Residual Trajectory Modeling framework. It reframes the learning task to predict the residual deviation from an inertial reference. On the NAVSIM benchmark, ResAD achieves a state-of-the-art PDMS of 88.6 using a vanilla diffusion policy.
arXiv Detail & Related papers (2025-10-09T17:59:36Z)
Control-Augmented Autoregressive Diffusion for Data Assimilation [17.305296093966803]
We introduce an amortized framework that augments pretrained ARDMs with a lightweight controller. We evaluate this framework in the context of data assimilation (DA) for chaotic partial differential equations (PDEs). Our approach reduces DA inference to a single forward rollout with on-the-fly corrections, avoiding expensive adjoint computations and/or optimizations during inference.
arXiv Detail & Related papers (2025-10-08T04:37:32Z)
TD-JEPA: Latent-predictive Representations for Zero-Shot Reinforcement Learning [63.73629127832652]
We introduce TD-JEPA, which leverages TD-based latent-predictive representations into unsupervised RL. TD-JEPA trains explicit state and task encoders, a policy-conditioned multi-step predictor, and a set of parameterized policies directly in latent space. Empirically, TD-JEPA matches or outperforms state-of-the-art baselines on locomotion, navigation, and manipulation tasks across 13 datasets.
arXiv Detail & Related papers (2025-10-01T10:21:18Z)
STITCH-OPE: Trajectory Stitching with Guided Diffusion for Off-Policy Evaluation [18.55356623615343]
Off-policy evaluation (OPE) estimates the performance of a target policy using offline data collected from a behavior policy. Existing OPE methods are ineffective for high-dimensional, long-horizon problems. We propose STITCH-OPE, a model-based generative framework that leverages denoising diffusion for long-horizon OPE.
arXiv Detail & Related papers (2025-05-27T06:39:26Z)
datadriftR: An R Package for Concept Drift Detection in Predictive Models [0.0]
This paper introduces drifter, an R package designed to detect concept drift. It proposes a novel method called Profile Drift Detection (PDD) that enables both drift detection and an enhanced understanding of the cause behind the drift.
arXiv Detail & Related papers (2024-12-15T20:59:49Z)
Analyzing Generalization in Policy Networks: A Case Study with the Double-Integrator System [13.012569626941062]
This paper uses a novel analysis technique known as state division to uncover the underlying factors contributing to performance deterioration. We show that the expansion of state space induces the activation function $\tanh$ to exhibit saturability, resulting in the transformation of the state division boundary from nonlinear to linear.
arXiv Detail & Related papers (2023-12-16T15:06:29Z)
Simple and Effective Prevention of Mode Collapse in Deep One-Class Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD. The first regularizer is based on injecting random noise via the standard cross-entropy loss. The second regularizer penalizes the minibatch variance when it becomes too small.
arXiv Detail & Related papers (2020-01-24T03:44:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.