Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration
- URL: http://arxiv.org/abs/2512.23927v1
- Date: Tue, 30 Dec 2025 00:58:35 GMT
- Title: Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration
- Authors: Lars van der Laan, Nathan Kallus
- Abstract summary: We show that fitted Q-iteration (FQI) and its entropy-regularized variant, soft FQI, can behave poorly under function approximation and distribution shift. We introduce stationary-reweighted soft FQI, which reweights each regression update using the stationary distribution of the current policy. Our analysis suggests that global convergence may be recovered by gradually reducing the softmax temperature.
- Score: 40.322273308230606
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Fitted Q-iteration (FQI) and its entropy-regularized variant, soft FQI, are central tools for value-based model-free offline reinforcement learning, but can behave poorly under function approximation and distribution shift. In the entropy-regularized setting, we show that the soft Bellman operator is locally contractive in the stationary norm of the soft-optimal policy, rather than in the behavior norm used by standard FQI. This geometric mismatch explains the instability of soft Q-iteration with function approximation in the absence of Bellman completeness. To restore contraction, we introduce stationary-reweighted soft FQI, which reweights each regression update using the stationary distribution of the current policy. We prove local linear convergence under function approximation with geometrically damped weight-estimation errors, assuming approximate realizability. Our analysis further suggests that global convergence may be recovered by gradually reducing the softmax temperature, and that this continuation approach can extend to the hardmax limit under a mild margin condition.
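The update described in the abstract can be sketched concretely in a small aggregated-feature setting. The following is an illustrative sketch only, not the paper's implementation: the toy MDP, the state-aggregation feature map, and all hyperparameters (`gamma`, `tau`, sizes, iteration counts) are invented for illustration. Each iteration forms the soft Bellman target, computes the Boltzmann policy of the current Q-estimate, estimates that policy's stationary state distribution, and then fits the target by least squares weighted by that stationary distribution rather than by a fixed behavior distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP (all sizes and constants are illustrative).
n_states, n_actions = 5, 2
gamma, tau = 0.9, 1.0  # discount factor and softmax temperature

P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))             # rewards r(s, a)

# State-aggregation function approximation: Q(s, a) = Phi[(s, a)] @ theta,
# where Phi is a one-hot indicator of which group each (s, a) pair belongs to.
d = 4
groups = rng.integers(0, d, size=n_states * n_actions)
Phi = np.eye(d)[groups]

def q_values(theta):
    return (Phi @ theta).reshape(n_states, n_actions)

def soft_policy(Q):
    # Boltzmann policy: pi(a|s) proportional to exp(Q(s, a) / tau).
    logits = Q / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    pi = np.exp(logits)
    return pi / pi.sum(axis=1, keepdims=True)

def stationary_dist(pi):
    # Power iteration on the state-transition matrix induced by pi.
    P_pi = np.einsum("sap,sa->sp", P, pi)
    mu = np.ones(n_states) / n_states
    for _ in range(500):
        mu = mu @ P_pi
    return mu

def soft_bellman_target(Q):
    # (T Q)(s, a) = r(s, a) + gamma * E_{s'}[ tau * log sum_a' exp(Q(s', a') / tau) ]
    v_soft = tau * np.log(np.exp(Q / tau).sum(axis=1))
    return R + gamma * np.einsum("sap,p->sa", P, v_soft)

theta = np.zeros(d)
for _ in range(200):
    Q = q_values(theta)
    pi = soft_policy(Q)
    mu = stationary_dist(pi)
    # Stationary reweighting: weight each (s, a) pair by mu(s) * pi(a|s).
    w = (mu[:, None] * pi).reshape(-1)
    y = soft_bellman_target(Q).reshape(-1)
    # Weighted least-squares regression of the soft Bellman target onto the features.
    W = np.diag(w)
    theta = np.linalg.solve(Phi.T @ W @ Phi + 1e-8 * np.eye(d), Phi.T @ W @ y)

Q = q_values(theta)
residual = np.abs(Q - soft_bellman_target(Q)).max()
print(f"max soft Bellman residual after reweighted soft FQI: {residual:.4f}")
```

Aggregation features are used here deliberately: weighted projections onto indicator features are sup-norm non-expansions, so the toy iteration stays stable regardless of the weights. The residual does not vanish because the feature class is not Bellman complete; the point of the sketch is only the structure of the reweighted regression step.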
Related papers
- Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - Renet: Principled and Efficient Relaxation for the Elastic Net via Dynamic Objective Selection [0.0]
We introduce Renet, a principled generalization of the Relaxed Lasso to the Elastic Net family of estimators. We show that Renet consistently outperforms the standard Elastic Net in high-dimensional, low signal-to-noise ratio, and high-multicollinearity regimes.
arXiv Detail & Related papers (2026-02-11T18:22:59Z) - Equilibrium Propagation Without Limits [0.0]
We prove that the gradient of the difference in Helmholtz free energy between a nudged and free phase is exactly the difference in expected local energy derivatives. This validates the classic Contrastive Hebbian Learning update as an exact gradient estimator for arbitrary finite nudging.
arXiv Detail & Related papers (2025-11-27T01:55:26Z) - An Empirical Risk Minimization Approach for Offline Inverse RL and Dynamic Discrete Choice Model [8.95720650633184]
We study the problem of estimating Dynamic Discrete Choice (DDC) models, also known as offline Maximum Entropy-Regularized Inverse Reinforcement Learning (offline MaxEnt-IRL) in machine learning. The objective is to recover reward or $Q^*$ functions that govern agent behavior from offline behavior data. We propose a globally convergent gradient-based method for solving these problems without the restrictive assumption of linearly parameterized rewards.
arXiv Detail & Related papers (2025-02-19T22:22:20Z) - Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence [25.717501580080846]
This paper revisits the convergence of stochastic mirror descent (SMD) in the contemporary nonconvex optimization setting.
We develop provably convergent algorithms for the problem of training linear networks.
arXiv Detail & Related papers (2024-02-27T17:56:49Z) - Distributionally Time-Varying Online Stochastic Optimization under
Polyak-{\L}ojasiewicz Condition with Application in Conditional Value-at-Risk
Statistical Learning [9.749745086213215]
We consider a sequence of optimization problems following a time-varying distribution via the lens of online optimization.
We show that the framework can be applied to the Conditional Value-at-Risk (CVaR) learning problem.
arXiv Detail & Related papers (2023-09-18T00:47:08Z) - Benign overfitting and adaptive nonparametric regression [71.70323672531606]
We construct an estimator which is a continuous function interpolating the data points with high probability.
We attain minimax optimal rates under mean squared risk on the scale of Hölder classes, adaptively to the unknown smoothness.
arXiv Detail & Related papers (2022-06-27T14:50:14Z) - On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the Stochastic ExtraGradient (SEG) method with constant step size, and present variations of the method that yield favorable convergence.
We prove that when augmented with averaging, SEG provably converges to the Nash equilibrium, and such a rate is provably accelerated by incorporating a scheduled restarting procedure.
arXiv Detail & Related papers (2021-06-30T17:51:36Z) - Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation [30.02577720946978]
We establish finite-time convergence analyses of entropy-regularized NPG with linear function approximation.
We prove that entropy-regularized NPG exhibits linear convergence up to a function approximation error.
arXiv Detail & Related papers (2021-06-08T04:30:39Z) - Robust Implicit Networks via Non-Euclidean Contractions [63.91638306025768]
Implicit neural networks show improved accuracy and significant reduction in memory consumption.
They can suffer from ill-posedness and convergence instability.
This paper provides a new framework to design well-posed and robust implicit neural networks.
arXiv Detail & Related papers (2021-06-06T18:05:02Z) - Gaussian Process-based Min-norm Stabilizing Controller for Control-Affine Systems with Uncertain Input Effects and Dynamics [90.81186513537777]
We propose a novel compound kernel that captures the control-affine nature of the problem.
We show that this resulting optimization problem is convex, and we call it the Gaussian Process-based Control Lyapunov Function Second-Order Cone Program (GP-CLF-SOCP).
arXiv Detail & Related papers (2020-11-14T01:27:32Z) - Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent [55.85456985750134]
We introduce a new stability measure called on-average model stability, for which we develop novel bounds controlled by the risks of SGD iterates.
This yields generalization bounds depending on the behavior of the best model, and leads to the first-ever-known fast bounds in the low-noise setting.
To the best of our knowledge, this gives the first-ever-known stability and generalization bounds for SGD with even non-differentiable loss functions.
arXiv Detail & Related papers (2020-06-15T06:30:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or information and is not responsible for any consequences of their use.