Related papers: Stabilizing Fixed-Point Iteration for Markov Chain Poisson Equations

Stabilizing Fixed-Point Iteration for Markov Chain Poisson Equations

URL: http://arxiv.org/abs/2602.00474v1
Date: Sat, 31 Jan 2026 02:57:01 GMT
Title: Stabilizing Fixed-Point Iteration for Markov Chain Poisson Equations
Authors: Yang Xu, Vaneet Aggarwal,
Abstract summary: We study finite-state Markov chains with $n$ states and transition matrix $P$.<n>We show that all non-decaying modes are captured by a real peripheral invariant subspace $mathcalK(P)$, and that the induced operator on the quotient space $mathbbRn/mathcalK(P) is strictly contractive, yielding a unique quotient solution.
Score: 49.702772230127465
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Poisson equations underpin average-reward reinforcement learning, but beyond ergodicity they can be ill-posed, meaning that solutions are non-unique and standard fixed point iterations can oscillate on reducible or periodic chains. We study finite-state Markov chains with $n$ states and transition matrix $P$. We show that all non-decaying modes are captured by a real peripheral invariant subspace $\mathcal{K}(P)$, and that the induced operator on the quotient space $\mathbb{R}^n/\mathcal{K}(P)$ is strictly contractive, yielding a unique quotient solution. Building on this viewpoint, we develop an end-to-end pipeline that learns the chain structure, estimates an anchor based gauge map, and runs projected stochastic approximation to estimate a gauge-fixed representative together with an associated peripheral residual. We prove $\widetilde{O}(T^{-1/2})$ convergence up to projection estimation error, enabling stable Poisson equation learning for multichain and periodic regimes with applications to performance evaluation of average-reward reinforcement learning beyond ergodicity.

Related papers

Q-Measure-Learning for Continuous State RL: Efficient Implementation and Convergence [10.189658648290257]
We study reinforcement learning in infinite-horizon discounted Markov decision processes with continuous state spaces.<n>We propose the novel Q-Measure-Learning, which learns a signed empirical measure supported on visited state-action pairs.
arXiv Detail & Related papers (2026-03-03T21:15:22Z)
Optimal Unconstrained Self-Distillation in Ridge Regression: Strict Improvements, Precise Asymptotics, and One-Shot Tuning [61.07540493350384]
Self-distillation (SD) is the process of retraining a student on a mixture of ground-truth and the teacher's own predictions.<n>We show that for any prediction risk, the optimally mixed student improves upon the ridge teacher for every regularization level.<n>We propose a consistent one-shot tuning method to estimate $star$ without grid search, sample splitting, or refitting.
arXiv Detail & Related papers (2026-02-19T17:21:15Z)
Beyond Isotonization: Scalable Non-Crossing Quantile Estimation via Neural Networks for Student Growth Percentiles [0.0]
Student Growth Percentiles (SGPs), widely adopted across U.S. state assessment systems, employ independent quantile regression followed by post-hoc correction.<n>We demonstrate this approach contains a fundamental methodological inconsistency: between independently-estimated, potentially crossed quantiles requires monotonicity.<n>We propose neural network-based multi-quantile regression (NNQR) with shared hidden layers as a practical alternative.
arXiv Detail & Related papers (2025-10-25T19:39:07Z)
Estimating stationary mass, frequency by frequency [11.476508212290275]
We consider the problem of estimating the probability mass placed by the stationary distribution of an exponentially $alpha$-mixing process.<n>We estimate this vector of probabilities in total variation distance, showing universal consistency in $n$.<n>We develop complementary tools -- including concentration inequalities for a natural self-normalized statistic mixing sequences -- that may prove independently useful in the design and analysis of estimators for related problems.
arXiv Detail & Related papers (2025-03-17T04:24:21Z)
Markov Chain Variance Estimation: A Stochastic Approximation Approach [14.883782513177094]
We consider the problem of estimating the variance of a function defined on a Markov chain, an important step for statistical inference of the stationary mean. We design a novel recursive estimator that requires $O(1)$ at each step, does not require any historical samples or any prior knowledge of run-length, and has optimal $O(frac1n) rate of convergence for the mean-squared error (MSE) with provable finite sample guarantees.
arXiv Detail & Related papers (2024-09-09T15:42:28Z)
Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks. In this paper, we study a suitable function space for over- parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z)
The Schr\"odinger Bridge between Gaussian Measures has a Closed Form [101.79851806388699]
We focus on the dynamic formulation of OT, also known as the Schr"odinger bridge (SB) problem. In this paper, we provide closed-form expressions for SBs between Gaussian measures.
arXiv Detail & Related papers (2022-02-11T15:59:01Z)
Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process. We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator. We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model [0.0]
We analyze the convergence of single-pass, fixed step-size gradient descent on the least-square risk under this model. As a special case, we analyze an online algorithm for estimating a real function on the unit interval from the noiseless observation of its value at randomly sampled points.
arXiv Detail & Related papers (2020-06-15T08:25:50Z)
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP) We show that the number of samples needed to yield an entrywise $varepsilon$-accurate estimate of the Q-function is at most on the order of $frac1mu_min (1-gamma)5varepsilon2+ fract_mixmu_min (1-gamma)$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
On Linear Stochastic Approximation: Fine-grained Polyak-Ruppert and Non-Asymptotic Concentration [115.1954841020189]
We study the inequality and non-asymptotic properties of approximation procedures with Polyak-Ruppert averaging. We prove a central limit theorem (CLT) for the averaged iterates with fixed step size and number of iterations going to infinity.
arXiv Detail & Related papers (2020-04-09T17:54:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.