Conditional Score Learning for Quickest Change Detection in Markov Transition Kernels
- URL: http://arxiv.org/abs/2511.03953v1
- Date: Thu, 06 Nov 2025 01:07:36 GMT
- Title: Conditional Score Learning for Quickest Change Detection in Markov Transition Kernels
- Authors: Wuxia Chen, Taposh Banerjee, Vahid Tarokh
- Abstract summary: We learn the conditional score $\nabla_{\mathbf{y}} \log p(\mathbf{y}|\mathbf{x})$ directly from sample pairs. We develop a score-based CUSUM procedure that uses conditional Hyvarinen score differences to detect changes in the kernel.
- Score: 15.70380155219054
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We address the problem of quickest change detection in Markov processes with unknown transition kernels. The key idea is to learn the conditional score $\nabla_{\mathbf{y}} \log p(\mathbf{y}|\mathbf{x})$ directly from sample pairs $( \mathbf{x},\mathbf{y})$, where both $\mathbf{x}$ and $\mathbf{y}$ are high-dimensional data generated by the same transition kernel. In this way, we avoid explicit likelihood evaluation and provide a practical way to learn the transition dynamics. Based on this estimation, we develop a score-based CUSUM procedure that uses conditional Hyvarinen score differences to detect changes in the kernel. To ensure bounded increments, we propose a truncated version of the statistic. With Hoeffding's inequality for uniformly ergodic Markov processes, we prove exponential lower bounds on the mean time to false alarm. We also prove asymptotic upper bounds on detection delay. These results give both theoretical guarantees and practical feasibility for score-based detection in high-dimensional Markov models.
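The truncated score-based CUSUM recursion described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes the per-step conditional Hyvarinen score differences have already been computed from learned pre- and post-change conditional score models, and the function name, truncation level, and threshold are placeholders.

```python
import numpy as np

def truncated_score_cusum(score_diffs, trunc=5.0, threshold=20.0):
    """Score-based CUSUM with truncated increments (illustrative sketch).

    score_diffs : per-step conditional Hyvarinen score differences
                  (assumed precomputed from learned pre-/post-change
                  conditional score models).
    trunc       : truncation level c, bounding each increment to [-c, c].
    threshold   : alarm threshold b.
    Returns the stopping time (1-indexed), or None if no alarm is raised.
    """
    w = 0.0
    for t, z in enumerate(score_diffs, start=1):
        z = float(np.clip(z, -trunc, trunc))  # truncation keeps increments bounded
        w = max(0.0, w + z)                   # standard CUSUM recursion
        if w >= threshold:
            return t                          # declare a change at time t
    return None
```

Truncation is what makes Hoeffding-type arguments applicable: with increments bounded in $[-c, c]$, concentration bounds for uniformly ergodic Markov chains yield the exponential lower bound on the mean time to false alarm.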
Related papers
- Process-Tensor Tomography of SGD: Measuring Non-Markovian Memory via Back-Flow of Distinguishability [1.078600700827543]
We build a simple, model-agnostic witness of training memory based on back-flow of distinguishability. We observe consistent positive back-flow with tight bootstrap confidence intervals, with amplification under higher momentum and more micro-steps. We position this as a principled diagnostic and as empirical evidence that practical SGD deviates from the Markov idealization.
arXiv Detail & Related papers (2026-01-23T09:03:25Z) - IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection [10.568922713534214]
Anomaly detection on data streams presents significant challenges, requiring methods to maintain high detection accuracy. We introduce $\mathcal{IDK}$-$\mathcal{S}$, a novel $\mathbf{I}$ncremental $\mathbf{D}$istributional $\mathbf{K}$ernel for $\mathbf{S}$treaming anomaly detection. Our experiments on thirteen benchmarks demonstrate that $\mathcal{IDK}$-$\mathcal{S}$ achieves superior detection accuracy while operating substantially …
arXiv Detail & Related papers (2025-12-05T08:43:03Z) - Fast Convergence for High-Order ODE Solvers in Diffusion Probabilistic Models [5.939858158928473]
Diffusion probabilistic models generate samples by learning to reverse a noise-injection process that transforms data into noise. A key development is the reformulation of the reverse sampling process as a deterministic probability flow ordinary differential equation (ODE). We present a rigorous convergence analysis of deterministic samplers derived from ODEs for general forward processes with arbitrary variance schedules.
arXiv Detail & Related papers (2025-06-16T03:09:25Z) - Efficient and Provable Algorithms for Covariate Shift [2.0257616108612373]
We focus on estimating the average $\mathbb{E}_{\tilde{\mathbf{x}} \sim p_{\mathrm{test}}} \mathbf{f}(\tilde{\mathbf{x}})$ of any unknown and bounded function. We give several efficient algorithms with provable sample complexity and computational guarantees.
arXiv Detail & Related papers (2025-02-21T10:47:46Z) - Score Change of Variables [0.0]
We show that for a smooth, invertible transformation $\mathbf{y} = \phi(\mathbf{x})$, the transformed score function $\nabla_{\mathbf{y}} \log q(\mathbf{y})$ can be expressed directly in terms of $\nabla_{\mathbf{x}} \log p(\mathbf{x})$. We also introduce generalized sliced score matching, extending traditional sliced score matching from linear projections to arbitrary smooth transformations.
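One standard form of this identity, written here from the usual change-of-variables formula for densities rather than quoted from the paper, is:

```latex
% With y = \phi(x) and Jacobian J_\phi(x) = \partial \phi / \partial x,
% the density transforms as q(y) = p(x) \, |\det J_\phi(x)|^{-1}.
% Taking logarithms and applying the chain rule
% \nabla_y = J_\phi(x)^{-\top} \nabla_x gives
\nabla_{\mathbf{y}} \log q(\mathbf{y})
  = J_\phi(\mathbf{x})^{-\top}
    \left[ \nabla_{\mathbf{x}} \log p(\mathbf{x})
         - \nabla_{\mathbf{x}} \log \bigl|\det J_\phi(\mathbf{x})\bigr| \right].
```

For a linear map $\phi(\mathbf{x}) = A\mathbf{x}$ the log-determinant term is constant and its gradient vanishes, recovering the familiar $\nabla_{\mathbf{y}} \log q(\mathbf{y}) = A^{-\top} \nabla_{\mathbf{x}} \log p(\mathbf{x})$.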
arXiv Detail & Related papers (2024-12-10T20:27:15Z) - Beyond likelihood ratio bias: Nested multi-time-scale stochastic approximation for likelihood-free parameter estimation [49.78792404811239]
We study inference in simulation-based models where the analytical form of the likelihood is unknown. We use a ratio-free nested multi-time-scale stochastic approximation (SA) method that simultaneously tracks the score and drives the parameter update. We show that our algorithm can eliminate the original bias $O\big(\sqrt{\frac{1}{N}}\big)$ and accelerate the convergence rate from $O\big(\beta_k + \sqrt{\frac{\alpha_k}{N}}\big)$.
arXiv Detail & Related papers (2024-11-20T02:46:15Z) - Semi-Supervised Laplace Learning on Stiefel Manifolds [48.3427853588646]
We develop a Sequential Subspace framework for graph-based semi-supervised learning at low label rates. Our methods perform well at both extremely low and high label rates.
arXiv Detail & Related papers (2023-07-31T20:19:36Z) - Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems. We apply a Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z) - High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad
Stepsize [55.0090961425708]
We propose a new, simplified high probability analysis of AdaGrad for smooth, nonconvex problems.
We present our analysis in a modular way and obtain a complementary $\mathcal{O}(1/T)$ convergence rate in the deterministic setting.
To the best of our knowledge, this is the first high probability bound for AdaGrad with a truly adaptive scheme, i.e., one completely oblivious to knowledge of the smoothness.
arXiv Detail & Related papers (2022-04-06T13:50:33Z) - Learning Sparse Graph with Minimax Concave Penalty under Gaussian Markov
Random Fields [51.07460861448716]
This paper presents a convex-analytic framework to learn sparse graphs from data.
We show that convexity is guaranteed by a decomposition based on a transform of the matrix corresponding to its upper triangular part.
arXiv Detail & Related papers (2021-09-17T17:46:12Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims to identify the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Least Squares Regression with Markovian Data: Fundamental Limits and
Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay (a popular reinforcement learning technique) that achieves a significantly better error rate.
arXiv Detail & Related papers (2020-06-16T04:26:50Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathsf{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z) - Estimate exponential memory decay in Hidden Markov Model and its applications [9.94194748987129]
In this paper, we utilize the inherent memory decay in hidden Markov models, such that the forward and backward probabilities can be carried out with subsequences.
An efficient and accurate algorithm is proposed to numerically estimate the gap after the soft-max parametrization.
The continuity of Lyapunov spectrum ensures the estimated $B$ could be reused for the nearby parameter during the inference.
arXiv Detail & Related papers (2017-10-17T03:54:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.