Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting
- URL: http://arxiv.org/abs/2510.03839v1
- Date: Sat, 04 Oct 2025 15:31:26 GMT
- Title: Technical note on Sequential Test-Time Adaptation via Martingale-Driven Fisher Prompting
- Authors: Behraj Khan, Tahir Qasim Syed
- Abstract summary: M-FISHER is a method for sequential distribution shift detection and stable adaptation in streaming data. For detection, it constructs an exponential martingale from non-conformity scores and applies Ville's inequality to obtain time-uniform guarantees on false alarm control. For adaptation, Fisher-preconditioned updates of prompt parameters implement natural gradient descent on the distributional manifold.
- Score: 3.5808917363708743
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We present a theoretical framework for M-FISHER, a method for sequential distribution shift detection and stable adaptation in streaming data. For detection, we construct an exponential martingale from non-conformity scores and apply Ville's inequality to obtain time-uniform guarantees on false alarm control, ensuring statistical validity at any stopping time. Under sustained shifts, we further bound the expected detection delay as $\mathcal{O}(\log(1/\delta)/\Gamma)$, where $\Gamma$ reflects the post-shift information gain, thereby linking detection efficiency to distributional divergence. For adaptation, we show that Fisher-preconditioned updates of prompt parameters implement natural gradient descent on the distributional manifold, yielding locally optimal updates that minimize KL divergence while preserving stability and parameterization invariance. Together, these results establish M-FISHER as a principled approach for robust, anytime-valid detection and geometrically stable adaptation in sequential decision-making under covariate shift.
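The two halves of the abstract correspond to two small computational patterns: for detection, accumulate a nonnegative test martingale over conformal p-values and raise an alarm when it crosses the Ville threshold $1/\delta$; for adaptation, precondition the gradient by a (damped) Fisher matrix. The sketch below is a minimal illustration of those patterns, not the authors' implementation; the power-martingale bet `epsilon`, the damping constant, the learning rate, and the Fisher estimate are placeholder assumptions.

```python
import numpy as np

def ville_threshold(delta: float) -> float:
    # Ville's inequality: P(sup_t M_t >= 1/delta) <= delta for a
    # nonnegative martingale with M_0 = 1, so 1/delta is a
    # time-uniform alarm threshold with false-alarm rate <= delta.
    return 1.0 / delta

def detect_shift(p_values, delta=0.05, epsilon=0.1):
    """Toy exponential test martingale over conformal p-values.

    Under the null (no shift), conformal p-values are uniform on (0, 1),
    and the power-martingale bet epsilon * p**(epsilon - 1) has
    expectation 1, so M_t stays a nonnegative martingale.
    """
    M, threshold = 1.0, ville_threshold(delta)
    for t, p in enumerate(p_values):
        p = min(max(p, 1e-12), 1.0)          # guard against p == 0
        M *= epsilon * p ** (epsilon - 1.0)  # multiplicative bet
        if M >= threshold:
            return t  # anytime-valid alarm: valid at any stopping time
    return None  # no alarm raised

def natural_gradient_step(theta, grad, fisher, lr=0.1, damping=1e-4):
    # Fisher-preconditioned (natural gradient) update:
    # theta <- theta - lr * F^{-1} grad, the locally KL-optimal step.
    F = fisher + damping * np.eye(len(theta))  # damp for stability
    return theta - lr * np.linalg.solve(F, grad)
```

Under the null, `M` hovers near 1 in expectation; a sustained shift pushes the p-values toward 0 and `M` grows geometrically, which is the mechanism behind the $\mathcal{O}(\log(1/\delta)/\Gamma)$ delay bound quoted above.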
Related papers
- Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free Adaptation [3.5808917363708743]
We study test-time adaptation of foundation models for few-shot classification under a completely frozen-model regime. We propose arguably the first training-free inference method that adapts predictions to the new task by performing a change of measure over the latent embedding distribution induced by the encoder.
arXiv Detail & Related papers (2026-02-02T18:17:29Z) - Quick Change Detection in Discrete-Time in Presence of a Covert Adversary [7.58317340007754]
We study the problem of covert quickest change detection in a discrete-time setting, where a sequence of observations undergoes a distributional change at an unknown time. We characterize the behavior of the average detection delay (ADD) and the average time to false alarm (AT2FA) when the post-change distribution converges to the pre-change distribution. We identify the critical scaling laws governing covert behavior and derive explicit conditions under which an adversary can maintain covertness.
arXiv Detail & Related papers (2026-01-27T19:56:31Z) - Stopping Rules for Stochastic Gradient Descent via Anytime-Valid Confidence Sequences [51.56484100374058]
We study stopping rules for stochastic gradient descent (SGD) for convex optimization. We develop an anytime-valid, data-dependent upper confidence sequence for the weighted average suboptimality of projected SGD. These are the first rigorous, time-uniform performance guarantees and finite-time $\varepsilon$-optimality certificates.
arXiv Detail & Related papers (2025-12-15T09:26:45Z) - Adaptive Regime-Switching Forecasts with Distribution-Free Uncertainty: Deep Switching State-Space Models Meet Conformal Prediction [38.37518767859008]
We study distribution-free uncertainty for regime-switching forecasting by coupling Deep Switching State-Space Models with Adaptive Conformal Inference (ACI) and its aggregated variant (AgACI); a minimal ACI sketch appears after this list. We also introduce a unified conformal wrapper that sits atop strong sequence baselines, including S4, MC-Dropout GRU, sparse Gaussian processes, and a change-point local model, to produce online predictive bands with finite-sample marginal guarantees under nonstationarity and model misspecification.
arXiv Detail & Related papers (2025-12-02T23:21:01Z) - BayesTTA: Continual-Temporal Test-Time Adaptation for Vision-Language Models via Gaussian Discriminant Analysis [41.09181390655176]
Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but degrade significantly under temporally evolving distribution shifts common in real-world scenarios. We formalize this practical problem as Continual-Temporal Test-Time Adaptation (CT-TTA), where test distributions evolve gradually over time. We propose BayesTTA, a Bayesian adaptation framework that enforces temporally consistent predictions and dynamically aligns visual representations.
arXiv Detail & Related papers (2025-07-11T14:02:54Z) - WATCH: Adaptive Monitoring for AI Deployments via Weighted-Conformal Martingales [22.789611187514975]
Methods for nonparametric sequential testing, especially conformal test martingales (CTMs) and anytime-valid inference, offer promising tools for this monitoring task. Existing approaches are restricted to monitoring limited hypothesis classes or alarm criteria.
arXiv Detail & Related papers (2025-05-07T17:53:47Z) - Error-quantified Conformal Inference for Time Series [55.11926160774831]
Uncertainty quantification in time series prediction is challenging due to the temporal dependence and distribution shift on sequential data. We propose Error-quantified Conformal Inference (ECI) by smoothing the quantile loss function. ECI can achieve valid miscoverage control and output tighter prediction sets than other baselines.
arXiv Detail & Related papers (2025-02-02T15:02:36Z) - Effect of Random Learning Rate: Theoretical Analysis of SGD Dynamics in Non-Convex Optimization via Stationary Distribution [5.5165579223151795]
We consider a variant of stochastic gradient descent (SGD) with a random learning rate. We show that the distribution of a parameter updated by Poisson SGD converges to a stationary distribution under weak assumptions.
arXiv Detail & Related papers (2024-06-23T06:52:33Z) - Nonstationary Time Series Forecasting via Unknown Distribution Adaptation [22.915008205203886]
As environments evolve, temporal distribution shifts can degrade time series forecasting performance. Some methods disentangle stationary and nonstationary components by assuming uniform distribution shifts, but this is impractical since the time at which the distribution changes is unknown. We propose the Unknown Distribution Adaptation (UDA) model for nonstationary time series forecasting.
arXiv Detail & Related papers (2024-02-20T07:16:12Z) - Stochastic Approximation with Delayed Updates: Finite-Time Rates under Markovian Sampling [73.5602474095954]
We study the non-asymptotic performance of stochastic approximation schemes with delayed updates under Markovian sampling.
Our theoretical findings shed light on the finite-time effects of delays for a broad class of algorithms.
arXiv Detail & Related papers (2024-02-19T03:08:02Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Adapting to Continuous Covariate Shift via Online Density Ratio Estimation [64.8027122329609]
Dealing with distribution shifts is one of the central challenges for modern machine learning.
We propose an online method that can appropriately reuse historical information.
Our density ratio estimation method is shown to perform well, enjoying a dynamic regret bound.
arXiv Detail & Related papers (2023-02-06T04:03:33Z) - Certifying Model Accuracy under Distribution Shifts [151.67113334248464]
We present provable robustness guarantees on the accuracy of a model under bounded Wasserstein shifts of the data distribution.
We show that a simple procedure that randomizes the input of the model within a transformation space is provably robust to distributional shifts under the transformation.
arXiv Detail & Related papers (2022-01-28T22:03:50Z) - Training on Test Data with Bayesian Adaptation for Covariate Shift [96.3250517412545]
Deep neural networks often make inaccurate predictions with unreliable uncertainty estimates.
We derive a Bayesian model that provides for a well-defined relationship between unlabeled inputs under distributional shift and model parameters.
We show that our method improves both accuracy and uncertainty estimation.
arXiv Detail & Related papers (2021-09-27T01:09:08Z)
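Several entries above (the ACI/AgACI wrapper, WATCH, ECI) build on the adaptive conformal inference update, which adjusts a working miscoverage level online so coverage tracks the target even under distribution shift. The sketch below is a minimal illustration of that update, not any of these papers' released code; the non-conformity scores and the step size `gamma` are illustrative assumptions.

```python
import numpy as np

def adaptive_conformal_miscoverage(scores, alpha=0.1, gamma=0.01):
    """Minimal Adaptive Conformal Inference (ACI) loop.

    At each step t: take the (1 - alpha_t)-quantile of past
    non-conformity scores as the coverage threshold, record whether the
    new score was covered, and update
        alpha_t <- alpha_t + gamma * (alpha - err_t),
    so long-run miscoverage tracks the target alpha.
    """
    alpha_t, errs = alpha, []
    for t in range(1, len(scores)):
        past = np.asarray(scores[:t])
        q_level = min(max(1.0 - alpha_t, 0.0), 1.0)  # clip to [0, 1]
        q = np.quantile(past, q_level)
        err = float(scores[t] > q)        # 1 if the new point is uncovered
        alpha_t += gamma * (alpha - err)  # raise alpha_t when over-covering
        errs.append(err)
    return np.mean(errs)  # empirical miscoverage, should approach alpha
```

On exchangeable data this loop recovers standard split-conformal behavior; its appeal in the papers above is that the same one-line update stays calibrated in the long run under nonstationarity and model misspecification.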