Related papers: Learning under Distributional Drift: Reproducibility as an Intrinsic Statistical Resource

URL: http://arxiv.org/abs/2512.13506v1
Date: Mon, 15 Dec 2025 16:34:47 GMT
Title: Learning under Distributional Drift: Reproducibility as an Intrinsic Statistical Resource
Authors: Sofiya Zaichyk
Abstract summary: We introduce a new statistical primitive, the reproducibility budget $C_T$, which quantifies a system's finite capacity for statistical reproducibility. No algorithm can achieve smaller worst-case generalization error than that imposed by the average Fisher-Rao drift rate $C_T/T$.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Statistical learning under distributional drift remains insufficiently characterized: when each observation alters the data-generating law, classical generalization bounds can collapse. We introduce a new statistical primitive, the reproducibility budget $C_T$, which quantifies a system's finite capacity for statistical reproducibility - the extent to which its sampling process can remain governed by a consistent underlying distribution in the presence of both exogenous change and endogenous feedback. Formally, $C_T$ is defined as the cumulative Fisher-Rao path length of the coupled learner-environment evolution, measuring the total distributional motion accumulated during learning. From this construct we derive a drift-feedback generalization bound of order $O(T^{-1/2} + C_T/T)$, and we prove a matching minimax lower bound showing that this rate is minimax-optimal. Consequently, the results establish a reproducibility speed limit: no algorithm can achieve smaller worst-case generalization error than that imposed by the average Fisher-Rao drift rate $C_T/T$ of the data-generating process. The framework situates exogenous drift, adaptive data analysis, and performative prediction within a common geometric structure, with $C_T$ emerging as the intrinsic quantity measuring distributional motion across these settings.
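The abstract describes $C_T$ as a cumulative Fisher-Rao path length and gives a rate of order $O(T^{-1/2} + C_T/T)$. The following is a minimal sketch of how such a quantity might be approximated, not the paper's construction: it assumes the data-generating law is a categorical distribution observed at discrete steps, and it approximates the path length by summing Fisher-Rao geodesic distances between consecutive distributions. The function names `fisher_rao_distance`, `reproducibility_budget`, and `bound_rate` are hypothetical, and the rate is evaluated with all constants set to 1.

```python
import math


def fisher_rao_distance(p, q):
    """Fisher-Rao geodesic distance between two categorical distributions.

    Uses the closed form 2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))


def reproducibility_budget(path):
    """Discrete approximation of a path length (assumption of this sketch):
    sum of Fisher-Rao distances between consecutive distributions."""
    return sum(
        fisher_rao_distance(path[t], path[t + 1]) for t in range(len(path) - 1)
    )


def bound_rate(T, C_T):
    """Order of the rate T^{-1/2} + C_T / T, with constants set to 1."""
    return T ** -0.5 + C_T / T


if __name__ == "__main__":
    T = 100
    # Hypothetical drifting Bernoulli law: success probability moves 0.5 -> 0.9.
    drifting = [
        (1 - (0.5 + 0.4 * t / T), 0.5 + 0.4 * t / T) for t in range(T + 1)
    ]
    static = [(0.5, 0.5)] * (T + 1)
    for name, path in [("static", static), ("drifting", drifting)]:
        c_t = reproducibility_budget(path)
        print(name, "C_T =", c_t, "rate =", bound_rate(T, c_t))
```

With a static law the budget is zero and the rate reduces to the classical $T^{-1/2}$ term; any accumulated drift adds the $C_T/T$ term on top.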

Related papers

From Shortcut to Induction Head: How Data Diversity Shapes Algorithm Selection in Transformers [67.02076505996284]
We study how the choice of pretraining data distribution steers a shallow transformer toward one behavior or the other. Our results shed light on the algorithmic biases of pretrained transformers and offer conceptual guidelines for data-driven control of their learned behaviors.
arXiv Detail & Related papers (2025-12-21T08:10:26Z)
Generative Modeling with Continuous Flows: Sample Complexity of Flow Matching [60.37045080890305]
We provide the first analysis of the sample complexity for flow-matching based generative models. We decompose the velocity field estimation error into neural-network approximation error, statistical error due to the finite sample size, and optimization error due to the finite number of optimization steps for estimating the velocity field.
arXiv Detail & Related papers (2025-12-01T05:14:25Z)
Adaptivity and Convergence of Probability Flow ODEs in Diffusion Generative Models [5.064404027153094]
This paper contributes to establishing theoretical guarantees for the probability flow ODE, a diffusion-based sampler known for its practical efficiency. We demonstrate that, with accurate score function estimation, the probability flow ODE sampler achieves a convergence rate of $O(k/T)$ in total variation distance. This dimension-free convergence rate improves upon existing results that scale with the typically much larger ambient dimension.
arXiv Detail & Related papers (2025-01-31T03:10:10Z)
Benign Overfitting in Out-of-Distribution Generalization of Linear Models [19.203753135860016]
We take an initial step towards understanding benign overfitting in the Out-of-Distribution (OOD) regime. We provide non-asymptotic guarantees proving that benign overfitting occurs in standard ridge regression. We also present theoretical results for a more general family of target covariance matrices.
arXiv Detail & Related papers (2024-12-19T02:47:39Z)
O(d/T) Convergence Theory for Diffusion Probabilistic Models under Minimal Assumptions [6.76974373198208]
We establish a fast convergence theory for the denoising diffusion probabilistic model (DDPM) under minimal assumptions. We show that the convergence rate improves to $O(k/T)$, where $k$ is the intrinsic dimension of the target data distribution. This highlights the ability of DDPM to automatically adapt to unknown low-dimensional structures.
arXiv Detail & Related papers (2024-09-27T17:59:10Z)
Generalization error of min-norm interpolators in transfer learning [2.7309692684728617]
Min-norm interpolators emerge naturally as implicit regularized limits of modern machine learning algorithms. In many applications, a limited amount of test data may be available during training, yet properties of min-norm interpolators in this setting are not well understood. We establish a novel anisotropic local law to achieve these characterizations.
arXiv Detail & Related papers (2024-06-20T02:23:28Z)
Rejection via Learning Density Ratios [50.91522897152437]
Classification with rejection emerges as a learning paradigm which allows models to abstain from making predictions. We propose a different distributional perspective, where we seek to find an idealized data distribution which maximizes a pretrained model's performance. Our framework is tested empirically over clean and noisy datasets.
arXiv Detail & Related papers (2024-05-29T01:32:17Z)
A Statistical Learning View of Simple Kriging [0.0]
We analyze the simple Kriging task from a statistical learning perspective. The goal is to predict the unknown values it takes at any other location with minimum quadratic risk. We prove non-asymptotic bounds of order $O_{\mathbb{P}}(1/\sqrt{n})$ for the excess risk of a plug-in predictive rule mimicking the true minimizer.
arXiv Detail & Related papers (2022-02-15T12:46:43Z)
Heavy-tailed Streaming Statistical Estimation [58.70341336199497]
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional samples. We design a clipped gradient descent algorithm and provide an improved analysis under a more nuanced condition on the noise of gradients.
arXiv Detail & Related papers (2021-08-25T21:30:27Z)
SLOE: A Faster Method for Statistical Inference in High-Dimensional Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets. Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z)
Log-Likelihood Ratio Minimizing Flows: Towards Robust and Quantifiable Neural Distribution Alignment [52.02794488304448]
We propose a new distribution alignment method based on a log-likelihood ratio statistic and normalizing flows. We experimentally verify that minimizing the resulting objective results in domain alignment that preserves the local structure of input domains.
arXiv Detail & Related papers (2020-03-26T22:10:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.