The Stationarity Bias: Stratified Stress-Testing for Time-Series Imputation in Regulated Dynamical Systems
- URL: http://arxiv.org/abs/2602.15637v1
- Date: Tue, 17 Feb 2026 15:05:56 GMT
- Title: The Stationarity Bias: Stratified Stress-Testing for Time-Series Imputation in Regulated Dynamical Systems
- Authors: Amirreza Dolatpour Fathkouhi, Alireza Namazi, Heman Shakeri
- Abstract summary: Time-series imputation benchmarks use random masking and shape-agnostic metrics. We formalize the resulting Stationarity Bias and propose a Stratified Stress-Test that partitions evaluation into Stationary and Transient regimes.
- Score: 0.098314893665023
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Time-series imputation benchmarks employ uniform random masking and shape-agnostic metrics (MSE, RMSE), implicitly weighting evaluation by regime prevalence. In systems with a dominant attractor -- homeostatic physiology, nominal industrial operation, stable network traffic -- this creates a systematic Stationarity Bias: simple methods appear superior because the benchmark predominantly samples the easy, low-entropy regime where they trivially succeed. We formalize this bias and propose a Stratified Stress-Test that partitions evaluation into Stationary and Transient regimes. Using Continuous Glucose Monitoring (CGM) as a testbed -- chosen for its rigorous ground-truth forcing functions (meals, insulin) that enable precise regime identification -- we establish three findings with broad implications: (i) Stationary Efficiency: linear interpolation achieves state-of-the-art reconstruction during stable intervals, confirming that complex architectures are computationally wasteful in low-entropy regimes. (ii) Transient Fidelity: during critical transients (post-prandial peaks, hypoglycemic events), linear methods exhibit drastically degraded morphological fidelity (DTW), disproportionate to their RMSE -- a phenomenon we term the RMSE Mirage, where low pointwise error masks the destruction of signal shape. (iii) Regime-Conditional Model Selection: deep learning models preserve both pointwise accuracy and morphological integrity during transients, making them essential for safety-critical downstream tasks. We further derive empirical missingness distributions from clinical trials and impose them on complete training data, preventing models from exploiting unrealistically clean observations and encouraging robustness under real-world missingness. This framework generalizes to any regulated system where routine stationarity dominates critical transients.
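The stratified protocol described above can be sketched in a few lines: given a boolean mask marking transient samples (in the paper, derived from ground-truth forcing functions such as meal and insulin events), RMSE and DTW are reported per regime rather than pooled, which is exactly what exposes the "RMSE Mirage". This is a minimal illustrative sketch under our own assumptions, not the authors' implementation; the function names, the O(nm) DTW recursion, and the simplification of treating each regime's samples as one contiguous segment are ours.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) dynamic-time-warping distance between two 1-D series."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible warping moves
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def stratified_scores(y_true, y_pred, transient_mask):
    """Report RMSE and DTW separately for stationary and transient regimes,
    instead of pooling all timesteps into a single prevalence-weighted score."""
    scores = {}
    for regime, mask in (("stationary", ~transient_mask),
                         ("transient", transient_mask)):
        t, p = y_true[mask], y_pred[mask]
        scores[regime] = {
            "rmse": float(np.sqrt(np.mean((t - p) ** 2))),
            "dtw": dtw_distance(t, p),
        }
    return scores
```

A pooled RMSE over a mostly-stationary record is dominated by the easy regime; printing the `"transient"` entry separately surfaces the morphological degradation that the abstract attributes to linear interpolation.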
Related papers
- Learning Complex Physical Regimes via Coverage-oriented Uncertainty Quantification: An application to the Critical Heat Flux [0.0]
Uncertainty quantification (UQ) should not be viewed as a safety assessment, but as a support to the learning task itself. We focus on the Critical Heat Flux benchmark and dataset presented by the OECD/NEA Expert Group on Reactor Systems Multi-Physics. We show that while post-hoc methods ensure statistical calibration, coverage-oriented learning effectively reshapes the model's representation to match the complex physical regimes.
arXiv Detail & Related papers (2026-02-25T09:04:15Z) - Smooth embeddings in contracting recurrent networks driven by regular dynamics: A synthesis for neural representation [45.88028371034407]
Recent empirical work has documented topology-preserving latent organization in trained recurrent models, and recent theoretical results in reservoir computing establish conditions under which the synchronization map is an embedding. Our contribution is an integrated framework that assembles generalized synchronization and embedding guarantees for contracting reservoirs.
arXiv Detail & Related papers (2026-01-26T23:10:39Z) - Robust Machine Learning for Regulatory Sequence Modeling under Biological and Technical Distribution Shifts [0.3948325938742681]
We introduce a robustness framework to quantify performance degradation, calibration failures, and uncertainty-based reliability. In simulation, motif-driven regulatory outputs are generated with cell-type-specific programs, perturbations, GC bias, depth variation, batch effects, and heteroscedastic noise. Models remain accurate but show higher error, severe variance miscalibration, and coverage collapse under motif-effect rewiring and noise-dominated regimes.
arXiv Detail & Related papers (2026-01-21T13:15:27Z) - SIGMA: Scalable Spectral Insights for LLM Collapse [51.863164847253366]
We introduce SIGMA (Spectral Inequalities for Gram Matrix Analysis), a unified framework for model collapse. By deriving deterministic bounds on the Gram matrix's spectrum, SIGMA provides a mathematically grounded metric to track the contraction of the representation space. We demonstrate that SIGMA effectively captures the transition toward collapsed states, offering theoretical insights into the mechanics of collapse.
arXiv Detail & Related papers (2026-01-06T19:47:11Z) - Disordered Dynamics in High Dimensions: Connections to Random Matrices and Machine Learning [52.26396748560348]
We provide an overview of high-dimensional dynamical systems driven by random matrices. We focus on applications to simple models of learning and generalization in machine learning theory.
arXiv Detail & Related papers (2026-01-03T00:12:32Z) - Correcting False Alarms from Unseen: Adapting Graph Anomaly Detectors at Test Time [60.341117019125214]
We propose TUNE, a lightweight and plug-and-play Test-time adaptation framework for correcting Unseen Normal pattErns in graph anomaly detection (GAD). To address semantic confusion, a graph aligner is employed to align the shifted data to the original data at the graph-attribute level. Extensive experiments on 10 real-world datasets demonstrate that TUNE significantly enhances the generalizability of pre-trained GAD models to both synthetic and real unseen normal patterns.
arXiv Detail & Related papers (2025-11-10T12:10:05Z) - T3: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis [15.624549727053475]
Existing model-merging techniques fail to deliver consistent gains across diverse medical modalities. We introduce Test-Time Task adaptive merging (T3), a backpropagation-free framework that computes per-sample merging coefficients. We present a rigorous cross-evaluation protocol spanning in-domain, base-to-novel, and corruption settings across four modalities.
arXiv Detail & Related papers (2025-10-31T08:05:40Z) - Limits of Generative Pre-Training in Structured EMR Trajectories with Irregular Sampling [0.7537475180985093]
Foundation models refer to architectures trained on vast datasets using autoregressive pre-training to capture intricate patterns and motifs. We trained two autoregressive models -- a sequence-to-sequence LSTM and a reduced Transformer -- on longitudinal ART for HIV and Acute Hypotension datasets. Controlled irregularity was added during training via random inter-visit gaps, while test sequences stayed complete. Both reproduced feature distributions but failed to preserve cross-feature structure.
arXiv Detail & Related papers (2025-10-27T00:04:17Z) - Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z) - Memorization and Regularization in Generative Diffusion Models [5.128303432235475]
Diffusion models have emerged as a powerful framework for generative modeling. The analysis highlights the need for regularization to avoid reproducing the analytically tractable minimizer. Experiments are evaluated in the context of memorization, and directions for future development of regularization are highlighted.
arXiv Detail & Related papers (2025-01-27T05:17:06Z) - REST: Efficient and Accelerated EEG Seizure Analysis through Residual State Updates [54.96885726053036]
This paper introduces a novel graph-based residual state update mechanism (REST) for real-time EEG signal analysis.
By leveraging a combination of graph neural networks and recurrent structures, REST efficiently captures both non-Euclidean geometry and temporal dependencies within EEG data.
Our model demonstrates high accuracy in both seizure detection and classification tasks.
arXiv Detail & Related papers (2024-06-03T16:30:19Z) - Stochastically forced ensemble dynamic mode decomposition for forecasting and analysis of near-periodic systems [65.44033635330604]
We introduce a novel load forecasting method in which observed dynamics are modeled as a forced linear system.
We show that its use of intrinsic linear dynamics offers a number of desirable properties in terms of interpretability and parsimony.
Results are presented for a test case using load data from an electrical grid.
arXiv Detail & Related papers (2020-10-08T20:25:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.