Strong Linear Baselines Strike Back: Closed-Form Linear Models as Gaussian Process Conditional Density Estimators for TSAD
- URL: http://arxiv.org/abs/2602.00672v1
- Date: Sat, 31 Jan 2026 11:35:51 GMT
- Title: Strong Linear Baselines Strike Back: Closed-Form Linear Models as Gaussian Process Conditional Density Estimators for TSAD
- Authors: Aleksandr Yugay, Hang Cui, Changhua Pei, Alexey Zaytsev
- Abstract summary: We show that a simple linear autoregressive anomaly score with the closed-form solution provided by OLS regression consistently matches or outperforms state-of-the-art deep detectors. From a theoretical perspective, we show that linear models capture a broad class of anomaly types, estimating a finite-history Gaussian process conditional density.
- Score: 41.074068820031655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research in time series anomaly detection (TSAD) has largely focused on developing increasingly sophisticated, hard-to-train, and expensive-to-infer neural architectures. We revisit this paradigm and show that a simple linear autoregressive anomaly score with the closed-form solution provided by ordinary least squares (OLS) regression consistently matches or outperforms state-of-the-art deep detectors. From a theoretical perspective, we show that linear models capture a broad class of anomaly types, estimating a finite-history Gaussian process conditional density. On the practical side, across extensive univariate and multivariate benchmarks, the proposed approach achieves superior accuracy while requiring orders of magnitude fewer computational resources. Thus, future research should consistently include strong linear baselines and, more importantly, develop new benchmarks with richer temporal structures that pinpoint the advantages of deep learning models.
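The recipe behind the headline claim is simple enough to sketch: fit next-step prediction by OLS on lagged windows, then score each point by how unlikely its residual is under a Gaussian noise model (which is what "estimating a finite-history Gaussian process conditional density" amounts to). Below is a minimal univariate sketch of that idea in NumPy; the window length, intercept handling, and plug-in variance estimate are illustrative assumptions, not the paper's reported configuration.

```python
import numpy as np

def ols_ar_anomaly_scores(x: np.ndarray, p: int = 32) -> np.ndarray:
    """Closed-form linear AR anomaly score for a univariate series x:
    fit x_t ~ [x_{t-p}, ..., x_{t-1}] by OLS and score each point by its
    squared standardized residual (a Gaussian negative log-likelihood
    up to constants)."""
    n = len(x) - p
    # Lag matrix: row t holds the p values preceding x[p + t].
    X = np.stack([x[i:i + n] for i in range(p)], axis=1)
    X = np.hstack([X, np.ones((n, 1))])          # intercept column
    y = x[p:]
    # Closed-form OLS fit; lstsq is numerically safer than (X'X)^{-1}X'y.
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ w
    sigma2 = resid.var() + 1e-12                 # plug-in noise variance
    scores = resid ** 2 / sigma2                 # higher = more anomalous
    # The first p points have no full history; give them a neutral score.
    return np.concatenate([np.zeros(p), scores])
```

Thresholding the scores (e.g., at a high quantile of the training scores) yields a detector; the paper's multivariate experiments presumably regress across channels as well, which this univariate sketch does not attempt.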
Related papers
- Towards Foundation Models for Zero-Shot Time Series Anomaly Detection: Leveraging Synthetic Data and Relative Context Discrepancy [33.68487894996624]
Time series anomaly detection (TSAD) is a critical task, but developing models that generalize to unseen data remains a major challenge. We introduce TimeRCD, a novel foundation model for TSAD built upon a new pre-training paradigm: Relative Context Discrepancy (RCD). We show that TimeRCD significantly outperforms existing general-purpose and anomaly-specific foundation models in zero-shot TSAD.
arXiv Detail & Related papers (2025-09-25T14:05:15Z)
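The TimeRCD entry above names its Relative Context Discrepancy pre-training paradigm without spelling out the objective, so any code can only gesture at the general idea. Purely as a hypothetical stand-in for scoring a window against its surrounding context (not TimeRCD's actual pre-training task), one might compare simple window statistics:

```python
import numpy as np

def naive_context_discrepancy(x: np.ndarray, w: int = 64) -> np.ndarray:
    """Hypothetical toy score: how far each window's mean drifts from the
    preceding context, in context standard deviations. This is NOT the
    TimeRCD objective, only an illustration of context-vs-window scoring."""
    scores = np.zeros(len(x))
    for t in range(2 * w, len(x) + 1, w):
        context, window = x[t - 2 * w:t - w], x[t - w:t]
        scores[t - w:t] = abs(window.mean() - context.mean()) / (context.std() + 1e-12)
    return scores
```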
- Interpretable Deep Regression Models with Interval-Censored Failure Time Data [1.2993568435938014]
Deep learning methods for interval-censored data remain underexplored and limited to specific data types or models. This work proposes a general regression framework for interval-censored data with a broad class of partially linear transformation models. Applying our method to the Alzheimer's Disease Neuroimaging Initiative dataset yields novel insights and improved predictive performance compared to traditional approaches.
arXiv Detail & Related papers (2025-03-25T15:27:32Z)
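For orientation, "partially linear transformation models" in the survival literature typically take a form like

$$ H(T) = -\beta^{\top} Z - g(X) + \varepsilon, $$

where T is the (interval-censored) failure time, H is an unknown monotone increasing transformation, β'Z is the parametric linear part, g is a nonparametric component (a natural fit for a deep network), and ε has a known distribution (extreme-value recovers the proportional hazards model, logistic the proportional odds model). The summary does not give the paper's exact specification, so treat this as a generic template rather than the authors' model.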
- In-Context Linear Regression Demystified: Training Dynamics and Mechanistic Interpretability of Multi-Head Softmax Attention [52.159541540613915]
We study how multi-head softmax attention models are trained to perform in-context learning on linear data. Our results reveal that in-context learning ability emerges from the trained transformer as an aggregated effect of its architecture and the underlying data distribution.
arXiv Detail & Related papers (2025-03-17T02:00:49Z)
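The in-context linear regression task referenced above is concrete enough to sketch: each episode draws a random linear task, presents example pairs in the prompt, and asks the model to predict a query label without any weight updates. The construction below is the generic version common in this literature, with dimensions chosen arbitrarily for illustration:

```python
import numpy as np

def make_icl_prompt(n_examples: int = 16, dim: int = 8, seed: int = 0):
    """One in-context linear-regression episode: (x_i, y_i) pairs sharing
    a hidden weight vector w, plus a query whose label must be inferred
    from the prompt alone."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=dim)                    # task vector, redrawn per episode
    X = rng.normal(size=(n_examples + 1, dim))  # last row is the query
    y = X @ w
    return X[:-1], y[:-1], X[-1], y[-1]         # examples, labels, query, target
```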
- Theory-guided Pseudo-spectral Full Waveform Inversion via Deep Neural Networks [0.0]
Full-Waveform Inversion seeks to achieve a high-resolution model of the subsurface. Deep Learning techniques have emerged as excellent optimization frameworks. This work addresses the lacuna that exists in incorporating the pseudo-spectral approach within Deep Learning.
arXiv Detail & Related papers (2025-02-24T20:18:55Z)
- Data-Driven Pseudo-spectral Full Waveform Inversion via Deep Neural Networks [0.0]
We re-formulate the pseudo-spectral FWI problem as a Deep Learning algorithm for a data-driven pseudo-spectral approach. The data-driven pseudo-spectral inversion was found to outperform classical FWI for deeper and over-thrust areas.
arXiv Detail & Related papers (2025-02-24T19:50:36Z)
- Preconditioned Inexact Stochastic ADMM for Deep Model [35.37705488695026]
This paper develops an algorithm, PISA, which enables scalable parallel computing and supports various preconditioners. It converges under the sole assumption of Lipschitz continuity of the gradient on a bounded region, removing the need for other conditions commonly imposed by stochastic methods. It demonstrates superior numerical performance compared to various state-of-the-art optimizers.
arXiv Detail & Related papers (2025-02-15T12:28:51Z)
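The PISA summary above omits the update rules, and its preconditioned, inexact, stochastic refinements are not reproduced here; as a reference point, the classical ADMM backbone for minimizing f(x) + g(z) subject to x = z (with penalty ρ and scaled dual u) reads

$$
\begin{aligned}
x^{k+1} &= \arg\min_{x}\; f(x) + \tfrac{\rho}{2}\lVert x - z^{k} + u^{k}\rVert^{2},\\
z^{k+1} &= \arg\min_{z}\; g(z) + \tfrac{\rho}{2}\lVert x^{k+1} - z + u^{k}\rVert^{2},\\
u^{k+1} &= u^{k} + x^{k+1} - z^{k+1}.
\end{aligned}
$$

Preconditioning replaces the Euclidean proximal terms with norms induced by a (possibly adaptive) matrix, and "inexact" means the subproblems are solved only approximately.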
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
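The "closed-form" in CfC refers to replacing the numerical ODE solver with an explicit solution of the continuous-time dynamics. As a reference point (standard calculus, not the CfC equation itself), a leaky-integrator state driven by a constant input has the exact solution

$$ \dot{x} = -\lambda x + c \quad\Longrightarrow\quad x(t) = \Bigl(x(0) - \tfrac{c}{\lambda}\Bigr) e^{-\lambda t} + \tfrac{c}{\lambda}, $$

so the state relaxes toward c/λ at a rate set by λ with no solver steps; CfC networks build their layers from expressions of this shape with learned, input-dependent rates and targets.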
- Goal-directed Generation of Discrete Structures with Conditional Generative Models [85.51463588099556]
We introduce a novel approach to directly optimize a reinforcement learning objective, maximizing an expected reward.
We test our methodology on two tasks: generating molecules with user-defined properties and identifying short Python expressions which evaluate to a given target value.
arXiv Detail & Related papers (2020-10-05T20:03:13Z)
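Directly maximizing an expected reward over discrete structures typically rests on the score-function (REINFORCE) identity, which sidesteps the non-differentiability of sampling; whether this exact estimator is what the paper uses is not stated in the summary:

$$ \nabla_{\theta}\, \mathbb{E}_{x \sim p_{\theta}}[R(x)] \;=\; \mathbb{E}_{x \sim p_{\theta}}\bigl[ R(x)\, \nabla_{\theta} \log p_{\theta}(x) \bigr], $$

estimated in practice by sampling structures from the conditional generative model and weighting their log-probability gradients by the reward (often with a baseline subtracted from R to reduce variance).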
- Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but its role in that success is still unclear. We show that heavy-tailed stationary behaviour commonly arises in the parameters due to multiplicative noise, which stems from variance in local rates of convergence. A detailed analysis is conducted describing how key factors, including step size, batch size, and data, shape this behaviour, with similar results observed on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z)
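The classical mechanism behind heavy tails from multiplicative noise is worth recording here (a Kesten-type result; the paper's analysis is in this spirit, though its exact statements differ): for a random linear recurrence

$$ x_{k+1} = a_k x_k + b_k, \qquad (a_k, b_k)\ \text{i.i.d.}, $$

if $\mathbb{E}[\log |a_k|] < 0$ (so the iterates do not blow up) while $\mathbb{P}(|a_k| > 1) > 0$, the stationary distribution develops a power-law tail $\mathbb{P}(|x| > t) \sim C t^{-\alpha}$, with tail index α solving $\mathbb{E}[|a_k|^{\alpha}] = 1$. Step size, batch size, and data variability enter through the distribution of $a_k$, which is how they shape the tails.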
- Kernel and Rich Regimes in Overparametrized Models [69.40899443842443]
We show that gradient descent on overparametrized multilayer networks can induce rich implicit biases that are not RKHS norms.
We also demonstrate this transition empirically for more complex matrix factorization models and multilayer non-linear networks.
arXiv Detail & Related papers (2020-02-20T15:43:02Z)
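For context (a standard framing of this paper rather than something stated in the summary): the transition between regimes is controlled by the initialization scale. Writing the initial weights as $w(0) = \alpha\, w_0$,

$$ \alpha \to \infty \ \Rightarrow\ \text{kernel regime (training behaves like a fixed RKHS method)}, \qquad \alpha \to 0 \ \Rightarrow\ \text{rich regime}, $$

and in the rich regime the implicit bias can be qualitatively different, e.g. ℓ₁-like sparsity for diagonal linear networks.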