The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss
- URL: http://arxiv.org/abs/2512.18610v1
- Date: Sun, 21 Dec 2025 06:08:22 GMT
- Title: The Procrustean Bed of Time Series: The Optimization Bias of Point-wise Loss
- Authors: Rongyao Cai, Yuxi Wan, Kexin Zhang, Ming Jin, Hao Wang, Zhiqiang Ge, Daoyi Dong, Yong Liu, Qingsong Wen
- Abstract summary: This paper aims to provide a first-principles analysis of the Expectation of Optimization Bias (EOB). Our analysis reveals a fundamental paradox of the point-wise paradigm: the more deterministic and structured the time series, the more severe the bias induced by the point-wise loss function. We present a concrete solution that achieves both debiasing principles simultaneously via the DFT or DWT.
- Score: 53.542743390809356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Optimizing time series models via point-wise loss functions (e.g., MSE) relies on a flawed point-wise independent and identically distributed (i.i.d.) assumption that disregards the causal temporal structure, an issue of growing awareness that still lacks formal theoretical grounding. Focusing on the core independence issue under covariance stationarity, this paper provides a first-principles analysis of the Expectation of Optimization Bias (EOB), formalizing it information-theoretically as the discrepancy between the true joint distribution and its flawed i.i.d. counterpart. Our analysis reveals a fundamental paradox of the point-wise paradigm: the more deterministic and structured the time series, the more severe the bias induced by the point-wise loss function. We derive the first closed-form quantification of the non-deterministic EOB across linear and non-linear systems, and prove that the EOB is an intrinsic data property, governed exclusively by sequence length and our proposed Structural Signal-to-Noise Ratio (SSNR). This theoretical diagnosis motivates a principled debiasing program that eliminates the bias through sequence length reduction and structural orthogonalization. We present a concrete solution that achieves both principles simultaneously via the DFT or DWT. Furthermore, a novel harmonized $\ell_p$ norm framework is proposed to rectify the gradient pathologies of high-variance series. Extensive experiments validate the generality of the EOB theory and the superior performance of the debiasing program.
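One natural information-theoretic reading of the EOB, offered here as a sketch rather than the paper's exact definition, is the total correlation of a length-T window: the KL divergence between the true joint distribution and its i.i.d. (product-of-marginals) counterpart.

```latex
% Hedged sketch: the i.i.d. discrepancy read as total correlation.
% p(x_1,\dots,x_T) is the true joint law of a length-T window;
% \prod_t p(x_t) is its flawed i.i.d. counterpart.
\mathrm{EOB} \;\propto\;
D_{\mathrm{KL}}\!\left( p(x_1,\dots,x_T) \,\middle\|\, \prod_{t=1}^{T} p(x_t) \right)
\;=\; \sum_{t=1}^{T} H(x_t) \;-\; H(x_1,\dots,x_T).
```

Under this reading the stated paradox is legible: a highly structured, near-deterministic series has small joint entropy while its marginal entropies need not shrink, so the divergence, and with it the bias, grows.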
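A minimal sketch of the DFT-based debiasing idea, assuming the program amounts to scoring predictions on a truncated set of orthogonal frequency coefficients rather than point-wise in time; the function `dft_domain_loss` and its `keep` parameter are illustrative, not the paper's API.

```python
import numpy as np

def dft_domain_loss(y_pred: np.ndarray, y_true: np.ndarray, keep: int = 16) -> float:
    """Illustrative frequency-domain loss for a length-T real-valued window.

    The real FFT projects the window onto an orthogonal basis
    (structural orthogonalization); truncating to the first `keep`
    coefficients shortens the effective sequence (length reduction).
    """
    # rfft of a length-T real signal yields T//2 + 1 complex coefficients.
    f_pred = np.fft.rfft(y_pred)[:keep]
    f_true = np.fft.rfft(y_true)[:keep]
    # Squared error on complex coefficients, i.e. MSE in the DFT domain.
    return float(np.mean(np.abs(f_pred - f_true) ** 2))

# Toy usage: a noisy sinusoid scored against its clean counterpart.
t = np.linspace(0.0, 1.0, 128, endpoint=False)
clean = np.sin(2 * np.pi * 5 * t)
noisy = clean + 0.1 * np.random.default_rng(0).normal(size=t.size)
print(dft_domain_loss(noisy, clean))
```

By Parseval's theorem, keeping all coefficients recovers time-domain MSE up to normalization and conjugate symmetry; whatever debiasing effect this sketch captures comes from the orthogonal basis and the reduced effective length, consistent with the two principles named in the abstract.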
Related papers
- On Multi-Step Theorem Prediction via Non-Parametric Structural Priors [50.16583672681106]
In this work, we explore training-free theorem prediction through the lens of in-context learning (ICL). We propose Theorem Precedence Graphs, which encode temporal dependencies from historical solution traces as directed graphs and impose explicit topological constraints that effectively prune the search space during inference. Experiments on the FormalGeo7k benchmark show that our method achieves 89.29% accuracy, substantially outperforming ICL baselines and matching state-of-the-art supervised models.
arXiv Detail & Related papers (2026-03-05T06:08:50Z) - Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs [55.77845440440496]
Push-sum based decentralized communication enables optimization over communication networks where information exchange may be asymmetric. We develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm. A key technical ingredient is an imbalance-aware generalization bound expressed through two quantities.
arXiv Detail & Related papers (2026-02-24T05:32:03Z) - A Generalized Adaptive Joint Learning Framework for High-Dimensional Time-Varying Models [0.8594140167290097]
This article introduces Adaptive Joint Learning (AJL), a regularization framework designed to simultaneously perform functional variable selection and structural changepoint detection. The analysis uncovers synchronized phase transitions in disease progression and identifies a parsimonious set of time-varying prognostic markers.
arXiv Detail & Related papers (2026-01-08T02:07:49Z) - Interpretability-Guided Bi-objective Optimization: Aligning Accuracy and Explainability [0.0]
IGBO encodes feature importance hierarchies as a Directed Acyclic Graph (DAG). A Central Limit Theorem-based construction of the DAG ensures the statistical validity of edge-orientation decisions. Experiments demonstrate IGBO's effectiveness in enforcing DAG constraints with minimal accuracy loss, outperforming standard regularization baselines.
arXiv Detail & Related papers (2026-01-02T11:32:00Z) - Causal Inference as Distribution Adaptation: Optimizing ATE Risk under Propensity Uncertainty [0.0]
We reframe ATE estimation as a domain adaptation problem under distribution shift. We propose the Joint Robust Estimator (JRE) to train outcome models jointly.
arXiv Detail & Related papers (2025-12-19T21:40:46Z) - Sharp Structure-Agnostic Lower Bounds for General Functional Estimation [22.228743542695835]
This paper provides a systematic investigation of the optimal error rates achievable by structure-agnostic estimators. We first show that, for estimating the average treatment effect (ATE), a central parameter in causal inference, doubly robust learning attains optimal structure-agnostic error rates. We then extend our analysis to a general class of functionals that depend on unknown nuisance functions and establish the structure-agnostic optimality of debiased/double machine learning.
arXiv Detail & Related papers (2025-12-19T08:34:05Z) - Nonconvex Penalized LAD Estimation in Partial Linear Models with DNNs: Asymptotic Analysis and Proximal Algorithms [3.8450418136547224]
This paper investigates the partial linear model via Least Absolute Deviation (LAD) estimation. We model the nonparametric component using Deep Neural Networks (DNNs) within a nonconvex penalized LAD framework.
arXiv Detail & Related papers (2025-11-26T07:01:35Z) - ERIS: An Energy-Guided Feature Disentanglement Framework for Out-of-Distribution Time Series Classification [51.07970070817353]
An ideal time series classification (TSC) model should be able to capture invariant representations. Current methods are largely unguided, lacking the semantic direction required to isolate truly universal features. We propose ERIS, an end-to-end Energy-Regularized Information for Shift-Robustness framework, to enable guided and reliable feature disentanglement.
arXiv Detail & Related papers (2025-08-19T12:13:41Z) - Optimal Condition for Initialization Variance in Deep Neural Networks: An SGD Dynamics Perspective [0.0]
Stochastic gradient descent (SGD) is one of the most fundamental optimization algorithms in machine learning (ML). We study the relationship between the quasi-stationary distribution derived from the SGD dynamics and the initial distribution through the Kullback-Leibler (KL) divergence. We experimentally confirm our theoretical results by using classical SGD to train fully connected neural networks on the MNIST and Fashion-MNIST datasets.
arXiv Detail & Related papers (2025-08-18T11:18:12Z) - Neural variational Data Assimilation with Uncertainty Quantification using SPDE priors [28.804041716140194]
Recent advances in the deep learning community enable addressing the data assimilation problem with a neural architecture in a variational data assimilation framework. In this work we use the theory of Stochastic Partial Differential Equations (SPDE) and Gaussian Processes (GP) to estimate both the space and time covariance of the state.
arXiv Detail & Related papers (2024-02-02T19:18:12Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery [97.79015388276483]
A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG).
Recent advances enabled effective maximum-likelihood point estimation of DAGs from observational data.
We propose BCD Nets, a variational framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM.
arXiv Detail & Related papers (2021-12-06T03:35:21Z) - Learning to Estimate Without Bias [57.82628598276623]
The Gauss-Markov theorem states that the weighted least squares estimator is the linear minimum variance unbiased estimator (MVUE) in linear models.
In this paper, we take a first step towards extending this result to non-linear settings via deep learning with bias constraints.
A second motivation for the bias-constrained estimator (BCE) is in applications where multiple estimates of the same unknown are averaged for improved performance.
arXiv Detail & Related papers (2021-10-24T10:23:51Z) - Benign Overfitting of Constant-Stepsize SGD for Linear Regression [122.70478935214128]
Inductive biases are central to preventing overfitting in practice.
This work considers this issue in arguably the most basic setting: constant-stepsize SGD for linear regression.
We reflect on a number of notable differences between the algorithmic regularization afforded by (unregularized) SGD in comparison to ordinary least squares.
arXiv Detail & Related papers (2021-03-23T17:15:53Z) - To Regularize or Not To Regularize? The Bias Variance Trade-off in Regularized AEs [10.611727286504994]
We study the effect of the latent prior on the generation quality of deterministic AE models.
We show that our model, called FlexAE, sets a new state-of-the-art for AE-based generative models.
arXiv Detail & Related papers (2020-06-10T14:00:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.