RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting
- URL: http://arxiv.org/abs/2511.10130v1
- Date: Fri, 14 Nov 2025 01:34:00 GMT
- Title: RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting
- Authors: Jieting Wang, Xiaolei Shang, Feijiang Li, Furong Peng,
- Abstract summary: Time series forecasting relies on predicting future values from historical data. MSE has two fundamental weaknesses: its point-wise error fails to capture temporal relationships, and it does not account for inherent noise in the data. We introduce the Residual-Informed Loss (RI-Loss), a novel objective function based on the Hilbert-Schmidt Independence Criterion (HSIC).
- Score: 13.117430904377905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time series forecasting relies on predicting future values from historical data, yet most state-of-the-art approaches, including transformer- and multilayer-perceptron-based models, optimize using Mean Squared Error (MSE), which has two fundamental weaknesses: its point-wise error computation fails to capture temporal relationships, and it does not account for inherent noise in the data. To overcome these limitations, we introduce the Residual-Informed Loss (RI-Loss), a novel objective function based on the Hilbert-Schmidt Independence Criterion (HSIC). RI-Loss explicitly models noise structure by enforcing dependence between the residual sequence and a random time series, enabling more robust, noise-aware representations. Theoretically, we derive the first non-asymptotic HSIC bound with explicit double-sample complexity terms, achieving optimal convergence rates through Bernstein-type concentration inequalities and Rademacher complexity analysis. This provides rigorous guarantees for RI-Loss optimization while precisely quantifying kernel space interactions. Empirically, experiments across eight real-world benchmarks and five leading forecasting models demonstrate improvements in predictive performance, validating the effectiveness of our approach. Code will be made publicly available to ensure reproducibility.
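For intuition, the sketch below shows how an HSIC-regularized objective of the kind described in the abstract might be assembled, using the standard biased Gaussian-kernel HSIC estimator tr(KHLH)/(n-1)^2. The function names, the trade-off weight `lam`, and the way the HSIC term enters the objective are illustrative assumptions, not the paper's released implementation.

```python
import torch

def _gaussian_gram(x: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Gram matrix K with K_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)).
    sq_dists = torch.cdist(x, x, p=2).pow(2)
    return torch.exp(-sq_dists / (2.0 * sigma ** 2))

def hsic(x: torch.Tensor, y: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Biased HSIC estimate: tr(K H L H) / (n - 1)^2, with H = I - (1/n) * 1 1^T.
    n = x.shape[0]
    h = torch.eye(n, device=x.device) - torch.full((n, n), 1.0 / n, device=x.device)
    k = _gaussian_gram(x, sigma)
    l = _gaussian_gram(y, sigma)
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2

def ri_style_loss(pred: torch.Tensor, target: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    # MSE plus an HSIC term between the residual sequence and a random reference
    # series. Subtracting the HSIC term rewards dependence between residuals and
    # the random series, following the abstract's wording; the sign, the weight
    # `lam`, and this overall form are assumptions made for illustration.
    residual = (target - pred).reshape(-1, 1)
    reference = torch.randn_like(residual)  # random reference time series
    mse = torch.mean((target - pred) ** 2)
    return mse - lam * hsic(residual, reference)

# Hypothetical usage with a forecasting model:
#   loss = ri_style_loss(model(history), future)
#   loss.backward()
```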
Related papers
- SynCast: Synergizing Contradictions in Precipitation Nowcasting via Diffusion Sequential Preference Optimization [62.958457694151384]
We introduce preference optimization into precipitation nowcasting for the first time, motivated by the success of reinforcement learning from human feedback in large language models. In the first stage, the framework focuses on reducing FAR, training the model to effectively suppress false alarms.
arXiv Detail & Related papers (2025-10-22T16:11:22Z) - A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers. We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series. FIRE consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z) - Noise or Signal? Deconstructing Contradictions and An Adaptive Remedy for Reversible Normalization in Time Series Forecasting [4.212879006865343]
RevIN is a key technique enabling simple linear models to achieve state-of-the-art performance in time series forecasting. This paper deconstructs the perplexing performance of various normalization strategies by identifying four underlying theoretical contradictions.
arXiv Detail & Related papers (2025-10-06T10:22:20Z) - Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z) - TIMED: Adversarial and Autoregressive Refinement of Diffusion-Based Time Series Generation [0.31498833540989407]
TIMED is a unified generative framework that captures global structure via a forward-reverse diffusion process. To further align the real and synthetic distributions in feature space, TIMED incorporates a Maximum Mean Discrepancy (MMD) loss. We show that TIMED generates more realistic and temporally coherent sequences than state-of-the-art generative models.
arXiv Detail & Related papers (2025-09-23T23:05:40Z) - ReTimeCausal: EM-Augmented Additive Noise Models for Interpretable Causal Discovery in Irregular Time Series [32.21736212737614]
This paper studies causal discovery in irregularly sampled time series in high-stakes domains like finance, healthcare, and climate science. We propose ReTimeCausal, a novel integration of Additive Noise Models (ANM) and Expectation-Maximization (EM) that unifies physics-guided data imputation with sparse causal inference.
arXiv Detail & Related papers (2025-07-04T05:39:50Z) - Effective Probabilistic Time Series Forecasting with Fourier Adaptive Noise-Separated Diffusion [13.640501360968782]
FALDA is a novel probabilistic framework for time series forecasting. It incorporates a component-specific architecture, enabling tailored modeling of individual temporal components. Experiments on six real-world benchmarks demonstrate that FALDA consistently outperforms existing probabilistic forecasting approaches. FALDA also achieves superior overall performance compared to state-of-the-art (SOTA) point forecasting approaches, with improvements of up to 9%.
arXiv Detail & Related papers (2025-05-16T14:32:34Z) - Lightweight Channel-wise Dynamic Fusion Model: Non-stationary Time Series Forecasting via Entropy Analysis [25.291749176117662]
We show that variance can be a valid and interpretable proxy for the non-stationarity of a time series. We propose a novel lightweight Channel-wise Dynamic Fusion Model (CDFM). Comprehensive experiments on seven time series datasets demonstrate the superiority and generalization capabilities of CDFM.
arXiv Detail & Related papers (2025-03-04T13:29:42Z) - Uncertainty-Aware Deep Attention Recurrent Neural Network for Heterogeneous Time Series Imputation [0.25112747242081457]
Missingness is ubiquitous in multivariate time series and poses an obstacle to reliable downstream analysis.
We propose DEep Attention Recurrent Imputation (DEARI), which jointly estimates missing values and their associated uncertainty.
Experiments show that DEARI surpasses the SOTA in diverse imputation tasks using real-world datasets.
arXiv Detail & Related papers (2024-01-04T13:21:11Z) - Deep Ensembles Meets Quantile Regression: Uncertainty-aware Imputation for Time Series [45.76310830281876]
We propose Quantile Sub-Ensembles, a novel method to estimate uncertainty with an ensemble of quantile-regression-based task networks.
Our method not only produces accurate imputations that are robust to high missing rates, but is also computationally efficient due to the fast training of its non-generative model.
arXiv Detail & Related papers (2023-12-03T05:52:30Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), have notable drawbacks:
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD settings.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality [131.45028999325797]
We develop a doubly robust off-policy actor-critic algorithm (DR-Off-PAC) for discounted MDPs.
DR-Off-PAC adopts a single timescale structure, in which both actor and critics are updated simultaneously with constant stepsize.
We study the finite-time convergence rate and characterize the sample complexity for DR-Off-PAC to attain an $\epsilon$-accurate optimal policy.
arXiv Detail & Related papers (2021-02-23T18:56:13Z)