EVEREST: An Evidential, Tail-Aware Transformer for Rare-Event Time-Series Forecasting
- URL: http://arxiv.org/abs/2601.19022v2
- Date: Wed, 28 Jan 2026 17:40:06 GMT
- Title: EVEREST: An Evidential, Tail-Aware Transformer for Rare-Event Time-Series Forecasting
- Authors: Antanas Zilinskas, Robert N. Shorten, Jakub Marecek,
- Abstract summary: EVEREST is a transformer-based architecture for probabilistic rare-event forecasting.<n>It delivers calibrated predictions and tail-aware risk estimation.<n>It is applicable to high-stakes domains such as industrial monitoring, weather, and satellite diagnostics.
- Score: 4.551615447454767
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Forecasting rare events in multivariate time-series data is challenging due to severe class imbalance, long-range dependencies, and distributional uncertainty. We introduce EVEREST, a transformer-based architecture for probabilistic rare-event forecasting that delivers calibrated predictions and tail-aware risk estimation, with auxiliary interpretability via attention-based signal attribution. EVEREST integrates four components: (i) a learnable attention bottleneck for soft aggregation of temporal dynamics; (ii) an evidential head for estimating aleatoric and epistemic uncertainty via a Normal--Inverse--Gamma distribution; (iii) an extreme-value head that models tail risk using a Generalized Pareto Distribution; and (iv) a lightweight precursor head for early-event detection. These modules are jointly optimized with a composite loss (focal loss, evidential NLL, and a tail-sensitive EVT penalty) and act only at training time; deployment uses a single classification head with no inference overhead (approximately 0.81M parameters). On a decade of space-weather data, EVEREST achieves state-of-the-art True Skill Statistic (TSS) of 0.973/0.970/0.966 at 24/48/72-hour horizons for C-class flares. The model is compact, efficient to train on commodity hardware, and applicable to high-stakes domains such as industrial monitoring, weather, and satellite diagnostics. Limitations include reliance on fixed-length inputs and exclusion of image-based modalities, motivating future extensions to streaming and multimodal forecasting.
Related papers
- Real-Time Proactive Anomaly Detection via Forward and Backward Forecast Modeling [0.0]
We introduce two proactive anomaly detection frameworks: the Forward Forecasting Model (FFM) and the Backward Reconstruction Model (BRM)<n>FFM forecasts future sequences to anticipate disruptions, while BRM reconstructs recent history from future context to uncover early precursors.<n>Our models support both continuous and discrete multivariate features, enabling robust performance in real-world settings.
arXiv Detail & Related papers (2026-02-12T03:57:41Z) - SimDiff: Simpler Yet Better Diffusion Model for Time Series Point Forecasting [8.141505251306622]
Diffusion models have recently shown promise in time series forecasting.<n>They often fail to achieve state-of-the-art point estimation performance.<n>We propose SimDiff, a single-stage, end-to-end framework for point estimation.
arXiv Detail & Related papers (2025-11-24T16:09:55Z) - Optimal Look-back Horizon for Time Series Forecasting in Federated Learning [26.070107882914844]
This paper presents a principled framework for adaptive horizon selection in federated time series forecasting.<n>We derive a decomposition of the forecasting loss into a Bayesian term, which reflects irreducible uncertainty.<n>We prove that the total forecasting loss is minimized at the smallest horizon where the irreducible loss starts to saturate, while the approximation loss continues to rise.
arXiv Detail & Related papers (2025-11-16T21:46:54Z) - Unsupervised Anomaly Prediction with N-BEATS and Graph Neural Network in Multi-variate Semiconductor Process Time Series [1.0874100424278175]
anomaly prediction in semiconductor fabrication presents several critical challenges.<n>The complex interdependencies between variables complicate both anomaly prediction and root-cause-analysis.<n>This paper proposes two novel approaches to advance the field from anomaly detection to anomaly prediction.
arXiv Detail & Related papers (2025-10-23T16:33:52Z) - A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers.<n>We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series.<n>Fire consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
arXiv Detail & Related papers (2025-10-11T09:59:25Z) - Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series.<n>Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data.<n>This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy.<n>We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z) - BayesTTA: Continual-Temporal Test-Time Adaptation for Vision-Language Models via Gaussian Discriminant Analysis [41.09181390655176]
Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but degrade significantly under textittemporally evolving distribution shifts common in real-world scenarios.<n>We formalize this practical problem as textitContinual-Temporal Test-Time Adaptation (CT-TTA), where test distributions evolve gradually over time.<n>We propose textitBayesTTA, a Bayesian adaptation framework that enforces temporally consistent predictions and dynamically aligns visual representations.
arXiv Detail & Related papers (2025-07-11T14:02:54Z) - Evidential time-to-event prediction with calibrated uncertainty quantification [12.446406577462069]
Time-to-event analysis provides insights into clinical prognosis and treatment recommendations.<n>We propose an evidential regression model specifically designed for time-to-event prediction.<n>We show that our model delivers both accurate and reliable performance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2024-11-12T15:06:04Z) - Channel-aware Contrastive Conditional Diffusion for Multivariate Probabilistic Time Series Forecasting [19.383395337330082]
We propose a generic channel-aware Contrastive Conditional Diffusion model entitled CCDM.
The proposed CCDM can exhibit superior forecasting capability compared to current state-of-the-art diffusion forecasters.
arXiv Detail & Related papers (2024-10-03T03:13:15Z) - Score Matching-based Pseudolikelihood Estimation of Neural Marked
Spatio-Temporal Point Process with Uncertainty Quantification [59.81904428056924]
We introduce SMASH: a Score MAtching estimator for learning markedPs with uncertainty quantification.
Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of markedPs through score-matching.
The superior performance of our proposed framework is demonstrated through extensive experiments in both event prediction and uncertainty quantification.
arXiv Detail & Related papers (2023-10-25T02:37:51Z) - When Rigidity Hurts: Soft Consistency Regularization for Probabilistic
Hierarchical Time Series Forecasting [69.30930115236228]
Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting.
Most methods focus on point predictions and do not provide well-calibrated probabilistic forecasts distributions.
We propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models forecast distribution of entire hierarchy.
arXiv Detail & Related papers (2023-10-17T20:30:16Z) - When Rigidity Hurts: Soft Consistency Regularization for Probabilistic
Hierarchical Time Series Forecasting [69.30930115236228]
Probabilistic hierarchical time-series forecasting is an important variant of time-series forecasting.
Most methods focus on point predictions and do not provide well-calibrated probabilistic forecasts distributions.
We propose PROFHiT, a fully probabilistic hierarchical forecasting model that jointly models forecast distribution of entire hierarchy.
arXiv Detail & Related papers (2022-06-16T06:13:53Z) - TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.