Learning Temporal Saliency for Time Series Forecasting with Cross-Scale Attention
- URL: http://arxiv.org/abs/2509.22839v1
- Date: Fri, 26 Sep 2025 18:43:51 GMT
- Title: Learning Temporal Saliency for Time Series Forecasting with Cross-Scale Attention
- Authors: Ibrahim Delibasoglu, Fredrik Heintz
- Abstract summary: We present CrossScaleNet, an innovative architecture that combines a patch-based cross-attention mechanism with multi-scale processing. Our evaluations demonstrate superior performance in both temporal saliency detection and forecasting accuracy.
- Score: 5.992220383989106
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainability in time series forecasting is essential for improving model transparency and supporting informed decision-making. In this work, we present CrossScaleNet, an innovative architecture that combines a patch-based cross-attention mechanism with multi-scale processing to achieve both high performance and enhanced temporal explainability. By embedding attention mechanisms into the training process, our model provides intrinsic explainability for temporal saliency, making its decision-making process more transparent. Traditional post-hoc methods for temporal saliency detection are computationally expensive, particularly when compared to feature importance detection. While ablation techniques may suffice for datasets with fewer features, identifying temporal saliency poses greater challenges due to its complexity. We validate CrossScaleNet on synthetic datasets with known saliency ground truth and on established public benchmarks, demonstrating the robustness of our method in identifying temporal saliency. Experiments on real-world datasets for the forecasting task show that our approach consistently outperforms most transformer-based models, offering better explainability without sacrificing predictive accuracy. Our evaluations demonstrate superior performance in both temporal saliency detection and forecasting accuracy. Moreover, we highlight that existing models claiming explainability often fail to maintain strong performance on standard benchmarks. CrossScaleNet addresses this gap, offering a balanced approach that captures temporal saliency effectively while delivering state-of-the-art forecasting performance across datasets of varying complexity.
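To make the abstract's mechanism concrete, here is a minimal PyTorch sketch of the idea: patch embeddings computed at several temporal scales feed a cross-attention step whose weights double as an intrinsic per-patch saliency map. All names, dimensions, and the single-query design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CrossScalePatchAttention(nn.Module):
    """Illustrative sketch (not the paper's code): patches at several temporal
    scales attend to a shared query; the attention weights serve as an
    intrinsic per-patch saliency map."""
    def __init__(self, d_model=64, patch_sizes=(4, 8, 16), horizon=24):
        super().__init__()
        self.embeds = nn.ModuleList(
            [nn.Conv1d(1, d_model, kernel_size=p, stride=p) for p in patch_sizes]
        )
        self.query = nn.Parameter(torch.randn(1, 1, d_model))
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x):                                  # x: (batch, seq_len)
        tokens = torch.cat(
            [emb(x.unsqueeze(1)).transpose(1, 2) for emb in self.embeds], dim=1
        )                                                  # (batch, n_patches, d_model)
        q = self.query.expand(x.size(0), -1, -1)
        ctx, saliency = self.attn(q, tokens, tokens)       # saliency: per-patch weights
        return self.head(ctx.squeeze(1)), saliency

model = CrossScalePatchAttention()
forecast, saliency = model(torch.randn(8, 96))             # 96-step history, 24-step horizon
```

Because the saliency here comes from the same attention used for prediction, no separate post-hoc attribution pass is needed, which matches the intrinsic-explainability claim in the abstract.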
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models. We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information. To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
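A rough NumPy sketch of the scoring step this summary describes, under loose assumptions: a Laplacian over the token grid stands in for the lightweight high-pass filter, and projection energy onto the top principal components stands in for the PCA structure term. Function and parameter names are hypothetical.

```python
import numpy as np

def score_tokens(feat, k=8):
    """Hypothetical token-scoring sketch: a Laplacian high-pass over the token
    grid flags local texture; projection energy onto the top-k PCA components
    flags tokens carrying global structure. Higher score = keep the token."""
    h, w, d = feat.shape
    lap = 4 * feat.copy()                                  # crude grid Laplacian
    lap[1:] -= feat[:-1]; lap[:-1] -= feat[1:]             # vertical neighbours
    lap[:, 1:] -= feat[:, :-1]; lap[:, :-1] -= feat[:, 1:] # horizontal neighbours
    texture = np.linalg.norm(lap, axis=-1).ravel()
    flat = feat.reshape(-1, d)
    centred = flat - flat.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    structure = np.linalg.norm(centred @ vt[:k].T, axis=-1)
    return texture + structure

scores = score_tokens(np.random.default_rng(0).standard_normal((16, 16, 32)))
```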
arXiv Detail & Related papers (2026-03-02T11:35:05Z) - Demystifying Data-Driven Probabilistic Medium-Range Weather Forecasting [63.8116386935854]
We demonstrate that state-of-the-art probabilistic skill requires neither intricate architectural constraints nor specialized training. We introduce a scalable framework for learning multi-scale atmospheric dynamics by combining a directly downsampled latent space with a history-conditioned local projector. We find that our framework design is robust to the choice of probabilistic estimators, seamlessly supporting interpolants, diffusion models, and CRPS-based ensemble training.
arXiv Detail & Related papers (2026-01-26T03:52:16Z) - Feature-aware Modulation for Learning from Temporal Tabular Data [47.36022303401642]
Temporal distribution shifts pose significant challenges in real-world deployment. Static models assume fixed mappings to ensure generalization, whereas adaptive models may overfit to transient patterns. We propose a feature-aware temporal modulation mechanism that conditions feature representations on temporal context.
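The modulation mechanism reads like feature-wise affine conditioning (FiLM-style); a minimal sketch under that assumption, with illustrative names and shapes:

```python
import torch
import torch.nn as nn

class TemporalFiLM(nn.Module):
    """Sketch of feature-wise modulation conditioned on temporal context
    (FiLM-style; illustrative, not the paper's code)."""
    def __init__(self, n_features, ctx_dim):
        super().__init__()
        self.to_gamma = nn.Linear(ctx_dim, n_features)
        self.to_beta = nn.Linear(ctx_dim, n_features)

    def forward(self, x, ctx):
        # x: (batch, n_features) tabular row; ctx: (batch, ctx_dim) time encoding
        return self.to_gamma(ctx) * x + self.to_beta(ctx)

layer = TemporalFiLM(n_features=16, ctx_dim=8)
out = layer(torch.randn(32, 16), torch.randn(32, 8))
```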
arXiv Detail & Related papers (2025-12-03T11:13:12Z) - Abstain Mask Retain Core: Time Series Prediction by Adaptive Masking Loss with Representation Consistency [4.047219770183742]
Time series forecasting plays a pivotal role in critical domains such as energy management and financial markets. This study reveals a counterintuitive phenomenon: appropriately truncating historical data can enhance prediction accuracy. We propose an innovative solution termed Adaptive Masking Loss with Representation Consistency.
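One plausible reading of the proposed loss, sketched below with hypothetical names: a learnable soft gate decides which history steps to abstain from, while a consistency term keeps the gated representation close to the full one.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def masked_consistency_loss(encoder, head, x, y, mask_logits):
    """Hypothetical sketch, not the paper's exact loss: learn which history
    steps to abstain from while retaining the representational core."""
    gate = torch.sigmoid(mask_logits)              # (seq_len,) soft keep-probability
    z_full = encoder(x)                            # representation of full history
    z_mask = encoder(x * gate)                     # representation of gated history
    forecast = head(z_mask)
    return (F.mse_loss(forecast, y)                # predict from the retained core
            + F.mse_loss(z_mask, z_full.detach())  # representation consistency
            + 1e-3 * gate.mean())                  # encourage abstaining (sparsity)

encoder = nn.Sequential(nn.Linear(96, 64), nn.ReLU())
head = nn.Linear(64, 24)
mask_logits = nn.Parameter(torch.zeros(96))
loss = masked_consistency_loss(encoder, head,
                               torch.randn(8, 96), torch.randn(8, 24), mask_logits)
```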
arXiv Detail & Related papers (2025-10-22T19:23:53Z) - Beyond Model Ranking: Predictability-Aligned Evaluation for Time Series Forecasting [18.018179328110048]
We introduce a predictability-aligned diagnostic framework grounded in spectral coherence. We provide the first systematic evidence of "predictability drift", demonstrating that a task's forecasting difficulty varies sharply over time. Our evaluation reveals a key architectural trade-off: complex models are superior for low-predictability data, whereas linear models are highly effective on more predictable tasks.
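The coherence-based diagnostic can be illustrated with SciPy's magnitude-squared coherence; the score below is a simplified stand-in for the paper's framework, not its exact metric.

```python
import numpy as np
from scipy.signal import coherence

def predictability(history, target, fs=1.0):
    """Illustrative coherence-based predictability score: mean magnitude-squared
    coherence between an input series and the series it should predict."""
    f, cxy = coherence(history, target, fs=fs, nperseg=256)
    return cxy.mean()        # near 1: strongly linearly coupled; near 0: not

rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = np.roll(x, 5) + 0.5 * rng.standard_normal(4096)   # lagged, noisy copy of x
print(predictability(x, y))                           # high coherence expected
```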
arXiv Detail & Related papers (2025-09-27T02:56:06Z) - Tuning for Trustworthiness -- Balancing Performance and Explanation Consistency in Neural Network Optimization [49.567092222782435]
We introduce the novel concept of XAI consistency, defined as the agreement among different feature attribution methods. We create a multi-objective optimization framework that balances predictive performance with explanation consistency. Our research provides a foundation for future investigations into whether models from the trade-off zone, balancing performance loss and XAI consistency, exhibit greater robustness.
arXiv Detail & Related papers (2025-05-12T13:19:14Z) - Generative QoE Modeling: A Lightweight Approach for Telecom Networks [6.473372512447993]
This study introduces a lightweight generative modeling framework that balances computational efficiency, interpretability, and predictive accuracy. By validating the use of Vector Quantization (VQ) as a preprocessing technique, continuous network features are effectively transformed into discrete categorical symbols. This VQ-HMM pipeline enhances the model's capacity to capture dynamic QoE patterns while supporting probabilistic inference on new and unseen data.
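The VQ preprocessing step maps directly onto off-the-shelf clustering; a brief sketch follows (feature values and cluster count are illustrative), with the discrete-HMM stage left as a comment since the summary does not specify its configuration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of the VQ step only. The HMM stage would follow, e.g. with a
# discrete/categorical HMM from the hmmlearn package (an assumption; the
# summary does not name the implementation).
rng = np.random.default_rng(0)
features = rng.standard_normal((5000, 4))      # stand-in for continuous network KPIs
vq = KMeans(n_clusters=16, n_init=10).fit(features)
symbols = vq.predict(features)                 # discrete categorical symbols
# `symbols` is now a 1-D integer sequence suitable for discrete-HMM training.
```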
arXiv Detail & Related papers (2025-04-30T06:19:37Z) - Improving Network Interpretability via Explanation Consistency Evaluation [56.14036428778861]
We propose a framework that acquires more explainable activation heatmaps and simultaneously increases model performance.
Specifically, our framework introduces a new metric, i.e., explanation consistency, to reweight the training samples adaptively in model learning.
Our framework then promotes the model learning by paying closer attention to those training samples with a high difference in explanations.
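A compact sketch of the reweighting idea, assuming explanations come from two attribution methods and disagreement is measured by cosine distance; the weighting rule is an illustrative guess, not the paper's formula.

```python
import torch

def consistency_weights(expl_a, expl_b, tau=1.0):
    """Illustrative sketch: samples whose two attribution maps disagree most
    receive larger training weight, focusing learning on inconsistent cases."""
    # expl_a, expl_b: (batch, ...) attribution maps from two explanation methods
    a, b = expl_a.flatten(1), expl_b.flatten(1)
    disagreement = 1 - torch.cosine_similarity(a, b, dim=1)   # in [0, 2]
    return torch.softmax(disagreement / tau, dim=0) * len(a)  # mean weight ~ 1

w = consistency_weights(torch.randn(16, 3, 8, 8), torch.randn(16, 3, 8, 8))
# per-sample training loss would then be scaled: loss = (w * per_sample_loss).mean()
```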
arXiv Detail & Related papers (2024-08-08T17:20:08Z) - Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, often relegating attention weights to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
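A toy version of the core idea, with assumed shapes: the attention-weight matrix over time steps, not the attended values, becomes the representation fed to the forecasting head.

```python
import torch
import torch.nn as nn

class AttentionAsRepresentation(nn.Module):
    """Illustrative sketch: forecast from the attention-weight matrix itself
    rather than from the attended value vectors."""
    def __init__(self, seq_len=96, d_model=32, horizon=24):
        super().__init__()
        self.proj = nn.Linear(1, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=1, batch_first=True)
        self.head = nn.Linear(seq_len * seq_len, horizon)

    def forward(self, x):                        # x: (batch, seq_len)
        h = self.proj(x.unsqueeze(-1))           # (batch, seq_len, d_model)
        _, weights = self.attn(h, h, h)          # (batch, seq_len, seq_len)
        return self.head(weights.flatten(1))     # forecast from the weights alone

print(AttentionAsRepresentation()(torch.randn(4, 96)).shape)  # torch.Size([4, 24])
```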
arXiv Detail & Related papers (2024-02-08T03:00:50Z) - Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive Learning [7.4106801792345705]
We propose DE-TSMCL, an innovative distillation enhanced framework for long sequence time series forecasting.
Specifically, we design a learnable data augmentation mechanism which adaptively learns whether to mask a timestamp.
Then, we propose a contrastive learning task with momentum update to explore inter-sample and intra-temporal correlations of time series.
By developing model loss from multiple tasks, we can learn effective representations for downstream forecasting task.
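The momentum-update component follows the standard MoCo-style exponential moving average; a generic sketch is shown below (the masking and contrastive objectives are omitted, and all names are illustrative).

```python
import torch
import torch.nn as nn

@torch.no_grad()
def momentum_update(online, target, m=0.99):
    """Generic MoCo-style momentum rule, not the authors' exact code: the
    target encoder tracks an exponential moving average of the online one."""
    for p_o, p_t in zip(online.parameters(), target.parameters()):
        p_t.mul_(m).add_(p_o, alpha=1 - m)

online = nn.Linear(96, 64)
target = nn.Linear(96, 64)
target.load_state_dict(online.state_dict())   # initialize target = online
momentum_update(online, target)               # called once per training step
```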
arXiv Detail & Related papers (2024-01-31T12:52:10Z) - Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis.
Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation.
Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
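Decomposition-based forecasting in its simplest form can be sketched with a moving-average trend split and two linear heads (DLinear-style); this illustrates the principle the summary argues for, not this paper's exact model.

```python
import torch
import torch.nn as nn

class DecomposedLinear(nn.Module):
    """DLinear-style sketch: split the series into a moving-average trend and
    a residual, then forecast each component with its own linear map."""
    def __init__(self, seq_len=96, horizon=24, kernel=25):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel, stride=1, padding=kernel // 2)
        self.trend_head = nn.Linear(seq_len, horizon)
        self.resid_head = nn.Linear(seq_len, horizon)

    def forward(self, x):                      # x: (batch, seq_len)
        trend = self.pool(x.unsqueeze(1)).squeeze(1)
        return self.trend_head(trend) + self.resid_head(x - trend)

print(DecomposedLinear()(torch.randn(8, 96)).shape)   # torch.Size([8, 24])
```

The parsimony argument is visible in the sketch: two linear maps suffice once the decomposition has separated slow trend from fast residual dynamics.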
arXiv Detail & Related papers (2024-01-22T13:15:40Z) - HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
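The column-wise iterative paradigm is available off the shelf in scikit-learn; the snippet below illustrates the general scheme only, since HyperImpute's automatic per-column model selection goes beyond it.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

# Generic iterative (column-wise) imputation: each feature with missing values
# is modeled as a function of the others, cycling until convergence.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
X[rng.random(X.shape) < 0.2] = np.nan        # knock out ~20% of entries
X_filled = IterativeImputer(max_iter=10, random_state=0).fit_transform(X)
```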
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.