SEER: Transformer-based Robust Time Series Forecasting via Automated Patch Enhancement and Replacement
- URL: http://arxiv.org/abs/2602.00589v1
- Date: Sat, 31 Jan 2026 08:12:24 GMT
- Title: SEER: Transformer-based Robust Time Series Forecasting via Automated Patch Enhancement and Replacement
- Authors: Xiangfei Qiu, Xvyuan Liu, Tianen Shen, Xingjian Wu, Hanyin Cheng, Bin Yang, Jilin Hu
- Abstract summary: Real-world time series often suffer from low-quality issues during data collection, such as missing values, distribution shifts, anomalies, and white noise. This study proposes a robust time series forecasting framework called SEER, which introduces a Learnable Patch Replacement Module that enhances forecasting robustness and model accuracy through a two-stage process.
- Score: 9.482558107303646
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Time series forecasting is important in many fields that require accurate predictions for decision-making. Patching techniques, commonly used and effective in time series modeling, help capture temporal dependencies by dividing the data into patches. However, existing patch-based methods fail to dynamically select patches and typically use all patches during the prediction process. In real-world time series, there are often low-quality issues during data collection, such as missing values, distribution shifts, anomalies and white noise, which may cause some patches to contain low-quality information, negatively impacting the prediction results. To address this issue, this study proposes a robust time series forecasting framework called SEER. Firstly, we propose an Augmented Embedding Module, which improves patch-wise representations using a Mixture-of-Experts (MoE) architecture and obtains series-wise token representations through a channel-adaptive perception mechanism. Secondly, we introduce a Learnable Patch Replacement Module, which enhances forecasting robustness and model accuracy through a two-stage process: 1) a dynamic filtering mechanism eliminates negative patch-wise tokens; 2) a replaced attention module substitutes the identified low-quality patches with global series-wise token, further refining their representations through a causal attention mechanism. Comprehensive experimental results demonstrate the SOTA performance of SEER.
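The filtering-and-substitution step of the Learnable Patch Replacement Module can be illustrated with a minimal sketch. Everything here is an assumption for illustration: SEER learns its filter end-to-end and refines the replaced tokens with a causal attention mechanism, whereas this sketch applies a fixed threshold to precomputed quality scores and omits the attention refinement.

```python
import numpy as np

def replace_low_quality_patches(patch_tokens, series_token, quality_scores, threshold=0.5):
    """Replace patch-wise tokens whose quality score falls below `threshold`
    with the global series-wise token. Illustrative stand-in for SEER's
    two-stage replacement; the scoring and threshold are assumptions."""
    tokens = patch_tokens.copy()
    mask = quality_scores < threshold   # stage 1: flag low-quality patches
    tokens[mask] = series_token         # stage 2: substitute the series-wise token
    return tokens, mask
```

In the full method the substituted tokens would then attend causally over the sequence to refine their representations; here they simply carry the series-wise summary.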
Related papers
- MEMTS: Internalizing Domain Knowledge via Parameterized Memory for Retrieval-Free Domain Adaptation of Time Series Foundation Models [51.506429027626005]
Memory for Time Series (MEMTS) is a lightweight and plug-and-play method for retrieval-free domain adaptation in time series forecasting. A key component of MEMTS is a Knowledge Persistence Module (KPM), which internalizes domain-specific temporal dynamics. This paradigm shift enables MEMTS to achieve accurate domain adaptation with constant-time inference and near-zero latency.
arXiv Detail & Related papers (2026-02-14T14:00:06Z)
- GateRA: Token-Aware Modulation for Parameter-Efficient Fine-Tuning [51.79350934271497]
GateRA is a unified framework that introduces token-aware modulation to dynamically adjust the strength of PEFT updates. By incorporating adaptive gating into standard PEFT branches, GateRA enables selective, token-level adaptation. Experiments on multiple commonsense reasoning benchmarks demonstrate that GateRA consistently outperforms or matches prior PEFT methods.
arXiv Detail & Related papers (2025-11-15T17:55:47Z)
- EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting [50.794700596484894]
We propose EntroPE (Entropy-Guided Dynamic Patch Encoder), a novel, temporally informed framework that dynamically detects transition points via conditional entropy. This preserves temporal structure while retaining the computational benefits of patching. Experiments across long-term forecasting benchmarks demonstrate that EntroPE improves both accuracy and efficiency.
arXiv Detail & Related papers (2025-09-30T12:09:56Z)
- Revisiting Multivariate Time Series Forecasting with Missing Values [65.30332997607141]
Missing values are common in real-world time series. Current approaches have developed an imputation-then-prediction framework that uses imputation modules to fill in missing values, followed by forecasting on the imputed data. This framework overlooks a critical issue: there is no ground truth for the missing values, making the imputation process susceptible to errors that can degrade prediction accuracy. We introduce Consistency-Regularized Information Bottleneck (CRIB), a novel framework built on the Information Bottleneck principle.
arXiv Detail & Related papers (2025-09-27T20:57:48Z)
- Rethinking Irregular Time Series Forecasting: A Simple yet Effective Baseline [12.66709671516384]
We introduce APN, a general and efficient forecasting framework. At the core of APN is a novel Time-Aware Patch Aggregation (ATAPA) module. It computes patch representations via a time-aware weighted aggregation of all raw observations. This approach provides two key advantages: it preserves data fidelity by avoiding the introduction of artificial data points and ensures complete information coverage by design.
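A time-aware weighted aggregation over all raw observations, as ATAPA computes, can be sketched as below. The Gaussian kernel, the `bandwidth` parameter, and the function name are assumptions for illustration, not the paper's actual weighting scheme.

```python
import numpy as np

def time_aware_patch_aggregation(timestamps, values, patch_centers, bandwidth=1.0):
    """Compute one representation per patch as a time-aware weighted average
    of ALL raw observations, so irregularly sampled points contribute without
    any imputed artificial data points (sketch; Gaussian weights assumed)."""
    reps = []
    for c in patch_centers:
        # weight each raw observation by its temporal distance to the patch center
        w = np.exp(-((timestamps - c) ** 2) / (2 * bandwidth ** 2))
        reps.append(np.sum(w * values) / np.sum(w))
    return np.array(reps)
```

Because every observation receives a nonzero weight for every patch, coverage is complete by construction, and no artificial points are ever inserted into the series.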
arXiv Detail & Related papers (2025-05-16T13:42:00Z)
- Enhancing Masked Time-Series Modeling via Dropping Patches [10.715930488118582]
This paper explores how to enhance existing masked time-series modeling by randomly dropping sub-sequence-level patches of time series. The proposed method, named DropPatch, improves pre-training efficiency by a square-level advantage. It provides additional advantages for modeling in scenarios such as in-domain, cross-domain, few-shot learning, and cold start.
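Randomly dropping sub-sequence patches before masked modeling can be sketched as follows; the `drop_ratio`, the uniform sampling, and the function name are illustrative assumptions rather than DropPatch's exact procedure.

```python
import random

def drop_patches(patches, drop_ratio=0.3, seed=None):
    """Randomly discard a fraction of patches before masking/encoding,
    returning the kept patches and their original indices (sketch;
    uniform sampling without replacement is an assumption)."""
    rng = random.Random(seed)
    keep = max(1, int(round(len(patches) * (1 - drop_ratio))))
    idx = sorted(rng.sample(range(len(patches)), keep))
    return [patches[i] for i in idx], idx
```

The encoder then only ever sees the kept subset, which is where the pre-training efficiency gain comes from.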
arXiv Detail & Related papers (2024-12-19T17:21:34Z)
- Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift [51.01356105618118]
Time series often exhibit complex non-uniform distributions with varying patterns across segments, such as season, operating condition, or semantic meaning. Existing approaches, which typically train a single model to capture all these diverse patterns, often struggle with the pattern drifts between patches. We propose TFPS, a novel architecture that leverages pattern-specific experts for more accurate and adaptable time series forecasting.
arXiv Detail & Related papers (2024-10-13T13:35:29Z)
- Temporal Feature Matters: A Framework for Diffusion Model Quantization [105.3033493564844]
Diffusion models rely on the time-step for multi-round denoising. We introduce a novel quantization framework that includes three strategies. This framework preserves most of the temporal information and ensures high-quality end-to-end generation.
arXiv Detail & Related papers (2024-07-28T17:46:15Z)
- Attention as Robust Representation for Time Series Forecasting [23.292260325891032]
Time series forecasting is essential for many practical applications.
Transformers' key feature, the attention mechanism, dynamically fuses embeddings to enhance data representation, yet attention weights are often relegated to a byproduct role.
Our approach elevates attention weights as the primary representation for time series, capitalizing on the temporal relationships among data points to improve forecasting accuracy.
arXiv Detail & Related papers (2024-02-08T03:00:50Z)
- Learning to Embed Time Series Patches Independently [5.752266579415516]
Masked time series modeling has recently gained much attention as a self-supervised representation learning strategy for time series.
We argue that capturing dependencies between patches might not be an optimal strategy for time series representation learning. We propose to use 1) a simple patch reconstruction task, which autoencodes each patch without looking at other patches, and 2) a simple patch-wise encoder that embeds each patch independently.
arXiv Detail & Related papers (2023-12-27T06:23:29Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
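The core idea, recomputing normalization statistics from the prediction batch itself instead of reusing training-time running statistics, can be sketched like this (the scalar affine parameters are a simplification; real BatchNorm layers learn per-feature `gamma` and `beta`):

```python
import numpy as np

def prediction_time_batchnorm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a prediction batch with ITS OWN mean/variance rather than
    the stored running statistics, so the normalization adapts to the
    shifted input distribution (sketch of prediction-time batch norm)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```

Under covariate shift the running statistics no longer match the test distribution; recomputing them per batch restores zero-mean, unit-variance activations at the cost of needing a reasonably large prediction batch.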
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.