DDT: A Dual-Masking Dual-Expert Transformer for Energy Time-Series Forecasting
- URL: http://arxiv.org/abs/2601.07250v1
- Date: Mon, 12 Jan 2026 06:36:36 GMT
- Title: DDT: A Dual-Masking Dual-Expert Transformer for Energy Time-Series Forecasting
- Authors: Mingnan Zhu, Qixuan Zhang, Yixuan Cheng, Fangzhou Gu, Shiming Lin,
- Abstract summary: We propose DDT, a novel and robust deep learning framework for high-precision time-series forecasting. At its core, DDT introduces two key innovations. First, we design a dual-masking mechanism that synergistically combines a strict causal mask with a data-driven dynamic mask. Second, our architecture features a dual-expert system that decouples the modeling of temporal dynamics and cross-variable correlations into parallel, specialized pathways.
- Score: 3.877294667255643
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate energy time-series forecasting is crucial for ensuring grid stability and promoting the integration of renewable energy, yet it faces significant challenges from complex temporal dependencies and the heterogeneity of multi-source data. To address these issues, we propose DDT, a novel and robust deep learning framework for high-precision time-series forecasting. At its core, DDT introduces two key innovations. First, we design a dual-masking mechanism that synergistically combines a strict causal mask with a data-driven dynamic mask. This novel design ensures theoretical causal consistency while adaptively focusing on the most salient historical information, overcoming the rigidity of traditional masking techniques. Second, our architecture features a dual-expert system that decouples the modeling of temporal dynamics and cross-variable correlations into parallel, specialized pathways, which are then intelligently integrated through a dynamic gated fusion module. We conducted extensive experiments on 7 challenging energy benchmark datasets, including ETTh, Electricity, and Solar. The results demonstrate that DDT consistently outperforms strong state-of-the-art baselines across all prediction horizons, establishing a new benchmark for the task.
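To make the two ideas in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the dynamic mask is a per-query top-k selection over the causally allowed positions and that the gated fusion is a sigmoid gate over the concatenated outputs of the two experts; the abstract specifies neither detail. All function and parameter names (`dual_masked_attention`, `gated_fusion`, `top_k`, `w_gate`, `b_gate`) are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries receive zero weight.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dual_masked_attention(q, k, v, top_k):
    """Single-head attention with a causal mask plus an assumed top-k dynamic mask.

    q, k, v: arrays of shape (T, d); top_k must satisfy 1 <= top_k <= T.
    Returns the attended values (T, d) and the attention weights (T, T).
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)

    # Strict causal mask: step t may only attend to steps <= t.
    causal = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(causal, scores, -np.inf)

    # Assumed dynamic mask: keep the top_k highest-scoring allowed positions
    # per query. Rows with fewer than top_k allowed positions keep them all,
    # because their k-th largest score is -inf.
    kth_largest = np.sort(scores, axis=-1)[:, -top_k][:, None]
    keep = causal & (scores >= kth_largest)
    scores = np.where(keep, scores, -np.inf)

    weights = softmax(scores, axis=-1)
    return weights @ v, weights


def gated_fusion(h_temporal, h_variable, w_gate, b_gate):
    """Assumed gated fusion of two expert outputs, each of shape (T, d).

    w_gate has shape (2 * d, d) and b_gate has shape (d,).
    """
    z = np.concatenate([h_temporal, h_variable], axis=-1) @ w_gate + b_gate
    gate = 1.0 / (1.0 + np.exp(-z))  # element-wise sigmoid in (0, 1)
    return gate * h_temporal + (1.0 - gate) * h_variable
```

In this sketch the causal mask is applied before the top-k selection, so the dynamic mask can only narrow attention within the past and never reintroduces future positions.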
Related papers
- FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis [92.23551599659186]
Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology. FusAD is a unified analysis framework designed for diverse time series tasks.
arXiv Detail & Related papers (2025-12-16T04:34:27Z) - HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking [80.07224739976911]
RGB cameras excel at capturing rich texture with high resolution, whereas event cameras offer exceptional temporal resolution and high dynamic range.
arXiv Detail & Related papers (2025-10-22T13:15:13Z) - Multi-modal Spatio-Temporal Transformer for High-resolution Land Subsidence Prediction [3.3295066998131637]
We propose a novel framework that fuses dynamic displacement data with static physical priors. On the public EGMS dataset, MM-STT establishes a new state-of-the-art, reducing the long-range forecast RMSE by an order of magnitude.
arXiv Detail & Related papers (2025-09-29T18:49:04Z) - Wavelet-Enhanced Neural ODE and Graph Attention for Interpretable Energy Forecasting [0.0]
This paper introduces a neural framework that integrates continuous-time Neural Ordinary Differential Equations (Neural ODEs) and graph attention. It adeptly captures and models diverse, multi-scale temporal dynamics. The model enhances interpretability through SHAP analysis, making it suitable for sustainable energy applications.
arXiv Detail & Related papers (2025-07-14T10:23:18Z) - Enhanced Photovoltaic Power Forecasting: An iTransformer and LSTM-Based Model Integrating Temporal and Covariate Interactions [16.705621552594643]
Existing models often struggle with capturing the complex relationships between target variables and covariates. We propose a novel model architecture that leverages the iTransformer for feature extraction from target variables. A cross-attention mechanism is integrated to fuse the outputs of both models, followed by a Kolmogorov-Arnold network mapping. Results demonstrate that the proposed model effectively captures seasonal variations in PV power generation and improves forecasting accuracy.
arXiv Detail & Related papers (2024-12-03T09:16:13Z) - EnergyDiff: Universal Time-Series Energy Data Generation using Diffusion Models [2.677325229270716]
High-resolution time series data are crucial for the operation and planning of energy systems. Such data are difficult to model due to their inherent high dimensionality and complex temporal dependencies. We propose EnergyDiff, a universal data generation framework for energy time series data.
arXiv Detail & Related papers (2024-07-18T14:10:50Z) - Attractor Memory for Long-Term Time Series Forecasting: A Chaos Perspective [63.60312929416228]
Attraos incorporates chaos theory into long-term time series forecasting.
We show that Attraos outperforms various LTSF methods on mainstream datasets and chaotic datasets with only one-twelfth of the parameters compared to PatchTST.
arXiv Detail & Related papers (2024-02-18T05:35:01Z) - Long-term Wind Power Forecasting with Hierarchical Spatial-Temporal Transformer [112.12271800369741]
Wind power is attracting increasing attention around the world due to its renewable, pollution-free, and other advantages.
Accurate wind power forecasting (WPF) can effectively reduce power fluctuations in power system operations.
Existing methods are mainly designed for short-term predictions and lack effective spatial-temporal feature augmentation.
arXiv Detail & Related papers (2023-05-30T04:03:15Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - Edge Continual Learning for Dynamic Digital Twins over Wireless Networks [68.65520952712914]
Digital twins (DTs) constitute a critical link between the real world and the metaverse.
In this paper, a novel edge continual learning framework is proposed to accurately model the evolving affinity between a physical twin and its corresponding cyber twin.
The proposed framework achieves a simultaneously accurate and synchronous CT model that is robust to catastrophic forgetting.
arXiv Detail & Related papers (2022-04-10T23:25:37Z) - Combining Embeddings and Fuzzy Time Series for High-Dimensional Time Series Forecasting in Internet of Energy Applications [0.0]
Fuzzy Time Series (FTS) models stand out as data-driven non-parametric models of easy implementation and high accuracy.
We present a new methodology for handling high-dimensional time series, by projecting the original high-dimensional data into a low dimensional embedding space.
arXiv Detail & Related papers (2021-12-03T19:50:09Z)