HiMTM: Hierarchical Multi-Scale Masked Time Series Modeling with Self-Distillation for Long-Term Forecasting
- URL: http://arxiv.org/abs/2401.05012v2
- Date: Thu, 1 Aug 2024 09:18:17 GMT
- Title: HiMTM: Hierarchical Multi-Scale Masked Time Series Modeling with Self-Distillation for Long-Term Forecasting
- Authors: Shubao Zhao, Ming Jin, Zhaoxiang Hou, Chengyi Yang, Zengxiang Li, Qingsong Wen, Yi Wang
- Abstract summary: HiMTM is a hierarchical multi-scale masked time series modeling framework with self-distillation for long-term forecasting.
HiMTM integrates four key components: (1) hierarchical multi-scale transformer (HMT) to capture temporal information at different scales; (2) decoupled encoder-decoder (DED) that directs the encoder towards feature extraction while the decoder focuses on pretext tasks; (3) hierarchical self-distillation (HSD) for multi-stage feature-level supervision during pre-training; and (4) cross-scale attention fine-tuning (CSA-FT) to capture dependencies between scales in downstream tasks.
Experiments on seven mainstream datasets show that HiMTM surpasses state-of-the-art self-supervised and end-to-end learning methods by a considerable margin of 3.16-68.54%.
- Score: 17.70984737213973
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Time series forecasting is a critical and challenging task in practical applications. Recent advancements in pre-trained foundation models for time series forecasting have gained significant interest. However, current methods often overlook the multi-scale nature of time series, which is essential for accurate forecasting. To address this, we propose HiMTM, a hierarchical multi-scale masked time series modeling framework with self-distillation for long-term forecasting. HiMTM integrates four key components: (1) hierarchical multi-scale transformer (HMT) to capture temporal information at different scales; (2) decoupled encoder-decoder (DED) that directs the encoder towards feature extraction while the decoder focuses on pretext tasks; (3) hierarchical self-distillation (HSD) for multi-stage feature-level supervision signals during pre-training; and (4) cross-scale attention fine-tuning (CSA-FT) to capture dependencies between different scales for downstream tasks. These components collectively enhance multi-scale feature extraction in masked time series modeling, improving forecasting accuracy. Extensive experiments on seven mainstream datasets show that HiMTM surpasses state-of-the-art self-supervised and end-to-end learning methods by a considerable margin of 3.16-68.54%. Additionally, HiMTM outperforms the latest robust self-supervised learning method, PatchTST, in cross-domain forecasting by a significant margin of 2.3%. The effectiveness of HiMTM is further demonstrated through its application in natural gas demand forecasting.
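To make the four components concrete, here is a minimal PyTorch sketch of the pre-training recipe as the abstract describes it. Every concrete choice (the names `HMTStage` and `HiMTMSketch`, pairwise patch merging as the coarsening rule, layer sizes, and a frozen copy standing in for the distillation teacher) is an illustrative assumption, not the paper's reference implementation; CSA-FT, the fine-tuning stage, is not shown.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HMTStage(nn.Module):
    """One hierarchical stage: attention over patch tokens, then merge adjacent
    token pairs to halve the temporal resolution (an assumed coarsening rule)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.merge = nn.Linear(2 * dim, dim)

    def forward(self, x):                                   # x: (B, N, D)
        x = self.encoder(x)
        B, N, D = x.shape
        return self.merge(x.reshape(B, N // 2, 2 * D))      # (B, N/2, D)

class HiMTMSketch(nn.Module):
    def __init__(self, patch_len=16, dim=64, stages=3, mask_ratio=0.5):
        super().__init__()
        self.patch_len, self.mask_ratio = patch_len, mask_ratio
        self.embed = nn.Linear(patch_len, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.stages = nn.ModuleList(HMTStage(dim) for _ in range(stages))
        self.decoder = nn.Linear(dim, patch_len)  # decoupled: pretext task only

    def forward(self, x, use_mask=True):                    # x: (B, L)
        patches = x.unfold(1, self.patch_len, self.patch_len)  # (B, N, P)
        tokens = self.embed(patches)
        if use_mask:
            mask = torch.rand(tokens.shape[:2], device=x.device) < self.mask_ratio
            tokens = torch.where(mask[..., None], self.mask_token, tokens)
        feats, h = [], tokens
        for stage in self.stages:                           # coarser each stage
            h = stage(h)
            feats.append(h)                                 # per-scale features
        if not use_mask:
            return None, feats
        # Reconstruct masked patches from coarse features upsampled back to the
        # patch grid; keeping reconstruction out of the encoder mirrors DED.
        up = F.interpolate(h.transpose(1, 2), size=patches.size(1)).transpose(1, 2)
        return F.mse_loss(self.decoder(up)[mask], patches[mask]), feats

# Hierarchical self-distillation, sketched: a frozen copy (a stand-in for an
# EMA-updated teacher) encodes the unmasked series, and each student scale is
# regressed onto the matching teacher scale.
student, teacher = HiMTMSketch(), HiMTMSketch()
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)

x = torch.randn(8, 512)                        # toy batch of univariate series
rec_loss, s_feats = student(x)                 # masked student pass
with torch.no_grad():
    _, t_feats = teacher(x, use_mask=False)    # unmasked teacher pass
loss = rec_loss + sum(F.mse_loss(s, t) for s, t in zip(s_feats, t_feats))
loss.backward()
```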
Related papers
- xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories [20.773694998061707]
Time series data is prevalent across numerous fields, necessitating the development of robust and accurate forecasting models.
We introduce xLSTM-Mixer, a model designed to effectively integrate temporal sequences, joint time-variable information, and multiple perspectives for robust forecasting.
Our evaluations demonstrate xLSTM-Mixer's superior long-term forecasting performance compared to recent state-of-the-art methods.
arXiv Detail & Related papers (2024-10-22T11:59:36Z)
- UmambaTSF: A U-shaped Multi-Scale Long-Term Time Series Forecasting Method Using Mamba [7.594115034632109]
We propose UmambaTSF, a novel long-term time series forecasting framework.
It integrates the multi-scale feature extraction capabilities of a U-shaped encoder-decoder multilayer perceptron (MLP) with Mamba's long-sequence representation.
UmambaTSF achieves state-of-the-art performance and excellent generality on widely used benchmark datasets.
arXiv Detail & Related papers (2024-10-15T04:56:43Z)
- MGCP: A Multi-Grained Correlation based Prediction Network for Multivariate Time Series [54.91026286579748]
We propose MGCP, a Multi-Grained Correlation-based Prediction Network.
It simultaneously considers correlations at three levels to enhance prediction performance.
It employs adversarial training with an attention-based predictor and a conditional discriminator to optimize prediction results at the coarse-grained level.
arXiv Detail & Related papers (2024-05-30T03:32:44Z)
- Frequency-domain MLPs are More Effective Learners in Time Series Forecasting [67.60443290781988]
Time series forecasting plays a key role in many industrial domains, including finance, traffic, energy, and healthcare.
Most MLP-based forecasting methods suffer from point-wise mappings and an information bottleneck.
We propose FreTS, a simple yet effective architecture built upon frequency-domain MLPs for time series forecasting (a hedged sketch of the idea follows this entry).
arXiv Detail & Related papers (2023-11-10T17:05:13Z)
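A sketch of the frequency-domain MLP idea as read from the summary above: move the series into the frequency domain with an FFT, apply a learned complex-valued linear layer to the spectrum, return to the time domain, and forecast with a linear head. The class name, the single complex layer, and all sizes are assumptions rather than the FreTS reference design.

```python
import torch
import torch.nn as nn

class FreqMLPSketch(nn.Module):
    """Assumed frequency-domain layer: a complex-valued linear map over rFFT
    bins, followed by an ordinary time-domain forecasting head."""
    def __init__(self, seq_len, horizon):
        super().__init__()
        n_freq = seq_len // 2 + 1                 # number of rFFT bins
        self.W_re = nn.Parameter(0.02 * torch.randn(n_freq, n_freq))
        self.W_im = nn.Parameter(0.02 * torch.randn(n_freq, n_freq))
        self.head = nn.Linear(seq_len, horizon)

    def forward(self, x):                         # x: (B, seq_len)
        spec = torch.fft.rfft(x, dim=-1)          # complex spectrum (B, n_freq)
        # Complex matrix product spelled out with real tensors:
        re = spec.real @ self.W_re - spec.imag @ self.W_im
        im = spec.real @ self.W_im + spec.imag @ self.W_re
        filtered = torch.fft.irfft(torch.complex(re, im), n=x.size(-1), dim=-1)
        return self.head(torch.relu(filtered))    # (B, horizon)

y = FreqMLPSketch(seq_len=96, horizon=24)(torch.randn(4, 96))  # -> (4, 24)
```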
- Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems.
We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting.
Time-LLM is a powerful time series learner that outperforms state-of-the-art specialized forecasting models (a reprogramming sketch follows this entry).
arXiv Detail & Related papers (2023-10-03T01:31:25Z)
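A hedged sketch of the reprogramming idea: keep a pre-trained sequence backbone frozen and train only small adapters that map time series patches into and out of its embedding space. A randomly initialized Transformer stands in for the LLM here, and every name and size (`TimeLLMSketch`, `llm_dim`, the linear adapters) is an assumption, not Time-LLM's actual design.

```python
import torch
import torch.nn as nn

class TimeLLMSketch(nn.Module):
    def __init__(self, patch_len=16, llm_dim=256, n_patches=8, horizon=24):
        super().__init__()
        layer = nn.TransformerEncoderLayer(llm_dim, 8, 4 * llm_dim, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():
            p.requires_grad_(False)               # the "LLM" stays frozen
        self.patch_len = patch_len
        self.reprogram = nn.Linear(patch_len, llm_dim)       # trainable input adapter
        self.head = nn.Linear(n_patches * llm_dim, horizon)  # trainable output head

    def forward(self, x):                         # x: (B, L), L = n_patches*patch_len
        tok = self.reprogram(x.unfold(1, self.patch_len, self.patch_len))
        h = self.backbone(tok)                    # (B, n_patches, llm_dim)
        return self.head(h.flatten(1))            # (B, horizon)

y = TimeLLMSketch()(torch.randn(4, 128))          # -> (4, 24)
```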
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model that improves classification capacity for multivariate time series.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- MTS-Mixers: Multivariate Time Series Forecasting via Factorized Temporal and Channel Mixing [18.058617044421293]
This paper investigates the contributions and deficiencies of attention mechanisms on the performance of time series forecasting.
We propose MTS-Mixers, which use two factorized modules to capture temporal and channel dependencies.
Experimental results on several real-world datasets show that MTS-Mixers outperform existing Transformer-based models with higher efficiency (a factorized-mixing sketch follows this entry).
arXiv Detail & Related papers (2023-02-09T08:52:49Z)
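A minimal sketch of factorized mixing in the spirit of the summary above, assuming its simplest form: one MLP mixes along the time axis, another along the channel axis, with residual connections in place of attention. Names and sizes (`MixerBlock`, `hidden=64`, two blocks) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, seq_len, n_channels, hidden=64):
        super().__init__()
        self.temporal = nn.Sequential(
            nn.Linear(seq_len, hidden), nn.GELU(), nn.Linear(hidden, seq_len))
        self.channel = nn.Sequential(
            nn.Linear(n_channels, hidden), nn.GELU(), nn.Linear(hidden, n_channels))

    def forward(self, x):                         # x: (B, L, C)
        x = x + self.temporal(x.transpose(1, 2)).transpose(1, 2)  # mix over time
        return x + self.channel(x)                                # mix over channels

class MTSMixerSketch(nn.Module):
    def __init__(self, seq_len, n_channels, horizon, depth=2):
        super().__init__()
        self.blocks = nn.Sequential(
            *[MixerBlock(seq_len, n_channels) for _ in range(depth)])
        self.head = nn.Linear(seq_len, horizon)   # per-channel linear forecaster

    def forward(self, x):                         # (B, L, C) -> (B, H, C)
        x = self.blocks(x)
        return self.head(x.transpose(1, 2)).transpose(1, 2)

out = MTSMixerSketch(96, 7, 24)(torch.randn(4, 96, 7))  # -> (4, 24, 7)
```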
- Ti-MAE: Self-Supervised Masked Time Series Autoencoders [16.98069693152999]
We propose a novel framework named Ti-MAE, in which the input time series are assumed to follow an integrated distribution.
Ti-MAE randomly masks out embedded time series data and learns an autoencoder to reconstruct them at the point-level.
Experiments on several public real-world datasets demonstrate that our masked-autoencoding framework learns strong representations directly from the raw data (a masking sketch follows this entry).
arXiv Detail & Related papers (2023-01-21T03:20:23Z)
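The masking-and-reconstruction loop described above can be sketched as follows, assuming MAE-style random masking over embedded windows; positional encodings are omitted for brevity, and the patch length, mask ratio, and layer sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TiMAESketch(nn.Module):
    def __init__(self, patch_len=16, dim=64, mask_ratio=0.75):
        super().__init__()
        self.patch_len, self.mask_ratio = patch_len, mask_ratio
        self.embed = nn.Linear(patch_len, dim)
        layer = nn.TransformerEncoderLayer(dim, 4, 4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.TransformerEncoder(layer, num_layers=1)  # independent copies
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.head = nn.Linear(dim, patch_len)      # point-level reconstruction

    def forward(self, x):                          # x: (B, L)
        p = x.unfold(1, self.patch_len, self.patch_len)   # (B, N, patch_len)
        tok = self.embed(p)
        B, N, D = tok.shape
        n_keep = int(N * (1 - self.mask_ratio))
        order = torch.rand(B, N, device=x.device).argsort(dim=1)  # random split
        keep, drop = order[:, :n_keep], order[:, n_keep:]
        rows = torch.arange(B, device=x.device)[:, None]
        enc = self.encoder(tok[rows, keep])        # encode visible tokens only
        full = self.mask_token.expand(B, N, D).clone()
        full[rows, keep] = enc                     # reinsert encoded visibles
        rec = self.head(self.decoder(full))        # (B, N, patch_len)
        return F.mse_loss(rec[rows, drop], p[rows, drop])  # loss on masked parts

loss = TiMAESketch()(torch.randn(8, 256))          # 16 windows: 4 kept, 12 masked
```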
- Generative Time Series Forecasting with Diffusion, Denoise, and Disentanglement [51.55157852647306]
Time series forecasting has been a widely explored task of great importance in many applications.
Real-world time series are often recorded over a short period, leaving a large gap between deep models' capacity and the limited, noisy data available.
We propose to address the time series forecasting problem with generative modeling and propose a bidirectional variational auto-encoder equipped with diffusion, denoise, and disentanglement.
arXiv Detail & Related papers (2023-01-08T12:20:46Z)
- An Unsupervised Short- and Long-Term Mask Representation for Multivariate Time Series Anomaly Detection [2.387411589813086]
This paper proposes an anomaly detection method based on unsupervised Short- and Long-term Mask Representation learning (SLMR).
Experiments show that our method outperforms other state-of-the-art models on three real-world datasets.
arXiv Detail & Related papers (2022-08-19T09:34:11Z)
- Multi-scale Attention Flow for Probabilistic Time Series Forecasting [68.20798558048678]
We propose a novel non-autoregressive deep learning model, called Multi-scale Attention Normalizing Flow (MANF).
Our model avoids the influence of cumulative error and does not increase the time complexity.
Our model achieves state-of-the-art performance on many popular multivariate datasets.
arXiv Detail & Related papers (2022-05-16T07:53:42Z)