Related papers: Towards Lightweight Time Series Forecasting: a Patch-wise Transformer with Weak Data Enriching

Towards Lightweight Time Series Forecasting: a Patch-wise Transformer with Weak Data Enriching

URL: http://arxiv.org/abs/2501.10448v1
Date: Tue, 14 Jan 2025 13:35:03 GMT
Title: Towards Lightweight Time Series Forecasting: a Patch-wise Transformer with Weak Data Enriching
Authors: Meng Wang, Jintao Yang, Bin Yang, Hui Li, Tongxin Gong, Bo Yang, Jiangtao Cui,
Abstract summary: LiPFormer is a lightweight Patch-wise Transformer with weak data enriching.<n>It outperforms state-of-the-art methods in accuracy, while significantly reducing parameter scale, training duration, and GPU memory usage.<n> Deployment on an edge device reveals that LiPFormer takes only 1/3 inference time compared to classic Transformers.
Score: 17.48605847953516
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Patch-wise Transformer based time series forecasting achieves superior accuracy. However, this superiority relies heavily on intricate model design with massive parameters, rendering both training and inference expensive, thus preventing their deployments on edge devices with limited resources and low latency requirements. In addition, existing methods often work in an autoregressive manner, which take into account only historical values, but ignore valuable, easy-to-obtain context information, such as weather forecasts, date and time of day. To contend with the two limitations, we propose LiPFormer, a novel Lightweight Patch-wise Transformer with weak data enriching. First, to simplify the Transformer backbone, LiPFormer employs a novel lightweight cross-patch attention and a linear transformation-based attention to eliminate Layer Normalization and Feed Forward Network, two heavy components in existing Transformers. Second, we propose a lightweight, weak data enriching module to provide additional, valuable weak supervision to the training. It enhances forecasting accuracy without significantly increasing model complexity as it does not involve expensive, human-labeling but using easily accessible context information. This facilitates the weak data enriching to plug-and-play on existing models. Extensive experiments on nine benchmark time series datasets demonstrate that LiPFormer outperforms state-of-the-art methods in accuracy, while significantly reducing parameter scale, training duration, and GPU memory usage. Deployment on an edge device reveals that LiPFormer takes only 1/3 inference time compared to classic Transformers. In addition, we demonstrate that the weak data enriching can integrate seamlessly into various Transformer based models to enhance their accuracy, suggesting its generality.

Related papers

Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay. Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks. Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting [6.434378359932152]
We review and categorize existing Transformer-based models into two main types: (1) modifications to the model structure and (2) modifications to the input data. We propose $textbfsTransformer$, which introduces the Sequence and Temporal Convolutional Network (STCN) to fully capture both sequential and temporal information. We compare our model with linear models and existing forecasting models on long-term time-series forecasting, achieving new state-of-the-art results.
arXiv Detail & Related papers (2024-08-19T06:23:41Z)
Unlocking the Power of Patch: Patch-Based MLP for Long-Term Time Series Forecasting [0.0]
Recent studies have attempted to refine the Transformer architecture to demonstrate its effectiveness in Long-Term Time Series Forecasting tasks.<n>We attribute the effectiveness of these models largely to the adopted Patch mechanism.<n>We propose a novel and simple Patch-based components (PatchMLP) for LTSF tasks.
arXiv Detail & Related papers (2024-05-22T12:12:20Z)
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention [14.672072173674039]
We show that transformers are incapable of converging to their true solution despite their high expressive power. We propose a shallow lightweight transformer model that escapes bad local minima when optimized with sharpness-aware optimization. In particular, SAMformer surpasses current state-of-the-art methods and is on par with the biggest foundation model MOIRAI while having significantly fewer parameters.
arXiv Detail & Related papers (2024-02-15T18:55:05Z)
Adaptive Point Transformer [88.28498667506165]
Adaptive Point Cloud Transformer (AdaPT) is a standard PT model augmented by an adaptive token selection mechanism. AdaPT dynamically reduces the number of tokens during inference, enabling efficient processing of large point clouds.
arXiv Detail & Related papers (2024-01-26T13:24:45Z)
Parsimony or Capability? Decomposition Delivers Both in Long-term Time Series Forecasting [46.63798583414426]
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis. Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation. Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks.
arXiv Detail & Related papers (2024-01-22T13:15:40Z)
Client: Cross-variable Linear Integrated Enhanced Transformer for Multivariate Long-Term Time Series Forecasting [4.004869317957185]
"Cross-variable Linear Integrated ENhanced Transformer for Multivariable Long-Term Time Series Forecasting" (Client) is an advanced model that outperforms both traditional Transformer-based models and linear models. Client incorporates non-linearity and cross-variable dependencies, which sets it apart from conventional linear models and Transformer-based models.
arXiv Detail & Related papers (2023-05-30T08:31:22Z)
Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning. Transformer models have been adopted to deliver high prediction capacity because of the high computational self-attention mechanism. We propose an efficient Transformerbased model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers [4.635547236305835]
We propose an efficient design of Transformer-based models for time series forecasting and self-supervised representation learning. It is based on two key components: (i) segmentation of time series into subseries-level patches which are served as input tokens to Transformer. PatchTST can improve the long-term forecasting accuracy significantly when compared with that of SOTA Transformer-based models.
arXiv Detail & Related papers (2022-11-27T05:15:42Z)
Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting [86.33543833145457]
We propose Non-stationary Transformers as a generic framework with two interdependent modules: Series Stationarization and De-stationary Attention. Our framework consistently boosts mainstream Transformers by a large margin, which reduces MSE by 49.43% on Transformer, 47.34% on Informer, and 46.89% on Reformer.
arXiv Detail & Related papers (2022-05-28T12:27:27Z)
Visformer: The Vision-friendly Transformer [105.52122194322592]
We propose a new architecture named Visformer, which is abbreviated from the Vision-friendly Transformer' With the same computational complexity, Visformer outperforms both the Transformer-based and convolution-based models in terms of ImageNet classification accuracy.
arXiv Detail & Related papers (2021-04-26T13:13:03Z)
Spatiotemporal Transformer for Video-based Person Re-identification [102.58619642363958]
We show that, despite the strong learning ability, the vanilla Transformer suffers from an increased risk of over-fitting. We propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains. The derived algorithm achieves significant accuracy gain on three popular video-based person re-identification benchmarks.
arXiv Detail & Related papers (2021-03-30T16:19:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.