A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting
- URL: http://arxiv.org/abs/2412.12883v1
- Date: Tue, 17 Dec 2024 13:07:31 GMT
- Title: A Comparative Study of Pruning Methods in Transformer-based Time Series Forecasting
- Authors: Nicholas Kiefer, Arvid Weyrauch, Muhammed Öz, Achim Streit, Markus Götz, Charlotte Debus
- Abstract summary: Pruning is an established approach to reduce neural network parameter count and save compute.
We study the effects of these pruning strategies on model predictive performance and computational aspects like model size, operations, and inference time.
We demonstrate that even with corresponding hardware and software support, structured pruning is unable to provide significant time savings.
- Abstract: The current landscape in time-series forecasting is dominated by Transformer-based models. Their high parameter count and corresponding demand in computational resources pose a challenge to real-world deployment, especially for commercial and scientific applications with low-power embedded devices. Pruning is an established approach to reduce neural network parameter count and save compute. However, the implications and benefits of pruning Transformer-based models for time series forecasting are largely unknown. To close this gap, we provide a comparative benchmark study by evaluating unstructured and structured pruning on various state-of-the-art multivariate time series models. We study the effects of these pruning strategies on model predictive performance and computational aspects like model size, operations, and inference time. Our results show that certain models can be pruned even up to high sparsity levels, outperforming their dense counterpart. However, fine-tuning pruned models is necessary. Furthermore, we demonstrate that even with corresponding hardware and software support, structured pruning is unable to provide significant time savings.
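As a rough illustration of the two pruning families compared in the study, the sketch below applies unstructured and structured magnitude pruning to toy linear layers using PyTorch's built-in pruning utilities; the layer sizes and sparsity levels are arbitrary choices for this example, not the authors' benchmark setup.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy stand-ins for Transformer feed-forward layers; the benchmarked
# forecasting models are of course far larger.
unstructured_layer = nn.Linear(512, 2048)
structured_layer = nn.Linear(512, 2048)

# Unstructured pruning: zero the 50% of individual weights with the smallest
# magnitude. Reaches high sparsity, but the irregular mask is hard to turn
# into actual speed-ups.
prune.l1_unstructured(unstructured_layer, name="weight", amount=0.5)

# Structured pruning: drop 25% of entire output neurons (rows of the weight
# matrix), ranked by their L2 norm. Only this variant can map onto faster
# inference, and even then it needs hardware and software support.
prune.ln_structured(structured_layer, name="weight", amount=0.25, n=2, dim=0)

# Fold the masks into the weights before fine-tuning, which the study finds
# is necessary to recover predictive performance.
prune.remove(unstructured_layer, "weight")
prune.remove(structured_layer, "weight")

sparsity = (unstructured_layer.weight == 0).float().mean().item()
print(f"unstructured weight sparsity: {sparsity:.2%}")
```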
Related papers
- Powerformer: A Transformer with Weighted Causal Attention for Time-series Forecasting [50.298817606660826]
We introduce Powerformer, a novel Transformer variant that replaces noncausal attention weights with causal weights that are reweighted according to a smooth heavy-tailed decay.
Our empirical results demonstrate that Powerformer achieves state-of-the-art accuracy on public time-series benchmarks.
Our analyses show that the model's locality bias is amplified during training, demonstrating an interplay between time-series data and power-law-based attention.
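A minimal sketch of this kind of attention reweighting, assuming a simple (1 + lag)^(-alpha) decay; the exact parameterisation used by Powerformer may differ.

```python
import torch
import torch.nn.functional as F

def powerlaw_causal_attention(q, k, v, alpha=1.0):
    """Causal attention whose weights are damped by a smooth heavy-tailed
    decay over the time lag; an illustrative sketch, not the authors' code."""
    b, t, d = q.shape
    scores = q @ k.transpose(-2, -1) / d ** 0.5                # (b, t, t)

    # Causal mask: position i may only attend to positions j <= i.
    causal = torch.tril(torch.ones(t, t, dtype=torch.bool))
    scores = scores.masked_fill(~causal, float("-inf"))

    # Heavy-tailed decay over the lag (i - j): weight ~ (1 + lag)^(-alpha).
    lag = torch.arange(t).unsqueeze(1) - torch.arange(t).unsqueeze(0)
    decay = (1.0 + lag.clamp(min=0).float()).pow(-alpha)

    weights = F.softmax(scores, dim=-1) * decay
    weights = weights / weights.sum(dim=-1, keepdim=True)      # renormalise
    return weights @ v

out = powerlaw_causal_attention(*[torch.randn(2, 16, 32) for _ in range(3)])
```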
arXiv Detail & Related papers (2025-02-10T04:42:11Z)
- Ister: Inverted Seasonal-Trend Decomposition Transformer for Explainable Multivariate Time Series Forecasting [10.32586981170693]
We introduce the Inverted Seasonal-Trend Decomposition Transformer (Ister), along with a novel Dot-attention mechanism that improves interpretability, computational efficiency, and predictive accuracy.
Ister enables intuitive visualization of component contributions, shedding light on the model's decision process and enhancing transparency in prediction results.
arXiv Detail & Related papers (2024-12-25T06:37:19Z)
- A Cost-Aware Approach to Adversarial Robustness in Neural Networks [1.622320874892682]
We propose using accelerated failure time models to measure the effect of hardware choice, batch size, number of epochs, and test-set accuracy.
We evaluate several GPU types and use the Tree Parzen Estimator to maximize model robustness and minimize model run-time simultaneously.
arXiv Detail & Related papers (2024-09-11T20:43:59Z)
- Are Self-Attentions Effective for Time Series Forecasting? [4.990206466948269]
Time series forecasting is crucial for applications across multiple domains and various scenarios.
Recent findings have indicated that simpler linear models might outperform complex Transformer-based approaches.
We introduce a new architecture, the Cross-Attention-only Time Series transformer (CATS).
Our model achieves superior performance with the lowest mean squared error and uses fewer parameters compared to existing models.
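A minimal sketch of a cross-attention-only forecaster in this spirit, where learnable horizon queries attend to the embedded input window; the layer sizes and the single attention layer are assumptions of this sketch, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionForecaster(nn.Module):
    """One learnable query per forecast step cross-attends to the encoded
    lookback window; no self-attention is used."""

    def __init__(self, n_vars, horizon, d_model=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(n_vars, d_model)            # per-time-step embedding
        self.queries = nn.Parameter(torch.randn(horizon, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_vars)

    def forward(self, x):                                   # x: (batch, lookback, n_vars)
        mem = self.embed(x)
        q = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)
        out, _ = self.attn(q, mem, mem)                     # cross-attention only
        return self.head(out)                               # (batch, horizon, n_vars)

# Forecast 24 steps of a 7-variable series from a 96-step window.
model = CrossAttentionForecaster(n_vars=7, horizon=24)
pred = model(torch.randn(8, 96, 7))                         # shape (8, 24, 7)
```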
arXiv Detail & Related papers (2024-05-27T06:49:39Z)
- Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
- MPR-Net: Multi-Scale Pattern Reproduction Guided Universality Time Series Interpretable Forecasting [13.790498420659636]
Time series forecasting has received wide interest from existing research due to its broad applications and inherent challenges.
This paper proposes a forecasting model, MPR-Net. It first adaptively decomposes multi-scale historical series patterns using a convolution operation, then constructs a pattern extension forecasting method based on the prior knowledge of pattern reproduction, and finally reconstructs future patterns into the future series using a deconvolution operation.
By leveraging the temporal dependencies present in the time series, MPR-Net not only achieves linear time complexity, but also makes the forecasting process interpretable.
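A toy sketch of the decompose-extend-reconstruct pipeline described above: a convolution extracts patterns from the history, a linear layer extends them over the horizon (an assumption of this sketch), and a transposed convolution reconstructs the future series.

```python
import torch
import torch.nn as nn

class ConvDeconvForecaster(nn.Module):
    """Decompose the history into patterns with a convolution, extend them
    over the forecast horizon, and reconstruct the future series with a
    transposed convolution; an illustration, not the authors' MPR-Net."""

    def __init__(self, lookback, horizon, n_patterns=16, kernel=7):
        super().__init__()
        self.decompose = nn.Conv1d(1, n_patterns, kernel, padding=kernel // 2)
        self.extend = nn.Linear(lookback, horizon)            # pattern extension
        self.reconstruct = nn.ConvTranspose1d(n_patterns, 1, kernel, padding=kernel // 2)

    def forward(self, x):                                     # x: (batch, lookback)
        patterns = self.decompose(x.unsqueeze(1))             # (batch, n_patterns, lookback)
        future_patterns = self.extend(patterns)               # (batch, n_patterns, horizon)
        return self.reconstruct(future_patterns).squeeze(1)   # (batch, horizon)

model = ConvDeconvForecaster(lookback=96, horizon=24)
print(model(torch.randn(4, 96)).shape)                        # torch.Size([4, 24])
```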
arXiv Detail & Related papers (2023-07-13T13:16:01Z)
- Two Steps Forward and One Behind: Rethinking Time Series Forecasting with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks.
We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting.
We propose a set of alternative models that are better performing and significantly less complex.
arXiv Detail & Related papers (2023-04-10T12:47:42Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of their self-attention mechanism, which comes at a high computational cost.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- TACTiS: Transformer-Attentional Copulas for Time Series [76.71406465526454]
The estimation of time-varying quantities is a fundamental component of decision making in fields such as healthcare and finance.
We propose a versatile method that estimates joint distributions using an attention-based decoder.
We show that our model produces state-of-the-art predictions on several real-world datasets.
arXiv Detail & Related papers (2022-02-07T21:37:29Z)
- Anomaly Detection of Time Series with Smoothness-Inducing Sequential Variational Auto-Encoder [59.69303945834122]
We present a Smoothness-Inducing Sequential Variational Auto-Encoder (SISVAE) model for robust estimation and anomaly detection of time series.
Our model parameterizes mean and variance for each time-stamp with flexible neural networks.
We show the effectiveness of our model on both synthetic datasets and public real-world benchmarks.
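A minimal sketch of a sequential VAE that parameterises a mean and variance per time step and adds a smoothness penalty on neighbouring reconstruction means; the network sizes and the penalty form are assumptions of this sketch, not the SISVAE model itself.

```python
import torch
import torch.nn as nn

class SmoothSeqVAE(nn.Module):
    """GRU encoder/decoder VAE with per-time-step mean and (log-)variance."""

    def __init__(self, n_vars, hidden=32, latent=8):
        super().__init__()
        self.enc = nn.GRU(n_vars, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.dec = nn.GRU(latent, hidden, batch_first=True)
        self.out_mu = nn.Linear(hidden, n_vars)
        self.out_logvar = nn.Linear(hidden, n_vars)

    def forward(self, x):                                     # x: (batch, T, n_vars)
        h, _ = self.enc(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterisation
        d, _ = self.dec(z)
        return self.out_mu(d), self.out_logvar(d), mu, logvar

def smoothness_penalty(rec_mu):
    # Penalise abrupt changes between the means of neighbouring time steps.
    return ((rec_mu[:, 1:] - rec_mu[:, :-1]) ** 2).mean()

model = SmoothSeqVAE(n_vars=3)
rec_mu, rec_logvar, mu, logvar = model(torch.randn(4, 100, 3))
print(smoothness_penalty(rec_mu))
```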
arXiv Detail & Related papers (2021-02-02T06:15:15Z)
- Transformer Hawkes Process [79.16290557505211]
We propose a Transformer Hawkes Process (THP) model, which leverages the self-attention mechanism to capture long-term dependencies.
THP outperforms existing models in terms of both likelihood and event prediction accuracy by a notable margin.
We provide a concrete example, where THP achieves improved prediction performance for learning multiple point processes when incorporating their relational information.
arXiv Detail & Related papers (2020-02-21T13:48:13Z)