CNN-TFT explained by SHAP with multi-head attention weights for time series forecasting
- URL: http://arxiv.org/abs/2510.06840v1
- Date: Wed, 08 Oct 2025 10:08:28 GMT
- Title: CNN-TFT explained by SHAP with multi-head attention weights for time series forecasting
- Authors: Stefano F. Stefenon, João P. Matos-Carvalho, Valderi R. Q. Leithardt, Kin-Choong Yow
- Abstract summary: This paper proposes a hybrid architecture integrating convolutional feature extraction with a temporal fusion transformer. CNN-TFT outperforms well-established deep learning models, with a mean absolute percentage error of up to 2.2%.
- Score: 0.6019777076722422
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Convolutional neural networks (CNNs) and transformer architectures offer strengths for modeling temporal data: CNNs excel at capturing local patterns and translational invariances, while transformers effectively model long-range dependencies via self-attention. This paper proposes a hybrid architecture integrating convolutional feature extraction with a temporal fusion transformer (TFT) backbone to enhance multivariate time series forecasting. The CNN module first applies a hierarchy of one-dimensional convolutional layers to distill salient local patterns from raw input sequences, reducing noise and dimensionality. The resulting feature maps are then fed into the TFT, which applies multi-head attention to capture both short- and long-term dependencies and to weigh relevant covariates adaptively. We evaluate the CNN-TFT on a hydroelectric natural flow time series dataset. Experimental results demonstrate that CNN-TFT outperforms well-established deep learning models, with a mean absolute percentage error of up to 2.2%. The explainability of the model is obtained by a proposed Shapley additive explanations with multi-head attention weights (SHAP-MHAW). Our novel architecture, named CNN-TFT-SHAP-MHAW, is promising for applications requiring high-fidelity, multivariate time series forecasts, being available for future analysis at https://github.com/SFStefenon/CNN-TFT-SHAP-MHAW.
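The pipeline the abstract describes, stacked one-dimensional convolutions that distill local features, followed by multi-head attention over the resulting feature maps, can be illustrated with a minimal PyTorch model. This is a rough sketch under assumed hyperparameters (kernel sizes, model width, a single attention block), not the authors' full TFT backbone, which also includes gating and variable-selection components:

```python
# Minimal sketch of the CNN -> multi-head-attention pipeline from the
# abstract. Layer sizes, kernel widths, and the single attention block
# are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class CNNAttentionForecaster(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64,
                 n_heads: int = 4, horizon: int = 1):
        super().__init__()
        # Hierarchy of 1-D convolutions distilling local patterns and
        # reducing noise/dimensionality (strided downsampling).
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Multi-head self-attention over the CNN feature maps captures
        # short- and long-term dependencies across time steps.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); Conv1d expects (batch, channels, time)
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (batch, time', d_model)
        ctx, attn_weights = self.attn(z, z, z, need_weights=True)
        # Forecast from the last attended step; attn_weights are the
        # attention maps an attention-aware explanation would inspect.
        return self.head(ctx[:, -1, :])

model = CNNAttentionForecaster(n_features=5)
y_hat = model(torch.randn(8, 96, 5))  # 8 series, 96 steps, 5 covariates
print(y_hat.shape)  # torch.Size([8, 1])
```

For attribution, a generic explainer such as `shap.GradientExplainer` can be attached to a model like this; the paper's proposed SHAP-MHAW additionally couples SHAP values with the multi-head attention weights, and the reference implementation is in the linked repository.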
Related papers
- DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z)
- Adaptive Fuzzy Time Series Forecasting via Partially Asymmetric Convolution and Sub-Sliding Window Fusion [0.0]
We propose a novel convolutional architecture with a partially asymmetric design based on the timing of the sliding window. The proposed method achieves state-of-the-art results on most popular time series datasets.
arXiv Detail & Related papers (2025-07-28T08:58:25Z)
- MS-DFTVNet: A Long-Term Time Series Prediction Method Based on Multi-Scale Deformable Convolution [12.652031472297416]
We develop MS-DFTVNet, a deformable convolutional framework for long-term forecasting. Experiments demonstrate that MS-DFTVNet significantly outperforms strong baselines, with an average improvement of about 7.5%.
arXiv Detail & Related papers (2025-06-08T10:33:39Z)
- A Multi-Layer CNN-GRUSKIP Model Based on Transformer for Spatial-Temporal Traffic Flow Prediction [0.06597195879147556]
Traffic flow prediction remains a cornerstone of intelligent transportation systems (ITS). The CNN-GRUSKIP model emerges as a pioneering approach. The model consistently outperformed established models such as ARIMA, Graph WaveNet, HA, LSTM, STGCN, and APT. With its potent predictive prowess and adaptive architecture, the CNN-GRUSKIP model stands to redefine ITS applications.
arXiv Detail & Related papers (2025-01-09T21:30:02Z)
- A Self-organizing Interval Type-2 Fuzzy Neural Network for Multi-Step Time Series Prediction [9.546043411729206]
The interval type-2 fuzzy neural network (IT2FNN) has shown exceptional performance in uncertainty modelling for single-step prediction tasks. This paper proposes a new self-organizing interval type-2 fuzzy neural network with multiple outputs (SOIT2FNN-MO). Experimental results on chaotic and microgrid prediction problems demonstrate that SOIT2FNN-MO outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-07-10T19:35:44Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer architecture.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses the Continuous Wavelet Transform (CWT) to represent information in a 2-D tensor form (a minimal CWT sketch follows below).
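As a hedged illustration of that TC-stream idea, and not TCCT-Net's actual code, the snippet below uses the PyWavelets library (an assumption; the paper's implementation is not given here) to lift a 1-D signal into a 2-D time-frequency tensor that ordinary 2-D convolutions can consume:

```python
# Illustrative only: turn a 1-D signal into a 2-D scalogram via CWT.
import numpy as np
import pywt

t = np.linspace(0, 1, 512)
signal = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 32 * t)

scales = np.arange(1, 65)                      # 64 frequency rows
coeffs, freqs = pywt.cwt(signal, scales, "morl")
scalogram = np.abs(coeffs)                     # (64, 512) 2-D tensor

# The 2-D scalogram can now be fed to standard 2-D convolutions,
# which is the representational trick the TC stream exploits.
print(scalogram.shape)  # (64, 512)
```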
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- WinNet: Make Only One Convolutional Layer Effective for Time Series Forecasting [11.232780368635416]
We present a highly accurate and simply structured CNN-based model with only one convolutional layer, called WinNet.
Results demonstrate that WinNet can achieve SOTA performance with lower complexity than existing CNN-based methods.
arXiv Detail & Related papers (2023-11-01T01:23:59Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue (a generic robust-loss sketch follows below).
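The summary does not spell out CARD's robust loss, so the snippet below only illustrates the general idea with Huber loss, a standard robust alternative to MSE that down-weights large, often outlier-driven residuals; CARD's actual loss may differ:

```python
# Generic robust-loss illustration (Huber), not CARD's specific loss.
import torch
import torch.nn as nn

y_true = torch.tensor([1.0, 1.2, 0.9, 10.0])     # last point is an outlier
y_pred = torch.tensor([1.1, 1.0, 1.0, 1.5])

mse = nn.MSELoss()(y_pred, y_true)
huber = nn.HuberLoss(delta=1.0)(y_pred, y_true)  # quadratic near 0, linear in tails
print(f"MSE: {mse:.2f}  Huber: {huber:.2f}")     # Huber penalizes the outlier less
```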
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model that improves classification capacity for multivariate time series.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity, despite the high computational cost of the self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)