CNN-TFT explained by SHAP with multi-head attention weights for time series forecasting
- URL: http://arxiv.org/abs/2510.06840v1
- Date: Wed, 08 Oct 2025 10:08:28 GMT
- Title: CNN-TFT explained by SHAP with multi-head attention weights for time series forecasting
- Authors: Stefano F. Stefenon, João P. Matos-Carvalho, Valderi R. Q. Leithardt, Kin-Choong Yow
- Abstract summary: This paper proposes a hybrid architecture integrating convolutional feature extraction with a temporal fusion transformer. CNN-TFT outperforms well-established deep learning models, with a mean absolute percentage error of up to 2.2%.
- Score: 0.6019777076722422
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Convolutional neural networks (CNNs) and transformer architectures offer strengths for modeling temporal data: CNNs excel at capturing local patterns and translational invariances, while transformers effectively model long-range dependencies via self-attention. This paper proposes a hybrid architecture integrating convolutional feature extraction with a temporal fusion transformer (TFT) backbone to enhance multivariate time series forecasting. The CNN module first applies a hierarchy of one-dimensional convolutional layers to distill salient local patterns from raw input sequences, reducing noise and dimensionality. The resulting feature maps are then fed into the TFT, which applies multi-head attention to capture both short- and long-term dependencies and to weigh relevant covariates adaptively. We evaluate the CNN-TFT on a hydroelectric natural flow time series dataset. Experimental results demonstrate that CNN-TFT outperforms well-established deep learning models, with a mean absolute percentage error of up to 2.2%. The explainability of the model is obtained by a proposed Shapley additive explanations with multi-head attention weights (SHAP-MHAW). Our novel architecture, named CNN-TFT-SHAP-MHAW, is promising for applications requiring high-fidelity, multivariate time series forecasts, being available for future analysis at https://github.com/SFStefenon/CNN-TFT-SHAP-MHAW.
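The pipeline the abstract describes, stacked one-dimensional convolutions that distill local features, followed by multi-head attention over the resulting feature maps, can be illustrated with a minimal PyTorch model. This is a rough sketch under assumed hyperparameters (kernel sizes, model width, a single attention block), not the authors' full TFT backbone, which also includes gating and variable-selection components:

```python
# Minimal sketch of the CNN -> multi-head-attention pipeline from the
# abstract. Layer sizes, kernel widths, and the single attention block
# are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class CNNAttentionForecaster(nn.Module):
    def __init__(self, n_features: int, d_model: int = 64,
                 n_heads: int = 4, horizon: int = 1):
        super().__init__()
        # Hierarchy of 1-D convolutions distilling local patterns and
        # reducing noise/dimensionality (strided downsampling).
        self.cnn = nn.Sequential(
            nn.Conv1d(n_features, d_model, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Multi-head self-attention over the CNN feature maps captures
        # short- and long-term dependencies across time steps.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, features); Conv1d expects (batch, channels, time)
        z = self.cnn(x.transpose(1, 2)).transpose(1, 2)  # (batch, time', d_model)
        ctx, attn_weights = self.attn(z, z, z, need_weights=True)
        # Forecast from the last attended step; attn_weights are the
        # attention maps an attention-aware explanation would inspect.
        return self.head(ctx[:, -1, :])

model = CNNAttentionForecaster(n_features=5)
y_hat = model(torch.randn(8, 96, 5))  # 8 series, 96 steps, 5 covariates
print(y_hat.shape)  # torch.Size([8, 1])
```

For attribution, a generic explainer such as `shap.GradientExplainer` can be attached to a model like this; the paper's proposed SHAP-MHAW additionally couples SHAP values with the multi-head attention weights, and the reference implementation is in the linked repository.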
Related papers
- DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z)
- Adaptive Fuzzy Time Series Forecasting via Partially Asymmetric Convolution and Sub-Sliding Window Fusion [0.0]
We propose a novel convolutional architecture with a partially asymmetric design based on the timing of the sliding window. The proposed method achieves state-of-the-art results on most popular time series datasets.
arXiv Detail & Related papers (2025-07-28T08:58:25Z)
- MS-DFTVNet: A Long-Term Time Series Prediction Method Based on Multi-Scale Deformable Convolution [12.652031472297416]
We develop MS-DFTVNet, a deformable convolutional framework for long-term forecasting. Experiments demonstrate that MS-DFTVNet significantly outperforms strong baselines, with an average improvement of about 7.5%.
arXiv Detail & Related papers (2025-06-08T10:33:39Z)
- A Multi-Layer CNN-GRUSKIP Model Based on Transformer for Spatial-Temporal Traffic Flow Prediction [0.06597195879147556]
Traffic flow prediction remains a cornerstone of intelligent transportation systems (ITS). The CNN-GRUSKIP model emerges as a pioneering approach. The model consistently outperformed established models such as ARIMA, Graph WaveNet, HA, LSTM, STGCN, and APT. With its potent predictive prowess and adaptive architecture, the CNN-GRUSKIP model stands to redefine ITS applications.
arXiv Detail & Related papers (2025-01-09T21:30:02Z)
- A Self-organizing Interval Type-2 Fuzzy Neural Network for Multi-Step Time Series Prediction [9.546043411729206]
The interval type-2 fuzzy neural network (IT2FNN) has shown exceptional performance in uncertainty modelling for single-step prediction tasks. This paper proposes a new self-organizing interval type-2 fuzzy neural network with multiple outputs (SOIT2FNN-MO). Experimental results on chaotic and microgrid prediction problems demonstrate that SOIT2FNN-MO outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-07-10T19:35:44Z)
- TCCT-Net: Two-Stream Network Architecture for Fast and Efficient Engagement Estimation via Behavioral Feature Signals [58.865901821451295]
We present a novel two-stream feature fusion "Tensor-Convolution and Convolution-Transformer Network" (TCCT-Net) architecture.
To better learn the meaningful patterns in the temporal-spatial domain, we design a "CT" stream that integrates a hybrid convolutional-transformer architecture.
In parallel, to efficiently extract rich patterns from the temporal-frequency domain, we introduce a "TC" stream that uses the Continuous Wavelet Transform (CWT) to represent information in a 2-D tensor form (a minimal CWT sketch follows below).
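As a hedged illustration of that TC-stream idea, and not TCCT-Net's actual code, the snippet below uses the PyWavelets library (an assumption; the paper's implementation is not given here) to lift a 1-D signal into a 2-D time-frequency tensor that ordinary 2-D convolutions can consume:

```python
# Illustrative only: turn a 1-D signal into a 2-D scalogram via CWT.
import numpy as np
import pywt

t = np.linspace(0, 1, 512)
signal = np.sin(2 * np.pi * 8 * t) + 0.5 * np.sin(2 * np.pi * 32 * t)

scales = np.arange(1, 65)                      # 64 frequency rows
coeffs, freqs = pywt.cwt(signal, scales, "morl")
scalogram = np.abs(coeffs)                     # (64, 512) 2-D tensor

# The 2-D scalogram can now be fed to standard 2-D convolutions,
# which is the representational trick the TC stream exploits.
print(scalogram.shape)  # (64, 512)
```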
arXiv Detail & Related papers (2024-04-15T06:01:48Z)
- WinNet: Make Only One Convolutional Layer Effective for Time Series Forecasting [11.232780368635416]
We present a highly accurate and simply structured CNN-based model with only one convolutional layer, called WinNet.
Results demonstrate that WinNet can achieve SOTA performance with lower complexity than existing CNN-based methods.
arXiv Detail & Related papers (2023-11-01T01:23:59Z)
- CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting [50.23240107430597]
We design a special Transformer, i.e., the Channel Aligned Robust Blend Transformer (CARD for short), that addresses key shortcomings of channel-independent (CI) Transformers in time series forecasting.
First, CARD introduces a channel-aligned attention structure that allows it to capture both temporal correlations among signals and dynamical dependence among multiple variables over time.
Second, in order to efficiently utilize the multi-scale knowledge, we design a token blend module to generate tokens with different resolutions.
Third, we introduce a robust loss function for time series forecasting to alleviate the potential overfitting issue (a generic robust-loss sketch follows below).
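The summary does not spell out CARD's robust loss, so the snippet below only illustrates the general idea with Huber loss, a standard robust alternative to MSE that down-weights large, often outlier-driven residuals; CARD's actual loss may differ:

```python
# Generic robust-loss illustration (Huber), not CARD's specific loss.
import torch
import torch.nn as nn

y_true = torch.tensor([1.0, 1.2, 0.9, 10.0])     # last point is an outlier
y_pred = torch.tensor([1.1, 1.0, 1.0, 1.5])

mse = nn.MSELoss()(y_pred, y_true)
huber = nn.HuberLoss(delta=1.0)(y_pred, y_true)  # quadratic near 0, linear in tails
print(f"MSE: {mse:.2f}  Huber: {huber:.2f}")     # Huber penalizes the outlier less
```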
arXiv Detail & Related papers (2023-05-20T05:16:31Z)
- FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model that improves classification capacity for multivariate time series.
It exhibits three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity, despite the high computational cost of the self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)