Related papers: Conv-like Scale-Fusion Time Series Transformer: A Multi-Scale Representation for Variable-Length Long Time Series

URL: http://arxiv.org/abs/2509.17845v1
Date: Mon, 22 Sep 2025 14:37:59 GMT
Title: Conv-like Scale-Fusion Time Series Transformer: A Multi-Scale Representation for Variable-Length Long Time Series
Authors: Kai Zhang, Siming Sun, Zhengyu Fan, Qinmin Yang, Xuejun Jiang
Abstract summary: Transformer-based models have advanced time series tasks, but struggle with feature redundancy and limited generalization capabilities. We propose a Multi-Scale Representation Learning Framework based on a Conv-like ScaleFusion Transformer. Our framework achieves superior feature independence, reduced redundancy, and better performance in forecasting and classification tasks compared to state-of-the-art methods.
Score: 10.93942806756288
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Time series analysis faces significant challenges in handling variable-length data and achieving robust generalization. While Transformer-based models have advanced time series tasks, they often struggle with feature redundancy and limited generalization capabilities. Drawing inspiration from classical CNN architectures' pyramidal structure, we propose a Multi-Scale Representation Learning Framework based on a Conv-like ScaleFusion Transformer. Our approach introduces a temporal convolution-like structure that combines patching operations with multi-head attention, enabling progressive temporal dimension compression and feature channel expansion. We further develop a novel cross-scale attention mechanism for effective feature fusion across different temporal scales, along with a log-space normalization method for variable-length sequences. Extensive experiments demonstrate that our framework achieves superior feature independence, reduced redundancy, and better performance in forecasting and classification tasks compared to state-of-the-art methods.
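The "progressive temporal dimension compression and feature channel expansion" described in the abstract can be illustrated with a minimal sketch of the shape bookkeeping alone. This is not the authors' implementation: it omits the multi-head attention, the cross-scale fusion, and the log-space normalization, and it assumes non-overlapping patches that are simply flattened. The names `patch_merge` and `pyramid_shapes`, the patch length, and the choice to drop trailing steps are all hypothetical.

```python
def patch_merge(seq, patch_len):
    """Group `patch_len` consecutive time steps into one token.

    seq: list of T time steps, each a list of C feature values.
    Returns T // patch_len tokens, each of length C * patch_len.
    Trailing steps that do not fill a whole patch are dropped
    (an assumption made for this sketch).
    """
    n_patches = len(seq) // patch_len
    merged = []
    for i in range(n_patches):
        window = seq[i * patch_len:(i + 1) * patch_len]
        # Flatten the window: the time axis shrinks, the channel axis grows.
        merged.append([value for step in window for value in step])
    return merged


def pyramid_shapes(seq, patch_len, n_stages):
    """Apply patch_merge repeatedly and record (time, channels) per stage."""
    shapes = [(len(seq), len(seq[0]))]
    for _ in range(n_stages):
        seq = patch_merge(seq, patch_len)
        if not seq:
            break  # sequence too short for another stage
        shapes.append((len(seq), len(seq[0])))
    return shapes


# Toy input: 16 time steps, 2 channels.
toy = [[float(t), float(-t)] for t in range(16)]
shapes = pyramid_shapes(toy, patch_len=2, n_stages=3)
```

With a patch length of 2, each stage halves the number of time steps and doubles the channel width, which mirrors the pyramidal layout of classical CNNs that the abstract cites as inspiration. In a real model a learned projection and attention would replace the plain flattening used here.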

Related papers

DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z)
Sentinel: Multi-Patch Transformer with Temporal and Channel Attention for Time Series Forecasting [48.52101281458809]
Transformer-based time series forecasting has recently gained strong interest due to the ability of transformers to model sequential data. We propose Sentinel, a transformer-based architecture composed of an encoder able to extract contextual information from the channel dimension. We introduce a multi-patch attention mechanism, which leverages the patching process to structure the input sequence in a way that can be naturally integrated into the transformer architecture.
arXiv Detail & Related papers (2025-03-22T06:01:50Z)
MFRS: A Multi-Frequency Reference Series Approach to Scalable and Accurate Time-Series Forecasting [51.94256702463408]
Time series predictability is derived from periodic characteristics at different frequencies. We propose a novel time series forecasting method based on multi-frequency reference series correlation analysis. Experiments on major open and synthetic datasets show state-of-the-art performance.
arXiv Detail & Related papers (2025-03-11T11:40:14Z)
Timer-XL: Long-Context Transformers for Unified Time Series Forecasting [67.83502953961505]
We present Timer-XL, a causal Transformer for unified time series forecasting. Based on large-scale pre-training, Timer-XL achieves state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2024-10-07T07:27:39Z)
PRformer: Pyramidal Recurrent Transformer for Multivariate Time Series Forecasting [82.03373838627606]
The self-attention mechanism in the Transformer architecture requires positional embeddings to encode temporal order in time series prediction. We argue that this reliance on positional embeddings restricts the Transformer's ability to effectively represent temporal sequences. We present a model integrating PRE with a standard Transformer encoder, demonstrating state-of-the-art performance on various real-world datasets.
arXiv Detail & Related papers (2024-08-20T01:56:07Z)
Rough Transformers: Lightweight and Continuous Time Series Modelling through Signature Patching [46.58170057001437]
We introduce the Rough Transformer, a variation of the Transformer model that operates on continuous-time representations of input sequences. We find that, on a variety of time-series-related tasks, Rough Transformers consistently outperform their vanilla attention counterparts.
arXiv Detail & Related papers (2024-05-31T14:00:44Z)
FormerTime: Hierarchical Multi-Scale Representations for Multivariate Time Series Classification [53.55504611255664]
FormerTime is a hierarchical representation model for improving classification capacity on the multivariate time series classification task. It offers three merits: (1) learning hierarchical multi-scale representations from time series data, (2) inheriting the strengths of both transformers and convolutional networks, and (3) tackling the efficiency challenges incurred by the self-attention mechanism.
arXiv Detail & Related papers (2023-02-20T07:46:14Z)
Temporal Tensor Transformation Network for Multivariate Time Series Prediction [1.2354076490479515]
We present a novel deep learning architecture, known as Temporal Tensor Transformation Network, which transforms the original time series into a higher-order tensor. This yields a new representation of the original multivariate time series, which enables the convolution kernel to extract complex and non-linear features, as well as signals of interaction between variables, from a relatively large temporal region. Experimental results show that Temporal Tensor Transformation Network outperforms several state-of-the-art methods on window-based predictions across various tasks.
arXiv Detail & Related papers (2020-01-04T07:28:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.