Transformer Modeling for Both Scalability and Performance in Multivariate Time Series
- URL: http://arxiv.org/abs/2509.19471v1
- Date: Tue, 23 Sep 2025 18:28:24 GMT
- Title: Transformer Modeling for Both Scalability and Performance in Multivariate Time Series
- Authors: Hunjae Lee, Corey Clark
- Abstract summary: We propose a transformer with Delegate Token Attention (DELTAformer) to constrain inter-variable modeling. Our results show that DELTAformer scales linearly with variable count while actually outperforming standard transformers.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Variable count is among the main scalability bottlenecks for transformer modeling in multivariate time series (MTS) data. On top of this, a growing consensus in the field points to indiscriminate inter-variable mixing as a potential source of noise accumulation and performance degradation. This is likely exacerbated by the sparsity of informative signals characteristic of many MTS systems, coupled with representational misalignment stemming from indiscriminate information mixing between (heterogeneous) variables. While scalability and performance are often seen as competing interests in transformer design, we show that both can be improved simultaneously in MTS by strategically constraining the representational capacity of inter-variable mixing. Our proposed method, the transformer with Delegate Token Attention (DELTAformer), constrains inter-variable modeling through what we call delegate tokens, which are then used to perform full, unconstrained, inter-temporal modeling. Delegate tokens act as an implicit regularizer that forces the model to be highly selective about what inter-variable information is allowed to propagate through the network. Our results show that DELTAformer scales linearly with variable count while actually outperforming standard transformers, achieving state-of-the-art performance across benchmarks and baselines. In addition, DELTAformer can focus on relevant signals better than standard transformers in noisy MTS environments and exhibits superior noise resilience overall. Results across various experiments confirm that by aligning our model design to leverage domain-specific challenges in MTS to our advantage, DELTAformer simultaneously achieves linear scaling while improving performance over standard, quadratic transformers.
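The abstract does not spell out the attention layout, but the mechanism it describes (a small set of delegate tokens mediating all inter-variable mixing, with full inter-temporal modeling done elsewhere) can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' implementation: it assumes one token per variable, K learned delegate tokens with K much smaller than the variable count V, and a gather/scatter pair of cross-attentions so that inter-variable mixing costs roughly O(V·K) instead of O(V^2).

```python
# Minimal sketch of delegate-token-style inter-variable mixing (illustrative only;
# module name, token layout, and gather/scatter structure are assumptions).
import torch
import torch.nn as nn

class DelegateTokenMixing(nn.Module):
    """Mixes V variable tokens through K learned delegate tokens (K << V)."""

    def __init__(self, d_model: int, num_delegates: int, num_heads: int = 4):
        super().__init__()
        # K learned delegate tokens shared across the batch
        self.delegates = nn.Parameter(torch.randn(num_delegates, d_model))
        # Delegates gather information from the V variable tokens ...
        self.gather = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        # ... and variable tokens read back only from the K delegates.
        self.scatter = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, var_tokens: torch.Tensor) -> torch.Tensor:
        # var_tokens: (batch, V, d_model), one token per variable
        batch = var_tokens.size(0)
        q = self.delegates.unsqueeze(0).expand(batch, -1, -1)  # (batch, K, d_model)
        pooled, _ = self.gather(q, var_tokens, var_tokens)     # cost ~ O(V * K)
        mixed, _ = self.scatter(var_tokens, pooled, pooled)    # cost ~ O(V * K)
        return var_tokens + mixed                               # residual update

# Toy usage: 128 variables mixed through 8 delegate tokens.
x = torch.randn(2, 128, 64)
layer = DelegateTokenMixing(d_model=64, num_delegates=8)
print(layer(x).shape)  # torch.Size([2, 128, 64])
```

With K held fixed, doubling the number of variables roughly doubles the mixing cost, which is the linear scaling in variable count the abstract claims; the bottleneck through K tokens is also what would force the network to be selective about which inter-variable information propagates.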
Related papers
- DiTS: Multimodal Diffusion Transformers Are Time Series Forecasters [50.43534351968113]
Existing generative time series models do not address the multi-dimensional properties of time series data well. Inspired by Multimodal Diffusion Transformers that integrate textual guidance into video generation, we propose Diffusion Transformers for Time Series (DiTS).
arXiv Detail & Related papers (2026-02-06T10:48:13Z) - FAIM: Frequency-Aware Interactive Mamba for Time Series Classification [87.84511960413715]
Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition. We propose FAIM, a lightweight Frequency-Aware Interactive Mamba model. We show that FAIM consistently outperforms existing state-of-the-art (SOTA) methods, achieving a superior trade-off between accuracy and efficiency.
arXiv Detail & Related papers (2025-11-26T08:36:33Z) - HTMformer: Hybrid Time and Multivariate Transformer for Time Series Forecasting [14.205408555688306]
Transformer-based methods have achieved impressive results in time series forecasting. Existing Transformers still exhibit limitations in sequence modeling as they tend to overemphasize temporal dependencies. We find that the performance of Transformers is highly dependent on the embedding method used to learn effective representations.
arXiv Detail & Related papers (2025-10-08T14:40:42Z) - MTNet: Learning modality-aware representation with transformer for RGBT tracking [35.96855931247585]
We propose a modality-aware tracker based on a transformer, termed MTNet. A transformer fusion network is then applied to capture global dependencies to reinforce instance representations. The proposed method achieves satisfactory results compared with the state-of-the-art competitors on three RGBT benchmarks.
arXiv Detail & Related papers (2025-08-24T10:01:11Z) - A temporal scale transformer framework for precise remaining useful life prediction in fuel cells [10.899223392837936]
Temporal Scale Transformer (TSTransformer) is an enhanced version of the inverted Transformer (iTransformer). Unlike traditional Transformers that treat each timestep as an input token, TSTransformer maps sequences of varying lengths into tokens at different stages for inter-sequence modeling. It improves local feature extraction, captures temporal scale characteristics, and reduces token count and computational costs.
arXiv Detail & Related papers (2025-04-08T23:42:54Z) - Learnable Multi-Scale Wavelet Transformer: A Novel Alternative to Self-Attention [0.0]
Learnable Multi-Scale Wavelet Transformer (LMWT) is a novel architecture that replaces the standard dot-product self-attention. We present the detailed mathematical formulation of the learnable Haar wavelet module and its integration into the transformer framework. Our results indicate that the LMWT achieves competitive performance while offering substantial computational advantages.
arXiv Detail & Related papers (2025-04-08T22:16:54Z) - SEA: State-Exchange Attention for High-Fidelity Physics Based Transformers [1.1650821883155187]
Current approaches using sequential networks have shown promise in estimating field variables for dynamical systems.
The unresolved issue of rollout error accumulation results in unreliable estimations as the network predicts further into the future.
Here, we introduce the State-Exchange Attention (SEA) module, a novel transformer-based module enabling information exchange between encoded fields.
arXiv Detail & Related papers (2024-10-20T20:25:01Z) - Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators [83.48423407316713]
We present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately.
Our model initiates the denoising process with a precise, non-ambiguous stage and gradually transitions to a phase enriched with detail.
Our method achieves a state-of-the-art FID score of 2.01 when integrated with the recent work SiT.
arXiv Detail & Related papers (2024-08-11T07:01:39Z) - UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z) - Considering Nonstationarity within Multivariate Time Series with Variational Hierarchical Transformer for Forecasting [12.793705636683402]
We develop a powerful hierarchical probabilistic generative module to consider the non-stationarity and intrinsic characteristics within MTS.
We then combine it with a transformer to form a well-defined variational generative dynamic model named Hierarchical Time series Variational Transformer (HTV-Trans).
Being a powerful probabilistic model, HTV-Trans is utilized to learn expressive representations of MTS and applied to forecasting tasks.
arXiv Detail & Related papers (2024-03-08T16:04:36Z) - Isomer: Isomerous Transformer for Zero-shot Video Object Segmentation [59.91357714415056]
We propose two Transformer variants: Context-Sharing Transformer (CST) and Semantic Gathering-Scattering Transformer (SGST).
CST learns the global-shared contextual information within image frames with a lightweight computation; SGST models the semantic correlation separately for the foreground and background.
Compared with the baseline that uses vanilla Transformers for multi-stage fusion, ours significantly increases the speed by 13 times and achieves new state-of-the-art ZVOS performance.
arXiv Detail & Related papers (2023-08-13T06:12:00Z) - Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity, but their self-attention mechanism is computationally expensive.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z) - ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias [76.16156833138038]
We propose a novel Vision Transformer Advanced by Exploring intrinsic IB from convolutions, i.e., ViTAE.
ViTAE has several spatial pyramid reduction modules to downsample and embed the input image into tokens with rich multi-scale context.
In each transformer layer, ViTAE has a convolution block in parallel to the multi-head self-attention module, whose features are fused and fed into the feed-forward network.
arXiv Detail & Related papers (2021-06-07T05:31:06Z) - Bayesian Transformer Language Models for Speech Recognition [59.235405107295655]
State-of-the-art neural language models (LMs) represented by Transformers are highly complex.
This paper proposes a full Bayesian learning framework for Transformer LM estimation.
arXiv Detail & Related papers (2021-02-09T10:55:27Z)