Learning Novel Transformer Architecture for Time-series Forecasting
- URL: http://arxiv.org/abs/2502.13721v1
- Date: Wed, 19 Feb 2025 13:49:20 GMT
- Title: Learning Novel Transformer Architecture for Time-series Forecasting
- Authors: Juyuan Zhang, Wei Zhu, Jiechao Gao
- Abstract summary: AutoFormer-TS is a novel framework that leverages a comprehensive search space for Transformer architectures tailored to time-series prediction tasks. Our framework introduces a differentiable neural architecture search (DNAS) method, AB-DARTS, which improves upon existing DNAS approaches.
- Score: 9.412920379798928
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the success of Transformer-based models in time-series prediction (TSP) tasks, existing Transformer architectures still face limitations, and the literature lacks a comprehensive exploration of alternative architectures. To address these challenges, we propose AutoFormer-TS, a novel framework that leverages a comprehensive search space for Transformer architectures tailored to TSP tasks. Our framework introduces a differentiable neural architecture search (DNAS) method, AB-DARTS, which improves upon existing DNAS approaches by enhancing the identification of optimal operations within the architecture. AutoFormer-TS systematically explores alternative attention mechanisms, activation functions, and encoding operations, moving beyond the traditional Transformer design. Extensive experiments demonstrate that AutoFormer-TS consistently outperforms state-of-the-art baselines across various TSP benchmarks, achieving superior forecasting accuracy while maintaining reasonable training efficiency.
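The abstract does not detail AB-DARTS itself, but the core DNAS idea it builds on is a continuous relaxation over a set of candidate operations. The sketch below is a minimal DARTS-style mixed operation in PyTorch; the candidate ops (standard attention, a feed-forward alternative, skip) and all dimensions are illustrative assumptions, not AutoFormer-TS's actual search space.

```python
# Minimal DARTS-style sketch of a differentiable architecture search cell.
# AB-DARTS is not reproduced here; candidate ops and sizes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """Softmax-weighted mixture over candidate operations (the DNAS relaxation)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.MultiheadAttention(d_model, num_heads=4, batch_first=True),  # standard attention
            nn.Sequential(nn.Linear(d_model, d_model), nn.GELU()),          # feed-forward alternative
            nn.Identity(),                                                   # skip connection
        ])
        # One architecture parameter per candidate op, learned jointly with the weights.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)
        outs = []
        for op in self.ops:
            if isinstance(op, nn.MultiheadAttention):
                out, _ = op(x, x, x)   # attention returns (output, attn_weights)
            else:
                out = op(x)
            outs.append(out)
        return sum(wi * oi for wi, oi in zip(w, outs))

x = torch.randn(8, 96, 64)            # (batch, sequence length, d_model)
cell = MixedOp(d_model=64)
print(cell(x).shape)                   # torch.Size([8, 96, 64])
# After search, the op with the largest alpha would be kept (discretization step).
```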
Related papers
- DASViT: Differentiable Architecture Search for Vision Transformer [8.839801565444775]
We introduce Differentiable Architecture Search for Vision Transformer (DASViT). DASViT bridges the gap in differentiable search for ViTs and uncovers novel designs. Experiments show that DASViT delivers architectures that break traditional Transformer encoder designs, outperform ViT-B/16 on multiple datasets, and achieve superior efficiency with fewer parameters and FLOPs.
arXiv Detail & Related papers (2025-07-17T12:48:00Z)
- The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting [26.76928230531243]
Transformer-based models have recently become dominant in Long-term Time Series Forecasting (LTSF). Variations in their architecture, such as encoder-only, encoder-decoder, and decoder-only designs, raise a crucial question: what Transformer architecture works best for LTSF tasks? Existing models are often tightly coupled with various time-series-specific designs, making it difficult to isolate the impact of the architecture itself. We propose a novel taxonomy that disentangles these designs, enabling clearer and more unified comparisons of Transformer architectures.
arXiv Detail & Related papers (2025-07-17T12:16:04Z)
- ZeroLM: Data-Free Transformer Architecture Search for Language Models [54.83882149157548]
Current automated proxy discovery approaches suffer from extended search times, susceptibility to data overfitting, and structural complexity.
This paper introduces a novel zero-cost proxy methodology that quantifies model capacity through efficient weight statistics.
Our evaluation demonstrates the superiority of this approach, achieving a Spearman's rho of 0.76 and Kendall's tau of 0.53 on the FlexiBERT benchmark.
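As a hedged illustration of the zero-cost-proxy idea, the sketch below scores randomly initialized candidate models by a simple weight statistic and measures rank agreement with hypothetical ground-truth accuracies via Spearman's rho and Kendall's tau; the actual ZeroLM statistic and the FlexiBERT accuracies are not reproduced here.

```python
# Hedged sketch of a data-free, weight-statistics proxy in the spirit of ZeroLM.
import torch
import torch.nn as nn
from scipy.stats import spearmanr, kendalltau

def weight_stat_proxy(model: nn.Module) -> float:
    """Average absolute-weight magnitude over weight matrices (illustrative statistic)."""
    stats = [p.abs().mean().item()
             for name, p in model.named_parameters() if p.dim() >= 2]
    return sum(stats) / len(stats)

# Score a few randomly initialized candidate models (stand-ins for a benchmark).
torch.manual_seed(0)
candidates = [nn.Sequential(nn.Linear(32, h), nn.ReLU(), nn.Linear(h, 10))
              for h in (16, 64, 256)]
scores = [weight_stat_proxy(m) for m in candidates]

# With real benchmark accuracies (e.g., FlexiBERT), one would report rank correlation:
true_acc = [0.61, 0.68, 0.74]   # hypothetical ground-truth accuracies
print("Spearman rho:", spearmanr(scores, true_acc).correlation)
print("Kendall tau:", kendalltau(scores, true_acc).correlation)
```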
arXiv Detail & Related papers (2025-03-24T13:11:22Z)
- TART: Token-based Architecture Transformer for Neural Network Performance Prediction [0.0]
The Token-based Architecture Transformer (TART) predicts neural network performance without the need to train candidate networks. TART attains state-of-the-art performance on the DeepNets-1M dataset for performance prediction tasks without edge information.
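A token-based performance predictor of this kind can be sketched as follows: serialize each candidate architecture into op-type tokens, encode the sequence with a small Transformer, and regress the expected accuracy. The vocabulary, pooling scheme, and sizes below are assumptions for illustration, not TART's actual design.

```python
# Hedged sketch of a TART-style predictor: encode an architecture as a token
# sequence and regress its performance without training the candidate itself.
import torch
import torch.nn as nn

class ArchPerformancePredictor(nn.Module):
    def __init__(self, vocab_size: int = 32, d_model: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)      # one token id per op type
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 1)                   # predicted accuracy

    def forward(self, tokens):                              # tokens: (batch, seq)
        h = self.encoder(self.embed(tokens))
        return self.head(h.mean(dim=1)).squeeze(-1)         # mean-pool, then regress

# Each candidate network is serialized as op-type token ids (hypothetical encoding).
arch_tokens = torch.randint(0, 32, (4, 12))                 # 4 candidates, 12 ops each
predictor = ArchPerformancePredictor()
print(predictor(arch_tokens).shape)                         # torch.Size([4])
```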
arXiv Detail & Related papers (2025-01-02T05:22:17Z)
- Knowledge-enhanced Transformer for Multivariate Long Sequence Time-series Forecasting [4.645182684813973]
We introduce a novel approach that encapsulates conceptual relationships among variables within a well-defined knowledge graph.
We investigate the influence of this integration into seminal architectures such as PatchTST, Autoformer, Informer, and Vanilla Transformer.
This enhancement empowers transformer-based architectures to address the inherent structural relation between variables.
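One plausible (assumed) way to encapsulate such a knowledge graph in a Transformer is to turn its adjacency into an attention mask over variables, so that only conceptually related variables attend to each other; the sketch below illustrates this pattern, not the paper's exact mechanism.

```python
# Hedged sketch: a variable-level knowledge graph as a cross-variable attention mask.
import torch
import torch.nn as nn

n_vars, d_model = 5, 32
# Adjacency from a hypothetical knowledge graph over the 5 variables
# (True = conceptually related); self-edges keep every attention row valid.
adj = torch.tensor([[1, 1, 0, 0, 0],
                    [1, 1, 1, 0, 0],
                    [0, 1, 1, 1, 0],
                    [0, 0, 1, 1, 1],
                    [0, 0, 0, 1, 1]], dtype=torch.bool)

attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
x = torch.randn(8, n_vars, d_model)        # (batch, variables, features)

# A boolean attn_mask uses True to BLOCK a position, so pass the negated adjacency.
out, _ = attn(x, x, x, attn_mask=~adj)
print(out.shape)                            # torch.Size([8, 5, 32])
```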
arXiv Detail & Related papers (2024-11-17T11:53:54Z)
- TransNAS-TSAD: Harnessing Transformers for Multi-Objective Neural Architecture Search in Time Series Anomaly Detection [3.5681028373124066]
This paper introduces TransNAS-TSAD, a framework that synergizes the transformer architecture with neural architecture search (NAS).
Our evaluation reveals that TransNAS-TSAD surpasses conventional anomaly detection models due to its tailored architectural adaptability.
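Multi-objective NAS of this kind ultimately needs a selection rule over competing objectives. The sketch below shows a generic Pareto-front filter over (detection F1, latency) pairs; the candidate scores are toy values, not TransNAS-TSAD results.

```python
# Hedged sketch of the multi-objective selection step such a NAS framework needs:
# keep candidates that are Pareto-optimal in (detection F1, inference latency).
def pareto_front(candidates):
    """candidates: list of (name, f1, latency_ms); higher f1 and lower latency win."""
    front = []
    for name, f1, lat in candidates:
        dominated = any(f2 >= f1 and l2 <= lat and (f2 > f1 or l2 < lat)
                        for _, f2, l2 in candidates)
        if not dominated:
            front.append((name, f1, lat))
    return front

searched = [("arch-A", 0.91, 14.0), ("arch-B", 0.88, 6.0),
            ("arch-C", 0.85, 9.0), ("arch-D", 0.93, 30.0)]
print(pareto_front(searched))   # arch-C is dominated by arch-B; the rest survive
```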
arXiv Detail & Related papers (2023-11-29T20:13:32Z)
- AutoST: Training-free Neural Architecture Search for Spiking Transformers [14.791412391584064]
Spiking Transformers achieve both the energy efficiency of Spiking Neural Networks (SNNs) and the high capacity of Transformers.
Existing Spiking Transformer architectures exhibit a notable architectural gap, resulting in suboptimal performance.
We introduce AutoST, a training-free NAS method for Spiking Transformers, to rapidly identify high-performance Spiking Transformer architectures.
arXiv Detail & Related papers (2023-07-01T10:19:52Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
However, the amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
Consequently, there has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z) - Exploring Transformers for Behavioural Biometrics: A Case Study in Gait
Recognition [0.7874708385247353]
This article explores and proposes novel gait biometric recognition systems based on Transformers.
Several state-of-the-art architectures (Vanilla, Informer, Autoformer, Block-Recurrent Transformer, and THAT) are considered in the experimental framework.
Experiments are carried out on two popular public databases, whuGAIT and OU-ISIR.
arXiv Detail & Related papers (2022-06-03T08:08:40Z) - Rich CNN-Transformer Feature Aggregation Networks for Super-Resolution [50.10987776141901]
Recent vision transformers along with self-attention have achieved promising results on various computer vision tasks.
We introduce an effective hybrid architecture for super-resolution (SR) tasks, which leverages local features from CNNs and long-range dependencies captured by transformers.
Our proposed method achieves state-of-the-art SR results on numerous benchmark datasets.
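The general hybrid pattern can be sketched as a block with a CNN branch for local features and an attention branch for long-range dependencies, fused by addition; the layout below is an illustrative assumption, not the paper's exact aggregation network.

```python
# Hedged sketch of a hybrid SR block: CNN branch for local texture, transformer
# branch for long-range dependencies, fused by simple addition.
import torch
import torch.nn as nn

class HybridSRBlock(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.local = nn.Sequential(                        # local features via CNN
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        layer = nn.TransformerEncoderLayer(channels, nhead=4, batch_first=True)
        self.global_ctx = nn.TransformerEncoder(layer, 1)  # long-range via attention

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)              # (B, H*W, C) token view
        g = self.global_ctx(tokens).transpose(1, 2).reshape(b, c, h, w)
        return x + self.local(x) + g                       # aggregate both branches

x = torch.randn(1, 32, 24, 24)
print(HybridSRBlock()(x).shape)                            # torch.Size([1, 32, 24, 24])
```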
arXiv Detail & Related papers (2022-03-15T06:52:25Z) - AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
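Operation-priority search can be illustrated with a toy evolutionary loop in which mutation samples replacement ops in proportion to how well they have performed so far; the op set, fitness proxy, and update rule below are stand-ins, not OP-NAS's actual algorithm.

```python
# Hedged sketch of an operation-priority evolutionary loop in the spirit of OP-NAS.
import random

OPS = ["self_attn", "conv3", "conv5", "ffn", "skip"]
priority = {op: 1.0 for op in OPS}             # running preference per operation

def fitness(arch):                             # toy proxy: favors attention + ffn
    return arch.count("self_attn") * 0.4 + arch.count("ffn") * 0.3 + random.random()

def mutate(arch):
    """Replace one op, sampling the replacement by current operation priority."""
    ops, weights = zip(*priority.items())
    child = list(arch)
    child[random.randrange(len(child))] = random.choices(ops, weights=weights)[0]
    return child

random.seed(0)
population = [[random.choice(OPS) for _ in range(6)] for _ in range(8)]
for _ in range(20):                            # evolutionary search loop
    parent = max(population, key=fitness)
    child = mutate(parent)
    if fitness(child) > fitness(parent):       # reward ops found in improving children
        for op in child:
            priority[op] += 0.1
    population.append(child)
    population.pop(0)                          # fixed population size
print(max(population, key=fitness))
```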
arXiv Detail & Related papers (2021-07-15T16:46:01Z)
- Dynamically Grown Generative Adversarial Networks [111.43128389995341]
We propose a method to dynamically grow a GAN during training, automatically optimizing the network architecture and its parameters together.
The method embeds architecture search techniques as an interleaving step with gradient-based training to periodically seek the optimal architecture-growing strategy for the generator and discriminator.
arXiv Detail & Related papers (2021-06-16T01:25:51Z)
- Towards Automated Neural Interaction Discovery for Click-Through Rate Prediction [64.03526633651218]
Click-Through Rate (CTR) prediction is one of the most important machine learning tasks in recommender systems.
We propose an automated interaction architecture discovering framework for CTR prediction named AutoCTR.
arXiv Detail & Related papers (2020-06-29T04:33:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.