Revisiting the Generic Transformer: Deconstructing a Strong Baseline for Time Series Foundation Models
- URL: http://arxiv.org/abs/2602.06909v1
- Date: Fri, 06 Feb 2026 18:01:44 GMT
- Title: Revisiting the Generic Transformer: Deconstructing a Strong Baseline for Time Series Foundation Models
- Authors: Yunshi Wen, Wesley M. Gifford, Chandra Reddy, Lam M. Nguyen, Jayant Kalagnanam, Anak Agung Julius,
- Abstract summary: We investigate the potential of a standard patch Transformer, demonstrating that it achieves state-of-the-art zero-shot forecasting performance. We conduct a comprehensive ablation study that covers model scaling, data composition, and training techniques to isolate the essential ingredients for high performance.
- Score: 18.841505010078112
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent surge in Time Series Foundation Models has rapidly advanced the field, yet the heterogeneous training setups across studies make it difficult to attribute improvements to architectural innovations versus data engineering. In this work, we investigate the potential of a standard patch Transformer, demonstrating that this generic architecture achieves state-of-the-art zero-shot forecasting performance using a straightforward training protocol. We conduct a comprehensive ablation study that covers model scaling, data composition, and training techniques to isolate the essential ingredients for high performance. Our findings identify the key drivers of performance, while confirming that the generic architecture itself demonstrates excellent scalability. By strictly controlling these variables, we provide comprehensive empirical results on model scaling across multiple dimensions. We release our open-source model and detailed findings to establish a transparent, reproducible baseline for future research.
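The paper releases an open-source model, but the abstract itself contains no code; the following PyTorch sketch is only an illustration of what a generic patch-Transformer forecaster of the kind described might look like. All class names, dimensions, and hyperparameters are assumptions for exposition, not the released model.

```python
import torch
import torch.nn as nn

class PatchTransformerForecaster(nn.Module):
    """Minimal patch-Transformer sketch (illustrative, not the released model)."""

    def __init__(self, context_len=512, horizon=96, patch_len=32,
                 d_model=256, n_heads=8, n_layers=4):
        super().__init__()
        assert context_len % patch_len == 0
        self.patch_len = patch_len
        n_patches = context_len // patch_len
        self.embed = nn.Linear(patch_len, d_model)              # each patch becomes one token
        self.pos = nn.Parameter(torch.zeros(1, n_patches, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True, norm_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)   # generic, unmodified Transformer
        self.head = nn.Linear(n_patches * d_model, horizon)     # flatten tokens -> forecast

    def forward(self, x):                                       # x: (batch, context_len)
        patches = x.unfold(-1, self.patch_len, self.patch_len)  # (batch, n_patches, patch_len)
        tokens = self.embed(patches) + self.pos
        z = self.encoder(tokens)
        return self.head(z.flatten(1))                          # (batch, horizon)

model = PatchTransformerForecaster()
history = torch.randn(8, 512)     # 8 univariate series, 512 past steps each
forecast = model(history)         # (8, 96) point forecast
```

Zero-shot use would mean applying such a model, pretrained on a large multi-domain corpus, to unseen series without fine-tuning; the data composition and training techniques that make this work are exactly what the paper's ablations isolate.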
Related papers
- Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models [15.709482146201283]
A simple linear classifier trained on the frozen features of modern Vision Foundation Models establishes a new state-of-the-art. We show that this baseline not only matches specialized detectors on standard benchmarks but also decisively outperforms them on in-the-wild datasets. We conclude by advocating for a paradigm shift in AI forensics, moving from overfitting on static benchmarks to harnessing the evolving world knowledge of foundation models for real-world reliability.
arXiv Detail & Related papers (2026-02-02T07:20:02Z)
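The linear-probe recipe summarized above is straightforward to make concrete. The sketch below is a hedged illustration, not the paper's code: it assumes image embeddings have already been extracted with a frozen vision foundation model (e.g. CLIP or DINOv2) and saved to the placeholder .npy files, and it fits a single logistic-regression layer as the detector.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder paths: embeddings precomputed once with a frozen backbone.
train_feats = np.load("train_features.npy")   # (n_train, d) frozen VFM embeddings
train_labels = np.load("train_labels.npy")    # 0 = real image, 1 = AI-generated
test_feats = np.load("test_features.npy")
test_labels = np.load("test_labels.npy")

# The entire detector is one linear layer on top of frozen features.
probe = LogisticRegression(max_iter=1000)
probe.fit(train_feats, train_labels)
print("held-out accuracy:", probe.score(test_feats, test_labels))
```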
- Demystifying Data-Driven Probabilistic Medium-Range Weather Forecasting [63.8116386935854]
We demonstrate that state-of-the-art probabilistic skill requires neither intricate architectural constraints nor specialized training. We introduce a scalable framework for learning multi-scale atmospheric dynamics by combining a directly downsampled latent space with a history-conditioned local projector. We find that our framework design is robust to the choice of probabilistic estimators, seamlessly supporting interpolants, diffusion models, and CRPS-based ensemble training.
arXiv Detail & Related papers (2026-01-26T03:52:16Z)
- Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data [53.040873127309766]
We propose a token disentanglement process within the transformer architecture, enhancing feature separation and ensuring more effective learning. Our method outperforms existing models on both in-dataset and cross-dataset evaluations.
arXiv Detail & Related papers (2025-09-08T17:58:06Z)
- The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting [26.76928230531243]
Transformer-based models have recently become dominant in Long-term Time Series Forecasting (LTSF), yet variations in their architecture, such as encoder-only, encoder-decoder, and decoder-only designs, raise a crucial question: What Transformer architecture works best for LTSF tasks? Existing models are often tightly coupled with various time-series-specific designs, making it difficult to isolate the impact of the architecture itself. We propose a novel taxonomy that disentangles these designs, enabling clearer and more unified comparisons of Transformer architectures.
arXiv Detail & Related papers (2025-07-17T12:16:04Z)
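As a rough illustration of the encoder-only versus decoder-only distinction the entry above compares, the sketch below applies the same generic Transformer backbone with either full or causal attention; the encoder-decoder variant (which adds cross-attention) is omitted. Shapes and dimensions are arbitrary assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn

def attention_mask(kind: str, n_tokens: int):
    """Full attention for encoder-only; causal mask for decoder-only."""
    if kind == "encoder-only":
        return None                                   # every token attends to every token
    if kind == "decoder-only":
        # -inf above the diagonal: each token attends only to earlier tokens
        return torch.triu(torch.full((n_tokens, n_tokens), float("-inf")), diagonal=1)
    raise ValueError(f"unknown kind: {kind}")

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
backbone = nn.TransformerEncoder(layer, num_layers=2)

tokens = torch.randn(2, 16, 64)                       # (batch, patches, d_model)
bidirectional = backbone(tokens, mask=attention_mask("encoder-only", 16))
causal = backbone(tokens, mask=attention_mask("decoder-only", 16))
```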
- Triple Attention Transformer Architecture for Time-Dependent Concrete Creep Prediction [0.0]
This paper presents a novel Triple Attention Transformer Architecture for predicting time-dependent concrete creep. By transforming concrete creep prediction into an autoregressive sequence modeling task similar to language processing, our architecture leverages the transformer's self-attention mechanisms. The architecture achieves exceptional performance with a mean absolute percentage error of 1.63% and R² values of 0.999 across all datasets.
arXiv Detail & Related papers (2025-05-28T22:30:35Z)
- Exploring the design space of deep-learning-based weather forecasting systems [56.129148006412855]
This paper systematically analyzes the impact of different design choices on deep-learning-based weather forecasting systems.
We study fixed-grid architectures such as UNet, fully convolutional architectures, and transformer-based models.
We propose a hybrid system that combines the strong performance of fixed-grid models with the flexibility of grid-invariant architectures.
arXiv Detail & Related papers (2024-10-09T22:25:50Z)
- Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other.
We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
- TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting [24.834846119163885]
We propose a novel framework, TEMPO, that can effectively learn time series representations.
TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains.
arXiv Detail & Related papers (2023-10-08T00:02:25Z)
- Enhanced LFTSformer: A Novel Long-Term Financial Time Series Prediction Model Using Advanced Feature Engineering and the DS Encoder Informer Architecture [0.8532753451809455]
This study presents a groundbreaking model for forecasting long-term financial time series, termed the Enhanced LFTSformer.
The model distinguishes itself through several significant innovations.
Systematic experimentation on a range of benchmark stock market datasets demonstrates that the Enhanced LFTSformer outperforms traditional machine learning models.
arXiv Detail & Related papers (2023-10-03T08:37:21Z)
- Universal Domain Adaptation from Foundation Models: A Baseline Study [58.51162198585434]
We conduct empirical studies of state-of-the-art UniDA methods using foundation models.
We introduce *CLIP distillation*, a parameter-free method specifically designed to distill target knowledge from CLIP models.
Although simple, our method outperforms previous approaches in most benchmark tasks.
arXiv Detail & Related papers (2023-05-18T16:28:29Z)
- Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning [53.73083199055093]
We show that attention-based architectures (e.g., Transformers) are fairly robust to distribution shifts.
Our experiments show that replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices.
arXiv Detail & Related papers (2021-06-10T21:04:18Z)