UrbanAI 2025 Challenge: Linear vs Transformer Models for Long-Horizon Exogenous Temperature Forecasting
- URL: http://arxiv.org/abs/2512.10866v1
- Date: Thu, 11 Dec 2025 17:59:44 GMT
- Title: UrbanAI 2025 Challenge: Linear vs Transformer Models for Long-Horizon Exogenous Temperature Forecasting
- Authors: Ruslan Gokhman
- Abstract summary: We study long-horizon exogenous-only temperature forecasting using linear and Transformer-family models. Results show that linear baselines consistently outperform more complex Transformer-family architectures.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study long-horizon exogenous-only temperature forecasting - a challenging univariate setting where only the past values of the indoor temperature are used for prediction - using linear and Transformer-family models. We evaluate Linear, NLinear, DLinear, Transformer, Informer, and Autoformer under standardized train, validation, and test splits. Results show that linear baselines (Linear, NLinear, DLinear) consistently outperform more complex Transformer-family architectures, with DLinear achieving the best overall accuracy across all splits. These findings highlight that carefully designed linear models remain strong baselines for time series forecasting in challenging exogenous-only settings.
Related papers
- Bridging Simplicity and Sophistication using GLinear: A Novel Architecture for Enhanced Time Series Prediction [1.52551943336894]
Time Series Forecasting (TSF) is an important application across many fields. Recent research suggests simpler linear models might outperform or at least provide competitive performance compared to complex Transformer-based models for TSF tasks.
arXiv Detail & Related papers (2025-01-02T06:19:53Z) - Parallelizing Linear Transformers with the Delta Rule over Sequence Length [49.88826673324244]
This work describes a hardware-efficient algorithm for training linear transformers with the delta rule. We train a 1.3B model for 100B tokens and find that it outperforms recent linear-time baselines.
arXiv Detail & Related papers (2024-06-10T17:24:42Z) - Your Transformer is Secretly Linear [7.935853865895353]
We analyze embedding transformations between sequential layers, uncovering a near-perfect linear relationship.
We show that removing or linearly approximating some of the most linear transformer blocks does not significantly affect the loss or model performance.
In pretraining experiments on smaller models, we introduce a cosine-similarity-based regularization aimed at reducing layer linearity.
arXiv Detail & Related papers (2024-05-19T22:44:00Z) - Linearizing Large Language Models [26.94551511277412]
We present a method to uptrain existing large pre-trained transformers into Recurrent Neural Networks (RNNs) with a modest compute budget.
We find that our linearization technique leads to competitive performance on standard benchmarks, but we identify persistent shortfalls in in-context learning and long-context modeling, even for the largest linear models.
arXiv Detail & Related papers (2024-05-10T17:59:08Z) - Mixture-of-Linear-Experts for Long-term Time Series Forecasting [13.818468255379969]
We propose Mixture-of-Linear-Experts (MoLE), a Mixture-of-Experts-style augmentation for linear-centric models.
Instead of training a single model, MoLE trains multiple linear-centric models and a router model that weighs and mixes their outputs.
arXiv Detail & Related papers (2023-12-11T19:05:02Z) - Test-Time Adaptation Induces Stronger Accuracy and Agreement-on-the-Line [65.14099135546594]
Recent test-time adaptation (TTA) methods drastically strengthen the accuracy-on-the-line (ACL) and agreement-on-the-line (AGL) trends in models, even under shifts where models previously showed very weak correlations.
Our results show that by combining TTA with AGL-based estimation methods, we can estimate the OOD performance of models with high precision for a broader set of distribution shifts.
arXiv Detail & Related papers (2023-10-07T23:21:25Z) - Tangent Transformers for Composition, Privacy and Removal [58.280295030852194]
Tangent Attention Fine-Tuning (TAFT) is a method for fine-tuning linearized transformers.
arXiv Detail & Related papers (2023-07-16T18:31:25Z) - Online Evolutionary Neural Architecture Search for Multivariate Non-Stationary Time Series Forecasting [72.89994745876086]
This work presents the Online Neuro-Evolution-based Neural Architecture Search (ONE-NAS) algorithm.
ONE-NAS is a novel neural architecture search method capable of automatically designing and dynamically training recurrent neural networks (RNNs) for online forecasting tasks.
Results demonstrate that ONE-NAS outperforms traditional statistical time series forecasting methods.
arXiv Detail & Related papers (2023-02-20T22:25:47Z) - Are Transformers Effective for Time Series Forecasting? [13.268196448051308]
Recently, there has been a surge of Transformer-based solutions for the time series forecasting (TSF) task.
This study investigates whether Transformer-based techniques are the right solutions for long-term time series forecasting.
We find that the relatively higher long-term forecasting accuracy of Transformer-based solutions has little to do with the temporal relation extraction capabilities of the Transformer architecture.
arXiv Detail & Related papers (2022-05-26T17:17:08Z) - Churn Reduction via Distillation [54.5952282395487]
We show an equivalence between training with distillation using the base model as the teacher and training with an explicit constraint on the predictive churn.
We then show that distillation performs strongly for low-churn training compared with a number of recent baselines.
arXiv Detail & Related papers (2021-06-04T18:03:31Z) - LQF: Linear Quadratic Fine-Tuning [114.3840147070712]
We present the first method for linearizing a pre-trained model that achieves comparable performance to non-linear fine-tuning.
LQF consists of simple modifications to the architecture, loss function and optimization typically used for classification.
arXiv Detail & Related papers (2020-12-21T06:40:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.