A Unified Hyperparameter Optimization Pipeline for Transformer-Based Time Series Forecasting Models
- URL: http://arxiv.org/abs/2501.01394v1
- Date: Thu, 02 Jan 2025 18:12:42 GMT
- Title: A Unified Hyperparameter Optimization Pipeline for Transformer-Based Time Series Forecasting Models
- Authors: Jingjing Xu, Caesar Wu, Yuan-Fang Li, Grégoire Danoy, Pascal Bouvry
- Abstract summary: Transformer-based models for time series forecasting (TSF) have attracted significant attention in recent years due to their effectiveness and versatility.
We present a unified hyperparameter optimization (HPO) pipeline and conduct extensive experiments on several state-of-the-art (SOTA) transformer-based TSF models.
Our pipeline is generalizable beyond transformer-based architectures and can be applied to other SOTA models, such as Mamba and TimeMixer.
- Score: 36.31269406067809
- License:
- Abstract: Transformer-based models for time series forecasting (TSF) have attracted significant attention in recent years due to their effectiveness and versatility. However, these models often require extensive hyperparameter optimization (HPO) to achieve the best possible performance, and a unified pipeline for HPO in transformer-based TSF remains lacking. In this paper, we present one such pipeline and conduct extensive experiments on several state-of-the-art (SOTA) transformer-based TSF models. These experiments are conducted on standard benchmark datasets to evaluate and compare the performance of different models, generating practical insights and examples. Our pipeline is generalizable beyond transformer-based architectures and can be applied to other SOTA models, such as Mamba and TimeMixer, as demonstrated in our experiments. The goal of this work is to provide valuable guidance to both industry practitioners and academic researchers in efficiently identifying optimal hyperparameters suited to their specific domain applications. The code and complete experimental results are available on GitHub.
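As a concrete illustration of what such an HPO pipeline can look like, here is a minimal sketch using Optuna to tune a few hyperparameters that are common to transformer-based TSF models (model dimension, number of attention heads, input window length, learning rate). The search space and the `build_model` / `evaluate_mse` helpers are hypothetical placeholders for illustration only and are not taken from the authors' released pipeline.

```python
# Minimal, illustrative HPO loop for a transformer-based TSF model (sketch only).
# build_model() and evaluate_mse() are hypothetical placeholders; plug in your
# own model construction, training, and validation code.
import optuna


def build_model(d_model: int, n_heads: int, seq_len: int, lr: float):
    # Placeholder: construct and train a transformer forecaster here
    # (e.g. an Informer/PatchTST-style model) with the given hyperparameters.
    return {"d_model": d_model, "n_heads": n_heads, "seq_len": seq_len, "lr": lr}


def evaluate_mse(model) -> float:
    # Placeholder: return the validation MSE of the trained model.
    # A dummy value is returned here so the sketch runs end to end.
    return 0.0


def objective(trial: optuna.Trial) -> float:
    # Typical transformer TSF hyperparameters; the ranges are illustrative only.
    d_model = trial.suggest_categorical("d_model", [64, 128, 256, 512])
    n_heads = trial.suggest_categorical("n_heads", [2, 4, 8])
    seq_len = trial.suggest_categorical("seq_len", [96, 192, 336, 720])
    lr = trial.suggest_float("lr", 1e-5, 1e-2, log=True)
    model = build_model(d_model, n_heads, seq_len, lr)
    return evaluate_mse(model)  # minimize validation MSE


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print("Best hyperparameters:", study.best_params)
```

Because the search loop is defined independently of any particular backbone, the same structure can be reused for non-transformer forecasters such as Mamba or TimeMixer, which is the sense in which a pipeline of this kind generalizes beyond transformer architectures.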
Related papers
- Visual Fourier Prompt Tuning [63.66866445034855]
We propose the Visual Fourier Prompt Tuning (VFPT) method as a general and effective solution for adapting large-scale transformer-based models.
Our approach incorporates the Fast Fourier Transform into prompt embeddings and harmoniously considers both spatial and frequency domain information.
Our results demonstrate that our approach outperforms current state-of-the-art baselines on two benchmarks.
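A minimal sketch of the general idea of Fourier-enhanced prompts, assuming a PyTorch backbone, is shown below; it is illustrative only and not the official VFPT implementation (the prompt length, initialization, and the way frequency information is combined are assumptions).

```python
# Illustrative sketch: learnable prompt tokens enriched with an FFT so that both
# spatial- and frequency-domain information enter a (frozen) transformer backbone.
import torch
import torch.nn as nn


class FourierPrompt(nn.Module):
    def __init__(self, num_prompts: int = 10, dim: int = 768):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, patch_embeddings: torch.Tensor) -> torch.Tensor:
        # patch_embeddings: (batch, seq_len, dim)
        batch = patch_embeddings.size(0)
        # FFT over the embedding dimension; keep the real part so shapes match.
        freq_prompts = torch.fft.fft(self.prompts, dim=-1).real
        prompts = (self.prompts + freq_prompts).unsqueeze(0).expand(batch, -1, -1)
        # Prepend the prompts to the backbone's input token sequence.
        return torch.cat([prompts, patch_embeddings], dim=1)


x = torch.randn(2, 196, 768)   # e.g. ViT patch embeddings
out = FourierPrompt()(x)       # shape: (2, 206, 768)
```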
arXiv Detail & Related papers (2024-11-02T18:18:35Z)
- A Systematic Review for Transformer-based Long-term Series Forecasting [7.414422194379818]
The Transformer architecture has proven to be the most successful solution for extracting semantic correlations.
Various variants have enabled the Transformer architecture to handle long-term time series forecasting tasks.
arXiv Detail & Related papers (2023-10-31T06:37:51Z)
- A Transformer-based Framework For Multi-variate Time Series: A Remaining Useful Life Prediction Use Case [4.0466311968093365]
This work proposes a transformer-encoder-based framework for time series prediction.
We validate the effectiveness of the proposed framework on all four subsets of the C-MAPSS benchmark dataset.
To make the model aware of the initial stages of machine life and its degradation path, a novel expanding window method is proposed.
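For intuition, a generic expanding-window split looks like the sketch below; this is a standard construction and may differ in detail from the specific method proposed in the paper.

```python
# Generic expanding-window splitting for time series (illustrative only; the
# paper's expanding window method for RUL prediction may differ in detail).
from typing import Iterator, Tuple

import numpy as np


def expanding_windows(series: np.ndarray, initial: int, step: int
                      ) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (history, target) pairs whose history window grows over time."""
    end = initial
    while end < len(series):
        yield series[:end], series[end:end + step]
        end += step


series = np.arange(100, dtype=float)  # toy univariate series
for history, target in expanding_windows(series, initial=20, step=10):
    print(f"history length {len(history)} -> target length {len(target)}")
```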
arXiv Detail & Related papers (2023-08-19T02:30:35Z)
- PETformer: Long-term Time Series Forecasting via Placeholder-enhanced Transformer [5.095882718779794]
This study investigates key issues when applying Transformer to long-term time series forecasting tasks.
We introduce the Placeholder-enhanced Technique (PET) to enhance the computational efficiency and predictive accuracy of Transformer in LTSF tasks.
PETformer achieves state-of-the-art performance on eight commonly used public datasets for LTSF, surpassing all existing models.
arXiv Detail & Related papers (2023-08-09T08:30:22Z)
- Emergent Agentic Transformer from Chain of Hindsight Experience [96.56164427726203]
We show that a simple transformer-based model performs competitively with both temporal-difference and imitation-learning-based approaches; to our knowledge, this is the first time a simple transformer-based model has done so.
arXiv Detail & Related papers (2023-05-26T00:43:02Z)
- Full Stack Optimization of Transformer Inference: a Survey [58.55475772110702]
Transformer models achieve superior accuracy across a wide range of applications.
The amount of compute and bandwidth required for inference of recent Transformer models is growing at a significant rate.
There has been an increased focus on making Transformer models more efficient.
arXiv Detail & Related papers (2023-02-27T18:18:13Z)
- CLMFormer: Mitigating Data Redundancy to Revitalize Transformer-based Long-Term Time Series Forecasting System [46.39662315849883]
Long-term time-series forecasting (LTSF) plays a crucial role in various practical applications.
Existing Transformer-based models, such as FEDformer and Informer, often achieve their best performance on validation sets after just a few epochs.
We propose a novel approach to address this issue by employing curriculum learning and introducing a memory-driven decoder.
arXiv Detail & Related papers (2022-07-16T04:05:15Z)
- Towards Learning Universal Hyperparameter Optimizers with Transformers [57.35920571605559]
We introduce the OptFormer, the first text-based Transformer HPO framework that provides a universal end-to-end interface for jointly learning policy and function prediction.
Our experiments demonstrate that the OptFormer can imitate at least 7 different HPO algorithms, which can be further improved via its function uncertainty estimates.
arXiv Detail & Related papers (2022-05-26T12:51:32Z)
- Development of Deep Transformer-Based Models for Long-Term Prediction of Transient Production of Oil Wells [9.832272256738452]
We propose a novel approach to data-driven modeling of the transient production of oil wells.
We apply transformer-based neural networks trained on multivariate time series composed of various parameters of oil wells.
We generalize the single-well model based on the transformer architecture to multiple wells to simulate complex transient oilfield-level patterns.
arXiv Detail & Related papers (2021-10-12T15:00:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.