GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation
- URL: http://arxiv.org/abs/2410.10393v2
- Date: Mon, 11 Nov 2024 04:48:24 GMT
- Title: GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation
- Authors: Taha Aksu, Gerald Woo, Juncheng Liu, Xu Liu, Chenghao Liu, Silvio Savarese, Caiming Xiong, Doyen Sahoo
- Abstract summary: Time series foundation models excel in zero-shot forecasting, handling diverse tasks without explicit training.
GIFT-Eval is a pioneering benchmark aimed at promoting evaluation across diverse datasets.
GIFT-Eval encompasses 23 datasets over 144,000 time series and 177 million data points.
- Score: 90.53485251837235
- Abstract: Time series foundation models excel in zero-shot forecasting, handling diverse tasks without explicit training. However, the advancement of these models has been hindered by the lack of comprehensive benchmarks. To address this gap, we introduce the General Time Series Forecasting Model Evaluation, GIFT-Eval, a pioneering benchmark aimed at promoting evaluation across diverse datasets. GIFT-Eval encompasses 23 datasets over 144,000 time series and 177 million data points, spanning seven domains, 10 frequencies, multivariate inputs, and prediction lengths ranging from short to long-term forecasts. To facilitate the effective pretraining and evaluation of foundation models, we also provide a non-leaking pretraining dataset containing approximately 230 billion data points. Additionally, we provide a comprehensive analysis of 17 baselines, which includes statistical models, deep learning models, and foundation models. We discuss each model in the context of various benchmark characteristics and offer a qualitative analysis that spans both deep learning and foundation models. We believe the insights from this analysis, along with access to this new standard zero-shot time series forecasting benchmark, will guide future developments in time series foundation models. Code, data, and the leaderboard can be found at https://github.com/SalesforceAIResearch/gift-eval .
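As a concrete illustration of the kind of scoring a zero-shot forecasting benchmark performs, the sketch below computes the Mean Absolute Scaled Error (MASE), a scale-free metric commonly used on leaderboards of this kind. This is an illustrative reimplementation of the standard definition, not GIFT-Eval's exact evaluation code.

```python
import numpy as np

def mase(y_true, y_pred, y_train, seasonality=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample
    MAE of a seasonal-naive forecast on the training history."""
    naive_errors = np.abs(y_train[seasonality:] - y_train[:-seasonality])
    scale = np.mean(naive_errors)
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))) / scale

# Toy usage: score a flat forecast on a noisy hourly series (seasonality 24).
history = np.sin(np.arange(200) * 2 * np.pi / 24) + np.random.randn(200) * 0.1
actuals = np.sin(np.arange(200, 224) * 2 * np.pi / 24)
forecast = np.zeros(24)
print(f"MASE: {mase(actuals, forecast, history, seasonality=24):.3f}")
```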
Related papers
- In-Context Fine-Tuning for Time-Series Foundation Models [18.348874079298298]
We design a pretrained foundation model that can be prompted with multiple time-series examples.
Our foundation model is specifically trained to utilize examples from multiple related time-series in its context window.
We show that such a foundation model that uses in-context examples at inference time can obtain much better performance on popular forecasting benchmarks.
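The paper's actual interface is not reproduced here, but a minimal sketch of the idea, assuming related series are simply concatenated with a separator ahead of the target history, might look like this (the separator convention and function names are hypothetical):

```python
import numpy as np

SEPARATOR = np.nan  # hypothetical separator; real models use a learned token

def build_prompt(related_examples, target_history):
    """Concatenate related time series as in-context examples ahead of
    the target series' own history (illustrative sketch only)."""
    parts = []
    for example in related_examples:
        parts.append(np.asarray(example, dtype=float))
        parts.append(np.array([SEPARATOR]))
    parts.append(np.asarray(target_history, dtype=float))
    return np.concatenate(parts)

prompt = build_prompt(
    related_examples=[np.arange(24.0), np.arange(10.0, 34.0)],
    target_history=np.arange(50.0, 74.0),
)
print(prompt.shape)  # (74,): 2 examples + 2 separators + 24-step history
```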
arXiv Detail & Related papers (2024-10-31T16:20:04Z) - FoundTS: Comprehensive and Unified Benchmarking of Foundation Models for Time Series Forecasting [44.33565276128137]
Time Series Forecasting (TSF) is a key capability in numerous fields, including finance, weather services, and energy management.
Foundation models exhibit promising inference capabilities on new or unseen data.
We propose a new benchmark, FoundTS, to enable thorough and fair evaluation and comparison of such models.
arXiv Detail & Related papers (2024-10-15T17:23:49Z) - Deep Time Series Models: A Comprehensive Survey and Benchmark [74.28364194333447]
Time series data is of great significance in real-world scenarios.
Recent years have witnessed remarkable breakthroughs in the time series community.
We release Time Series Library (TSLib) as a fair benchmark of deep time series models for diverse analysis tasks.
arXiv Detail & Related papers (2024-07-18T08:31:55Z) - No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance [68.18779562801762]
Multimodal models require exponentially more pretraining data to achieve linear improvements in downstream "zero-shot" performance.
This exponential data requirement suggests that the key to "zero-shot" generalization under large-scale training paradigms has yet to be found.
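The claimed relationship (linear gains for exponential data) amounts to performance growing with the logarithm of concept frequency. A toy illustration of fitting that log-linear trend, on synthetic numbers rather than the paper's data:

```python
import numpy as np

# Synthetic: zero-shot accuracy rises linearly in log(concept frequency).
freq = np.array([1e2, 1e3, 1e4, 1e5, 1e6])
acc = 0.05 * np.log10(freq) + 0.1  # toy generating process

slope, intercept = np.polyfit(np.log10(freq), acc, deg=1)
print(f"accuracy ~= {slope:.2f} * log10(freq) + {intercept:.2f}")
# Each +1 in log10(freq), i.e. 10x more data, buys the same fixed gain.
```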
arXiv Detail & Related papers (2024-04-04T17:58:02Z) - Chronos: Learning the Language of Time Series [79.38691251254173]
Chronos is a framework for pretrained probabilistic time series models.
We show that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks.
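For reference, a minimal zero-shot call against a released Chronos checkpoint, based on the public chronos-forecasting package (argument names may vary across package versions):

```python
import numpy as np
import torch
from chronos import ChronosPipeline  # pip install chronos-forecasting

# Load a pretrained checkpoint and forecast 12 steps ahead, zero-shot.
pipeline = ChronosPipeline.from_pretrained(
    "amazon/chronos-t5-small",
    device_map="cpu",
    torch_dtype=torch.float32,
)
context = torch.tensor(np.sin(np.arange(120) / 6.0), dtype=torch.float32)
samples = pipeline.predict(context, prediction_length=12, num_samples=20)
low, median, high = np.quantile(
    samples[0].numpy(), [0.1, 0.5, 0.9], axis=0
)  # probabilistic forecast summarized as quantiles over sample paths
print(median)
```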
arXiv Detail & Related papers (2024-03-12T16:53:54Z) - A Scalable and Transferable Time Series Prediction Framework for Demand
Forecasting [24.06534393565697]
Time series forecasting is one of the most essential and ubiquitous tasks in many business settings.
We propose Forecasting orchestra (Forchestra), a simple but powerful framework capable of accurately predicting future demand for a diverse range of items.
arXiv Detail & Related papers (2024-02-29T18:01:07Z) - Lag-Llama: Towards Foundation Models for Probabilistic Time Series
Forecasting [54.04430089029033]
We present Lag-Llama, a general-purpose foundation model for time series forecasting based on a decoder-only transformer architecture.
Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities.
When fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance.
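As its name suggests, Lag-Llama conditions on lagged values of the series. A minimal sketch of lag-feature construction for a next-step model follows; the particular lag set is illustrative, not the paper's:

```python
import numpy as np

def lag_features(series, lags=(1, 7, 14, 28)):
    """Stack lagged copies of a series as covariates for each target step,
    the core input design of lag-based forecasters (illustrative sketch)."""
    series = np.asarray(series, dtype=float)
    start = max(lags)
    features = np.stack(
        [series[start - lag : len(series) - lag] for lag in lags], axis=1
    )
    targets = series[start:]
    return features, targets  # shapes: (T - max_lag, len(lags)), (T - max_lag,)

X, y = lag_features(np.arange(100.0))
print(X.shape, y.shape)  # (72, 4) (72,)
```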
arXiv Detail & Related papers (2023-10-12T12:29:32Z) - Pushing the Limits of Pre-training for Time Series Forecasting in the
CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show that our pre-trained model is a strong zero-shot baseline and benefits from further scaling of both model and dataset size.
Accompanying these datasets is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z) - Unified Long-Term Time-Series Forecasting Benchmark [0.6526824510982802]
We present a comprehensive dataset designed explicitly for long-term time-series forecasting.
We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records.
To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models.
Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness.
arXiv Detail & Related papers (2023-09-27T18:59:00Z) - Monash Time Series Forecasting Archive [6.0617755214437405]
We present a comprehensive time series forecasting archive containing 20 publicly available time series datasets from varied domains.
We characterise the datasets, and identify similarities and differences among them, by conducting a feature analysis.
We present the performance of a set of standard baseline forecasting methods over all datasets across eight error metrics.
arXiv Detail & Related papers (2021-05-14T04:49:58Z)