Scaling-laws for Large Time-series Models
- URL: http://arxiv.org/abs/2405.13867v1
- Date: Wed, 22 May 2024 17:48:17 GMT
- Title: Scaling-laws for Large Time-series Models
- Authors: Thomas D. P. Edwards, James Alvey, Justin Alsing, Nam H. Nguyen, Benjamin D. Wandelt
- Abstract summary: Time series forecasting shares a sequential structure similar to that of language and is amenable to large-scale transformer architectures.
We show that foundational decoder-only time series transformer models exhibit scaling behavior analogous to that of LLMs.
We assemble a large corpus of heterogeneous time series data on which to train, and establish, for the first time, power-law scaling relations with respect to parameter count, dataset size, and training compute.
- Score: 2.0671213754662343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scaling laws for large language models (LLMs) have provided useful guidance on how to train ever larger models for predictable performance gains. Time series forecasting shares a sequential structure similar to that of language and is amenable to large-scale transformer architectures. Here we show that foundational decoder-only time series transformer models exhibit scaling behavior analogous to that of LLMs, while architectural details (aspect ratio and number of heads) have a minimal effect over broad ranges. We assemble a large corpus of heterogeneous time series data on which to train, and establish, for the first time, power-law scaling relations with respect to parameter count, dataset size, and training compute, spanning five orders of magnitude.
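Scaling-law analyses of this kind typically fit relations of the form L(N) = (N_c / N)^alpha for parameter count N, with analogous power laws for dataset size and training compute. The abstract does not quote the fitted constants or exponents, so the sketch below uses synthetic data and placeholder values purely to illustrate how such a relation is usually fit: a linear least-squares regression in log-log space.

```python
# Minimal sketch: fit a power law L(N) = (N_c / N)**alpha to (N, loss) pairs via
# linear regression in log-log space. Data and constants are synthetic
# placeholders, not values taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (parameter count, validation loss) measurements.
N = np.logspace(5, 10, 12)                 # 1e5 .. 1e10 parameters
true_alpha, true_Nc = 0.08, 1e14           # placeholder constants
loss = (true_Nc / N) ** true_alpha * rng.lognormal(0.0, 0.02, N.size)

# log L = alpha*log(N_c) - alpha*log(N), i.e. a straight line in log-log space.
slope, intercept = np.polyfit(np.log(N), np.log(loss), 1)
alpha_hat = -slope
Nc_hat = np.exp(intercept / alpha_hat)
print(f"fitted alpha ~ {alpha_hat:.3f}, N_c ~ {Nc_hat:.3g}")
```

The same procedure applies unchanged when the x-axis is dataset size or training compute instead of parameter count.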
Related papers
- A Mamba Foundation Model for Time Series Forecasting [13.593170999506889]
We introduce TSMamba, a linear-complexity foundation model for time series forecasting built on the Mamba architecture.
The model captures temporal dependencies through both forward and backward Mamba encoders, achieving high prediction accuracy.
It also achieves competitive or superior full-shot performance compared to task-specific prediction models.
arXiv Detail & Related papers (2024-11-05T09:34:05Z)
- UniTST: Effectively Modeling Inter-Series and Intra-Series Dependencies for Multivariate Time Series Forecasting [98.12558945781693]
We propose a transformer-based model UniTST containing a unified attention mechanism on the flattened patch tokens.
Although our proposed model employs a simple architecture, it offers compelling performance as shown in our experiments on several datasets for time series forecasting.
arXiv Detail & Related papers (2024-06-07T14:39:28Z)
- Scaling Law for Time Series Forecasting [8.967263259533036]
Scaling laws that reward large datasets, complex models, and enhanced data granularity have been observed in various fields of deep learning.
Yet, prior studies have cast doubt on the scaling behavior of deep learning methods for time series forecasting.
We propose a theory of scaling laws for time series forecasting that can explain these seemingly anomalous behaviors.
arXiv Detail & Related papers (2024-05-24T00:46:27Z)
- Chronos: Learning the Language of Time Series [79.38691251254173]
Chronos is a framework for pretrained probabilistic time series models.
We show that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks.
arXiv Detail & Related papers (2024-03-12T16:53:54Z)
- Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present the Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z)
- Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
- Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show that the pre-trained method is a strong zero-shot baseline and benefits from further scaling in both model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z)
- TACTiS-2: Better, Faster, Simpler Attentional Copulas for Multivariate Time Series [57.4208255711412]
Building on copula theory, we propose a simplified objective for the recently introduced transformer-based attentional copulas (TACTiS).
We show that the resulting model has significantly better training dynamics and achieves state-of-the-art performance across diverse real-world forecasting tasks.
arXiv Detail & Related papers (2023-10-02T16:45:19Z)
- DuETT: Dual Event Time Transformer for Electronic Health Records [14.520791492631114]
We introduce the DuETT architecture, an extension of Transformers designed to attend over both time and event type dimensions.
DuETT uses an aggregated input in which sparse time series are transformed into a regular, fixed-length sequence (a minimal sketch of this style of aggregation follows the list below).
Our model outperforms state-of-the-art deep learning models on multiple downstream tasks from the MIMIC-IV and PhysioNet-2012 EHR datasets.
arXiv Detail & Related papers (2023-04-25T17:47:48Z)
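As noted in the DuETT entry above, one common way to turn sparse, irregularly sampled event records into a regular, fixed-length input is to aggregate them onto a (time bin × event type) grid alongside an observation mask. The sketch below is a simplified illustration of that general idea, not the authors' implementation; the bin count, mean aggregation, and mask handling are assumptions.

```python
# Minimal sketch (not the DuETT code): aggregate irregular (time, event_type, value)
# records into a regular grid of shape (n_bins, n_event_types), plus a mask
# marking which cells received at least one observation.
import numpy as np

def bin_events(times, event_ids, values, t_max, n_bins, n_event_types):
    """Mean-aggregate sparse events into a (n_bins, n_event_types) grid."""
    sums = np.zeros((n_bins, n_event_types))
    counts = np.zeros((n_bins, n_event_types))
    bins = np.minimum((np.asarray(times) / t_max * n_bins).astype(int), n_bins - 1)
    for b, e, v in zip(bins, event_ids, values):
        sums[b, e] += v
        counts[b, e] += 1
    mask = counts > 0
    grid = np.divide(sums, counts, out=np.zeros_like(sums), where=mask)
    return grid, mask

# Toy usage: three measurements of two event types over a 48-hour window.
grid, mask = bin_events(times=[1.5, 20.0, 47.0], event_ids=[0, 1, 0],
                        values=[7.2, 98.6, 7.4], t_max=48.0,
                        n_bins=8, n_event_types=2)
print(grid.shape, int(mask.sum()))
```

A downstream transformer can then attend over the time and event-type axes of this dense grid, as the DuETT summary describes.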
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.