Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series
- URL: http://arxiv.org/abs/2401.03955v8
- Date: Thu, 07 Nov 2024 15:07:22 GMT
- Title: Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series
- Authors: Vijay Ekambaram, Arindam Jati, Pankaj Dayama, Sumanta Mukherjee, Nam H. Nguyen, Wesley M. Gifford, Chandra Reddy, Jayant Kalagnanam
- Abstract summary: We introduce Tiny Time Mixers (TTM), a compact model with effective transfer learning capabilities, trained exclusively on public TS datasets.
TTM incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions.
It outperforms popular existing benchmarks in zero/few-shot forecasting by 4-40% while significantly reducing computational requirements.
- Abstract: Large pre-trained models excel in zero/few-shot learning for language and vision tasks but face challenges in multivariate time series (TS) forecasting due to diverse data characteristics. Consequently, recent research efforts have focused on developing pre-trained TS forecasting models. These models, whether built from scratch or adapted from large language models (LLMs), excel in zero/few-shot forecasting tasks. However, they are limited by slow performance, high computational demands, and neglect of cross-channel and exogenous correlations. To address this, we introduce Tiny Time Mixers (TTM), a compact model (starting from 1M parameters) with effective transfer learning capabilities, trained exclusively on public TS datasets. TTM, based on the lightweight TSMixer architecture, incorporates innovations like adaptive patching, diverse resolution sampling, and resolution prefix tuning to handle pre-training on varied dataset resolutions with minimal model capacity. Additionally, it employs multi-level modeling to capture channel correlations and infuse exogenous signals during fine-tuning. TTM outperforms existing popular benchmarks in zero/few-shot forecasting by 4-40% while significantly reducing computational requirements. Moreover, TTMs are lightweight and can be executed even on CPU-only machines, enhancing usability and fostering wider adoption in resource-constrained environments. The model weights for reproducibility and research use are available at https://huggingface.co/ibm/ttm-research-r2/, while enterprise-use weights under the Apache license can be accessed as follows: the initial TTM-Q variant at https://huggingface.co/ibm-granite/granite-timeseries-ttm-r1, and weights for the latest variants (TTM-B, TTM-E, TTM-A) at https://huggingface.co/ibm-granite/granite-timeseries-ttm-r2.
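As a concrete starting point, here is a minimal zero-shot sketch that loads the Apache-licensed TTM-Q weights named in the abstract. The repository ID comes from the abstract; the import path, class name, output attribute, and the 512-step context / 96-step forecast horizon are assumptions based on IBM's public granite-tsfm tooling and may differ from the released interface.

```python
# Minimal zero-shot forecasting sketch; names below are assumptions (see lead-in).
import torch
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction  # assumed path

# Load the Apache-licensed TTM-Q variant referenced in the abstract.
model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm-granite/granite-timeseries-ttm-r1"
)
model.eval()

# past_values: (batch, context_length, num_channels); 7 channels stands in
# for a multivariate target such as ETTh1.
past_values = torch.randn(1, 512, 7)

with torch.no_grad():
    outputs = model(past_values=past_values)

# Point forecasts for every channel at once:
# (batch, prediction_length, num_channels), e.g. torch.Size([1, 96, 7]).
print(outputs.prediction_outputs.shape)
```

Few-shot use would follow the same pattern, fine-tuning the loaded weights on a small fraction of the target history, which is also where the abstract's channel-correlation and exogenous-signal modeling comes in.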
Related papers
- FM-TS: Flow Matching for Time Series Generation [71.31148785577085]
We introduce FM-TS, a rectified Flow Matching-based framework for Time Series generation.
FM-TS is more efficient in both training and inference.
We have achieved superior performance in solar forecasting and MuJoCo imputation tasks.
arXiv Detail & Related papers (2024-11-12T03:03:23Z)
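The FM-TS entry above builds on rectified flow matching: a network is trained to predict the constant velocity along the straight line between a noise sample and a data sample, and generation integrates that velocity field. Below is a generic sketch of the training objective for time-series windows, with an illustrative stand-in velocity network; it is not the paper's architecture.

```python
# Generic rectified flow-matching training step (stand-in model, see lead-in).
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Toy velocity field: maps a noisy window and a time t to a velocity."""
    def __init__(self, seq_len: int, channels: int, hidden: int = 128):
        super().__init__()
        d = seq_len * channels
        self.net = nn.Sequential(nn.Linear(d + 1, hidden), nn.ReLU(), nn.Linear(hidden, d))

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        flat = torch.cat([x_t.flatten(1), t[:, None]], dim=1)  # append time as a feature
        return self.net(flat).view_as(x_t)

def rectified_flow_loss(model: nn.Module, x1: torch.Tensor) -> torch.Tensor:
    """x1: real windows of shape (batch, seq_len, channels)."""
    x0 = torch.randn_like(x1)                     # noise endpoint
    t = torch.rand(x1.size(0))                    # uniform time in [0, 1)
    x_t = (1 - t)[:, None, None] * x0 + t[:, None, None] * x1  # straight-line interpolant
    target_v = x1 - x0                            # constant velocity along that line
    return nn.functional.mse_loss(model(x_t, t), target_v)

model = VelocityNet(seq_len=96, channels=3)
loss = rectified_flow_loss(model, torch.randn(8, 96, 3))
loss.backward()  # one training step's gradient
```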
- A Mamba Foundation Model for Time Series Forecasting [13.593170999506889]
We introduce TSMamba, a linear-complexity foundation model for time series forecasting built on the Mamba architecture.
The model captures temporal dependencies through both forward and backward Mamba encoders, achieving high prediction accuracy.
It also achieves competitive or superior full-shot performance compared to task-specific prediction models.
arXiv Detail & Related papers (2024-11-05T09:34:05Z)
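The forward/backward encoder pairing described for TSMamba reduces to a simple pattern: encode the series in time order, encode it again reversed, realign, and fuse. The sketch below shows that pattern with nn.GRU as a plainly swapped-in stand-in for a Mamba block; a faithful implementation would use a selective state-space layer (e.g. from the mamba_ssm package) in both positions.

```python
# Forward + backward encoding pattern; nn.GRU stands in for a Mamba block.
import torch
import torch.nn as nn

class BidirectionalEncoder(nn.Module):
    def __init__(self, channels: int, hidden: int):
        super().__init__()
        self.fwd = nn.GRU(channels, hidden, batch_first=True)  # reads past -> future
        self.bwd = nn.GRU(channels, hidden, batch_first=True)  # reads future -> past
        self.proj = nn.Linear(2 * hidden, hidden)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h_f, _ = self.fwd(x)                             # (batch, time, hidden)
        h_b, _ = self.bwd(torch.flip(x, dims=[1]))       # encode the reversed series
        h_b = torch.flip(h_b, dims=[1])                  # realign to forward time
        return self.proj(torch.cat([h_f, h_b], dim=-1))  # fuse both directions

enc = BidirectionalEncoder(channels=7, hidden=64)
print(enc(torch.randn(2, 96, 7)).shape)  # torch.Size([2, 96, 64])
```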
- Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts [25.503695417712997]
Time-MoE is a scalable and unified architecture designed to pre-train larger, more capable forecasting foundation models.
Time-MoE enhances computational efficiency by activating only a subset of networks for each prediction.
For the first time, we scaled a time series foundation model up to 2.4 billion parameters, achieving significantly improved forecasting precision.
arXiv Detail & Related papers (2024-09-24T12:42:18Z)
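Time-MoE's efficiency claim, activating only a subset of networks per prediction, is the standard sparse mixture-of-experts pattern: a learned gate scores all experts, only the top-k run for each token, and their outputs are combined with the gate weights. A generic sketch, not the paper's implementation:

```python
# Top-k sparse mixture-of-experts layer (generic sketch, see lead-in).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model); route each token to its top-k experts only.
        weights, idx = self.gate(x).topk(self.top_k, dim=-1)  # (tokens, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            routed = (idx == e)                               # (tokens, k) hits
            rows = routed.any(dim=-1)
            if rows.any():                                    # run expert on routed tokens only
                w = (weights * routed).sum(dim=-1, keepdim=True)[rows]
                out[rows] += w * expert(x[rows])
        return out

moe = SparseMoE(d_model=64)
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

With 8 experts and k=2, each token pays for roughly a quarter of the layer's parameters per forward pass, which is how total capacity can scale to billions of parameters without a matching increase in compute.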
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z)
- UniTS: A Unified Multi-Task Time Series Model [31.675845788410246]
UniTS is a unified multi-task time series model that integrates predictive and generative tasks into a single framework.
UniTS is tested on 38 datasets across human activity sensors, healthcare, engineering, and finance.
arXiv Detail & Related papers (2024-02-29T21:25:58Z)
- Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z)
- Efficient GPT Model Pre-training using Tensor Train Matrix Representation [65.96485282393361]
Large-scale transformer models feature billions of parameters, leading to difficulties in their deployment and prohibitive training costs from scratch.
To reduce the number of parameters in the GPT-2 architecture, we replace the matrices of fully-connected layers with the corresponding Tensor Train Matrix (TTM) structure.
The resulting GPT-based model stores up to 40% fewer parameters, showing the perplexity comparable to the original model.
arXiv Detail & Related papers (2023-06-05T08:38:25Z)
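The compression idea in the entry above swaps a dense fully-connected weight for a few small tensor-train cores. Below is a generic 3-core TT linear layer with illustrative mode sizes and rank; with these settings the cores hold 5,760 parameters versus 589,824 for a dense 768x768 matrix (the paper's 40% figure is for the whole GPT-2 model, where only some layers are factorized).

```python
# Generic tensor-train (TT) linear layer; mode sizes and rank are illustrative.
import torch
import torch.nn as nn

class TTLinear(nn.Module):
    """768 -> 768 linear map stored as three TT cores instead of one dense matrix."""
    def __init__(self, in_modes=(8, 8, 12), out_modes=(8, 8, 12), rank=8):
        super().__init__()
        (i1, i2, i3), (o1, o2, o3) = in_modes, out_modes
        self.in_modes, self.out_modes = in_modes, out_modes
        # Core k has shape (rank_in, out_mode_k, in_mode_k, rank_out).
        self.g1 = nn.Parameter(0.02 * torch.randn(1, o1, i1, rank))
        self.g2 = nn.Parameter(0.02 * torch.randn(rank, o2, i2, rank))
        self.g3 = nn.Parameter(0.02 * torch.randn(rank, o3, i3, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b = x.shape[0]
        x = x.view(b, *self.in_modes)                      # (b, i1, i2, i3)
        t = torch.einsum('bxyz,apxr->bpryz', x, self.g1)   # contract input mode 1
        t = torch.einsum('bpryz,rqys->bpqsz', t, self.g2)  # contract input mode 2
        t = torch.einsum('bpqsz,swzt->bpqw', t, self.g3)   # contract input mode 3
        o1, o2, o3 = self.out_modes
        return t.reshape(b, o1 * o2 * o3)

tt = TTLinear()
tt_params = sum(p.numel() for p in tt.parameters())
print(tt(torch.randn(4, 768)).shape, tt_params, "vs", 768 * 768, "dense")
```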
- Incremental Online Learning Algorithms Comparison for Gesture and Visual Smart Sensors [68.8204255655161]
This paper compares four state-of-the-art algorithms in two real applications: gesture recognition based on accelerometer data and image classification.
Our results confirm these systems' reliability and the feasibility of deploying them in tiny-memory MCUs.
arXiv Detail & Related papers (2022-09-01T17:05:20Z)
- Neural forecasting at scale [8.245069318446415]
We study the problem of efficiently scaling ensemble-based deep neural networks for time series (TS) forecasting on a large set of time series.
Our model addresses the practical limitations of related models, reducing the training time by half and memory requirement by a factor of 5.
arXiv Detail & Related papers (2021-09-20T17:22:40Z)
- Convolutional Tensor-Train LSTM for Spatio-temporal Learning [116.24172387469994]
We propose a higher-order LSTM model that can efficiently learn long-term correlations in the video sequence.
This is accomplished through a novel tensor train module that performs prediction by combining convolutional features across time.
Our results achieve state-of-the-art performance in a wide range of applications and datasets.
arXiv Detail & Related papers (2020-02-21T05:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.