TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis
- URL: http://arxiv.org/abs/2402.16412v2
- Date: Wed, 01 Jan 2025 01:45:01 GMT
- Title: TOTEM: TOkenized Time Series EMbeddings for General Time Series Analysis
- Authors: Sabera Talukder, Yisong Yue, Georgia Gkioxari
- Abstract summary: This work studies the problem of time series analysis with generalist (or foundation) models. We consider the simple strategy of discretely tokenizing time series data drawn from a myriad of datasets via self-supervision. Our method, TOkenized Time Series EMbeddings (TOTEM), produces such generalist time series models with minimal or no fine-tuning.
- Score: 29.232543319667005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work studies the problem of time series analysis with generalist (or foundation) models, which are models trained across many data domains. Drawing inspiration from the widespread success of large language models, we consider the simple strategy of discretely tokenizing time series data drawn from a myriad of datasets via self-supervision, then using the fixed tokenization to solve a variety of tasks across many data domains. Canonically, time series models are either trained on a single dataset or built in a task-specific manner (e.g., a forecasting-only model), where many use patches of time as inputs to the model. As such, performant generalist, discrete representation time series models explored across many tasks are of value. Our method, TOkenized Time Series EMbeddings (TOTEM), produces such generalist time series models with minimal or no fine-tuning while exhibiting strong zero-shot performance. We evaluate TOTEM extensively over nearly 500 experiments on three commonly-studied time series tasks with real-world data: imputation (17 baselines, 12 datasets), anomaly detection (19 baselines, 25 datasets), and forecasting (14 baselines, 12 datasets). We conclude that TOTEM matches or outperforms existing state-of-the-art models in both the canonical specialist setting (i.e., training one model on one domain) as well as the generalist setting (i.e., training a single model on many domains), which demonstrates the efficacy of tokenization for general time series analysis. The open-source implementation is available here: https://github.com/SaberaTalukder/TOTEM; a video summary is available here: https://www.youtube.com/watch?v=OqrCpdb6MJk.
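For intuition about what discrete tokenization of time series means in practice, the following is a minimal sketch of the nearest-neighbor codebook lookup used by VQ-VAE-style tokenizers. The array shapes, codebook size, and function names are illustrative assumptions made for this example, not TOTEM's released implementation (see the GitHub link above for the actual code).

```python
# Illustrative sketch only: a nearest-neighbor codebook lookup of the kind used
# in VQ-VAE-style tokenizers. The encoder that produces the latents, the
# codebook size, and all names below are assumptions for this example.
import numpy as np

def tokenize(latents: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each latent vector to the index of its nearest codebook entry.

    latents:  (T, D) array of encoder outputs for one time series.
    codebook: (K, D) array of learned code vectors.
    Returns:  (T,) array of integer token ids in [0, K).
    """
    # Squared Euclidean distance between every latent and every code vector.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return dists.argmin(axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    codebook = rng.normal(size=(256, 64))   # K=256 codes, D=64 dims (assumed)
    latents = rng.normal(size=(96, 64))     # 96 latent steps from some encoder
    tokens = tokenize(latents, codebook)
    print(tokens[:10])                      # discrete tokens reusable across tasks
```

Once such a codebook is frozen, the same discrete token stream can feed downstream models for imputation, anomaly detection, or forecasting, which is the sense in which the tokenization is "fixed" across tasks in the abstract above.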
Related papers
- GIFT-Eval: A Benchmark For General Time Series Forecasting Model Evaluation [90.53485251837235]
Time series foundation models excel in zero-shot forecasting, handling diverse tasks without explicit training.
GIFT-Eval is a pioneering benchmark aimed at promoting evaluation across diverse datasets.
GIFT-Eval encompasses 23 datasets over 144,000 time series and 177 million data points.
arXiv Detail & Related papers (2024-10-14T11:29:38Z) - Deep Time Series Models: A Comprehensive Survey and Benchmark [74.28364194333447]
Time series data is of great significance in real-world scenarios.
Recent years have witnessed remarkable breakthroughs in the time series community.
We release Time Series Library (TSLib) as a fair benchmark of deep time series models for diverse analysis tasks.
arXiv Detail & Related papers (2024-07-18T08:31:55Z) - Capturing Temporal Components for Time Series Classification [5.70772577110828]
This work introduces a compositional representation learning approach trained on statistically coherent components extracted from sequential data.
Based on a multi-scale change space, an unsupervised approach is proposed to segment the sequential data into chunks with similar statistical properties.
A sequence-based encoder model is trained in a multi-task setting to learn compositional representations from these temporal components for time series classification.
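As a toy illustration of splitting a sequence into statistically coherent chunks, the sketch below marks a boundary wherever the mean of adjacent windows shifts by more than a threshold. This simple heuristic is an assumption made for illustration only; it is not the multi-scale change space proposed in the paper.

```python
# Toy illustration of statistically driven segmentation: split a series where
# the means of adjacent windows differ by more than a threshold. This is a
# simplified stand-in, not the paper's multi-scale change space.
import numpy as np

def segment_by_mean_shift(x: np.ndarray, window: int = 50, thresh: float = 1.0):
    """Return indices where a new statistically distinct chunk begins."""
    boundaries = [0]
    for t in range(window, len(x) - window, window):
        left = x[t - window:t].mean()
        right = x[t:t + window].mean()
        if abs(right - left) > thresh:
            boundaries.append(t)
    return boundaries

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Piecewise series whose mean jumps at t=300 and t=700.
    x = np.concatenate([rng.normal(0, 1, 300),
                        rng.normal(3, 1, 400),
                        rng.normal(-2, 1, 300)])
    print(segment_by_mean_shift(x))  # approximate chunk boundaries
```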
arXiv Detail & Related papers (2024-06-20T16:15:21Z) - PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
We propose a Parameter-Efficient Federated Anomaly Detection framework named PeFAD to address increasing privacy concerns.
We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z) - NuwaTS: a Foundation Model Mending Every Incomplete Time Series [24.768755438620666]
We present NuwaTS, a novel framework that repurposes Pre-trained Language Models for general time series imputation.
NuwaTS can be applied to impute missing data across any domain.
We show that NuwaTS generalizes to other time series tasks, such as forecasting.
arXiv Detail & Related papers (2024-05-24T07:59:02Z) - UniCL: A Universal Contrastive Learning Framework for Large Time Series Models [18.005358506435847]
Time-series analysis plays a pivotal role across a range of critical applications, from finance to healthcare.
Traditional supervised learning methods require annotating extensive labels for time-series data in each task.
This paper introduces UniCL, a universal and scalable contrastive learning framework designed for pretraining time-series foundation models.
arXiv Detail & Related papers (2024-05-17T07:47:11Z) - Chronos: Learning the Language of Time Series [79.38691251254173]
Chronos is a framework for pretrained probabilistic time series models.
We show that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks.
arXiv Detail & Related papers (2024-03-12T16:53:54Z) - MOMENT: A Family of Open Time-series Foundation Models [19.0845213853369]
We introduce MOMENT, a family of open-source foundation models for general-purpose time series analysis.
We compile a collection of public time series, called the Time series Pile, and systematically tackle time series-specific challenges.
We build on recent work to design a benchmark to evaluate time series foundation models on diverse tasks and datasets in limited supervision settings.
arXiv Detail & Related papers (2024-02-06T10:48:46Z) - Unified Training of Universal Time Series Forecasting Transformers [104.56318980466742]
We present a Masked Encoder-based Universal Time Series Forecasting Transformer (Moirai).
Moirai is trained on our newly introduced Large-scale Open Time Series Archive (LOTSA) featuring over 27B observations across nine domains.
Moirai achieves competitive or superior performance as a zero-shot forecaster when compared to full-shot models.
arXiv Detail & Related papers (2024-02-04T20:00:45Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - Large Pre-trained time series models for cross-domain Time series analysis tasks [20.228846068418765]
We propose a novel method of adaptive segmentation that automatically identifies the optimal dataset-specific segmentation strategy during pre-training.
This enables LPTM to perform similarly to or better than domain-specific state-of-the-art models when fine-tuned on different downstream time-series analysis tasks and under zero-shot settings.
arXiv Detail & Related papers (2023-11-19T20:16:16Z) - Temporal Treasure Hunt: Content-based Time Series Retrieval System for Discovering Insights [34.1973242428317]
Time series data is ubiquitous across various domains such as finance, healthcare, and manufacturing.
The ability to perform Content-based Time Series Retrieval (CTSR) is crucial for identifying unknown time series examples.
We introduce a CTSR benchmark dataset that comprises time series data from a variety of domains.
arXiv Detail & Related papers (2023-11-05T04:12:13Z) - Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain [54.67888148566323]
We introduce three large-scale time series forecasting datasets from the cloud operations domain.
We show it is a strong zero-shot baseline and benefits from further scaling, both in model and dataset size.
Accompanying these datasets and results is a suite of comprehensive benchmark results comparing classical and deep learning baselines to our pre-trained method.
arXiv Detail & Related papers (2023-10-08T08:09:51Z) - Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are available only on a source dataset, but unavailable on the target dataset during training.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z) - Few-Shot Forecasting of Time-Series with Heterogeneous Channels [4.635820333232681]
We develop a model composed of permutation-invariant deep set-blocks which incorporate a temporal embedding.
We show through experiments that our model provides a good generalization, outperforming baselines carried over from simpler scenarios.
arXiv Detail & Related papers (2022-04-07T14:02:15Z) - Novel Features for Time Series Analysis: A Complex Networks Approach [62.997667081978825]
Time series data are ubiquitous in several domains such as climate, economics, and health care.
A recent conceptual approach relies on mapping time series to complex networks.
Network analysis can be used to characterize different types of time series.
arXiv Detail & Related papers (2021-10-11T13:46:28Z) - Time Series Analysis via Network Science: Concepts and Algorithms [62.997667081978825]
This review provides a comprehensive overview of existing mapping methods for transforming time series into networks.
We describe the main conceptual approaches, provide authoritative references and give insight into their advantages and limitations in a unified notation and language.
Although still very recent, this research area has much potential and with this survey we intend to pave the way for future research on the topic.
arXiv Detail & Related papers (2021-10-11T13:33:18Z) - Simple multi-dataset detection [83.9604523643406]
We present a simple method for training a unified detector on multiple large-scale datasets.
We show how to automatically integrate dataset-specific outputs into a common semantic taxonomy.
Our approach does not require manual taxonomy reconciliation.
arXiv Detail & Related papers (2021-02-25T18:55:58Z)