One Fits All: Universal Time Series Analysis by Pretrained LM and
Specially Designed Adaptors
- URL: http://arxiv.org/abs/2311.14782v1
- Date: Fri, 24 Nov 2023 16:32:47 GMT
- Title: One Fits All: Universal Time Series Analysis by Pretrained LM and
Specially Designed Adaptors
- Authors: Tian Zhou, Peisong Niu, Xue Wang, Liang Sun, Rong Jin
- Abstract summary: We introduce four unique adapters, designed specifically for downstream tasks based on the pre-trained model.
These adapters are further enhanced with efficient parameter tuning, resulting in superior performance compared to all state-of-the-art methods.
- Score: 23.292260325891032
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Despite the impressive achievements of pre-trained models in the fields of
natural language processing (NLP) and computer vision (CV), progress in the
domain of time series analysis has been limited. In contrast to NLP and CV,
where a single model can handle various tasks, time series analysis still
relies heavily on task-specific methods for activities such as classification,
anomaly detection, forecasting, and few-shot learning. The primary obstacle to
developing a pre-trained model for time series analysis is the scarcity of
sufficient training data. In our research, we overcome this obstacle by
utilizing pre-trained models from language or CV, which have been trained on
billions of data points, and apply them to time series analysis. We assess the
effectiveness of the pre-trained transformer model in two ways. Initially, we
maintain the original structure of the self-attention and feedforward layers in
the residual blocks of the pre-trained language or image model, using the
Frozen Pre-trained Transformer (FPT) for time series analysis with the addition
of projection matrices for input and output. Additionally, we introduce four
unique adapters, designed specifically for downstream tasks based on the
pre-trained model, including forecasting and anomaly detection. These adapters
are further enhanced with efficient parameter tuning, resulting in superior
performance compared to all state-of-the-art methods. Our comprehensive
experimental studies reveal that (a) the simple FPT achieves top-tier
performance across various time series analysis tasks; and (b) fine-tuning the
FPT with the custom-designed adapters can further elevate its performance,
outshining specialized task-specific models.
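To make the FPT recipe above concrete, the sketch below wraps a frozen GPT-2 backbone with trainable input and output projections and fine-tunes only the LayerNorm and positional-embedding parameters. It is a minimal illustration assuming a Hugging Face GPT-2 checkpoint and a simple patch-based forecasting setup; the class name, patch length, and prediction head are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a Frozen Pre-trained Transformer (FPT) for time series
# forecasting, assuming a GPT-2 backbone from Hugging Face Transformers.
# Names, patching scheme, and hyperparameters are illustrative only.
import torch
import torch.nn as nn
from transformers import GPT2Model

class FrozenPretrainedTransformer(nn.Module):
    def __init__(self, patch_len=16, pred_len=96, d_model=768, n_layers=6):
        super().__init__()
        self.patch_len = patch_len
        # Trainable projection from a patch of raw values into the LM's width.
        self.input_proj = nn.Linear(patch_len, d_model)
        # Pre-trained language model; keep only the first n_layers blocks.
        self.backbone = GPT2Model.from_pretrained("gpt2")
        self.backbone.h = self.backbone.h[:n_layers]
        # Freeze the self-attention and feed-forward weights; leave LayerNorm
        # and positional embeddings trainable (parameter-efficient tuning).
        for name, param in self.backbone.named_parameters():
            param.requires_grad = ("ln" in name) or ("wpe" in name)
        # Trainable projection from the LM's hidden states to the forecast.
        self.output_proj = nn.Linear(d_model, pred_len)

    def forward(self, x):
        # x: (batch, seq_len) univariate series, split into non-overlapping patches.
        patches = x.unfold(1, self.patch_len, self.patch_len)  # (b, n_patches, patch_len)
        tokens = self.input_proj(patches)                       # (b, n_patches, d_model)
        hidden = self.backbone(inputs_embeds=tokens).last_hidden_state
        # Use the last token's representation to predict the forecast horizon.
        return self.output_proj(hidden[:, -1, :])               # (b, pred_len)

model = FrozenPretrainedTransformer()
x = torch.randn(8, 96)       # batch of 8 series, 96 observed time steps
forecast = model(x)          # (8, 96) predicted future values
```

In this sketch only the two projection layers, the LayerNorm parameters, and the positional embeddings receive gradients, which corresponds to the parameter-efficient tuning regime the abstract refers to; the paper's four task-specific adapters would sit on top of this frozen backbone and are not detailed in the abstract.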
Related papers
- Generative Pretrained Hierarchical Transformer for Time Series Forecasting [3.739587363053192]
We propose a novel generative pretrained hierarchical transformer architecture for forecasting, named GPHT.
We conduct extensive experiments on eight datasets with mainstream self-supervised pretraining models and supervised models.
The results demonstrate that GPHT surpasses the baseline models across various fine-tuning and zero/few-shot learning settings in the traditional long-term forecasting task.
arXiv Detail & Related papers (2024-02-26T11:54:54Z) - Timer: Generative Pre-trained Transformers Are Large Time Series Models [83.03091523806668]
This paper aims at the early development of large time series models (LTSM).
During pre-training, we curate large-scale datasets with up to 1 billion time points.
To meet diverse application needs, we convert forecasting, imputation, and anomaly detection of time series into a unified generative task.
arXiv Detail & Related papers (2024-02-04T06:55:55Z) - Large Pre-trained time series models for cross-domain Time series analysis tasks [20.228846068418765]
We propose a novel method of adaptive segmentation that automatically identifies the optimal dataset-specific segmentation strategy during pre-training.
This enables LPTM to perform similarly to or better than domain-specific state-of-the-art models when fine-tuned on different downstream time-series analysis tasks and under zero-shot settings.
arXiv Detail & Related papers (2023-11-19T20:16:16Z) - How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression? [92.90857135952231]
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities.
We study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression.
arXiv Detail & Related papers (2023-10-12T15:01:43Z) - Toward a Foundation Model for Time Series Data [34.1973242428317]
A foundation model is a machine learning model trained on a large and diverse set of data.
We develop an effective time series foundation model by leveraging unlabeled samples from multiple domains.
arXiv Detail & Related papers (2023-10-05T21:44:50Z) - Examining the Effect of Pre-training on Time Series Classification [21.38211396933795]
This study investigates how pre-training affects the subsequent fine-tuning process.
We conducted a thorough examination of 150 classification datasets.
We find that pre-training can only help improve the optimization process for models that fit the data poorly.
Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume.
arXiv Detail & Related papers (2023-09-11T06:26:57Z) - Approximated Prompt Tuning for Vision-Language Pre-trained Models [54.326232586461614]
In vision-language pre-trained models, prompt tuning often requires a large number of learnable tokens to bridge the gap between the pre-training and downstream tasks.
We propose a novel Approximated Prompt Tuning (APT) approach towards efficient VL transfer learning.
arXiv Detail & Related papers (2023-06-27T05:43:47Z) - One Fits All: Power General Time Series Analysis by Pretrained LM [23.292260325891032]
We show that pre-trained models on natural language or images can lead to comparable or state-of-the-art performance in all main time series analysis tasks.
arXiv Detail & Related papers (2023-02-23T11:37:39Z) - Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z) - A Generative Language Model for Few-shot Aspect-Based Sentiment Analysis [90.24921443175514]
We focus on aspect-based sentiment analysis, which involves extracting aspect terms and categories and predicting their corresponding polarities.
We propose to reformulate the extraction and prediction tasks into the sequence generation task, using a generative language model with unidirectional attention.
Our approach outperforms the previous state-of-the-art (based on BERT) in average performance by a large margin in both few-shot and full-shot settings.
arXiv Detail & Related papers (2022-04-11T18:31:53Z) - Pretrained Transformers as Universal Computation Engines [105.00539596788127]
We investigate the capability of a transformer pretrained on natural language to generalize to other modalities with minimal finetuning.
We study finetuning it on a variety of sequence classification tasks spanning numerical computation, vision, and protein fold prediction.
We find that such pretraining enables FPT to generalize in zero-shot to these modalities, matching the performance of a transformer fully trained on these tasks.
arXiv Detail & Related papers (2021-03-09T06:39:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences.