PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction
and Forecasting via Prompt Token Tuning
- URL: http://arxiv.org/abs/2311.03768v1
- Date: Tue, 7 Nov 2023 07:11:27 GMT
- Title: PT-Tuning: Bridging the Gap between Time Series Masked Reconstruction
and Forecasting via Prompt Token Tuning
- Authors: Hao Liu, Jinrui Gan, Xiaoxuan Fan, Yi Zhang, Chuanxian Luo, Jing
Zhang, Guangxin Jiang, Yucheng Qian, Changwei Zhao, Huan Ma, Zhenyu Guo
- Abstract summary: Self-supervised learning has been actively studied in the time series domain recently, especially for masked reconstruction.
Most of these methods follow the "Pre-training + Fine-tuning" paradigm in which a new decoder replaces the pre-trained decoder.
We propose a simple yet effective prompt token tuning (PT-Tuning) paradigm, in which all pre-trained parameters are frozen and only a few trainable prompt tokens are added to extended mask tokens.
- Score: 14.332279447231416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning has been actively studied in the time
series domain recently, especially for masked reconstruction. Most of these
methods follow the "Pre-training + Fine-tuning" paradigm, in which a new decoder
replaces the pre-trained decoder to fit a specific downstream task, leading to
an inconsistency between upstream and downstream tasks. In this paper, we first point
out that the unification of task objectives and adaptation for task difficulty
are critical for bridging the gap between time series masked reconstruction and
forecasting. By reserving the pre-trained mask token during the fine-tuning
stage, the forecasting task can be taken as a special case of masked
reconstruction, where the future values are masked and reconstructed based on
historical values. This guarantees the consistency of task objectives, but a gap
in task difficulty remains, because masked reconstruction can utilize contextual
information while forecasting can only use historical information. To further
mitigate this gap, we propose a simple yet effective prompt token tuning
(PT-Tuning) paradigm, in which all pre-trained parameters are frozen and only a
few trainable prompt tokens are added to the extended mask tokens in an
element-wise manner. Extensive experiments on
real-world datasets demonstrate the superiority of our proposed paradigm with
state-of-the-art performance compared to representation learning and end-to-end
supervised forecasting methods.
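
The abstract describes the mechanism concretely enough for a brief illustration. Below is a minimal PyTorch sketch of the idea as stated above, assuming a pre-trained masked-reconstruction backbone that maps token sequences of shape (batch, tokens, d_model) to reconstructed outputs of the same shape; the names (PTTuningForecaster, mask_token, prompt_tokens, horizon) are illustrative assumptions, not taken from the paper's code.

```python
import torch
import torch.nn as nn


class PTTuningForecaster(nn.Module):
    """Frozen pre-trained backbone plus a few trainable prompt tokens (hypothetical names)."""

    def __init__(self, backbone: nn.Module, d_model: int, horizon: int):
        super().__init__()
        # Pre-trained masked-reconstruction model (encoder + its original decoder),
        # kept entirely frozen during fine-tuning.
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False

        # Mask token reserved from pre-training; in practice it would be loaded
        # from the pre-trained checkpoint rather than re-initialized here.
        self.mask_token = nn.Parameter(torch.zeros(1, 1, d_model), requires_grad=False)

        # The only trainable parameters: one prompt token per future position,
        # added element-wise to the extended mask tokens.
        self.prompt_tokens = nn.Parameter(torch.zeros(1, horizon, d_model))
        self.horizon = horizon

    def forward(self, history_tokens: torch.Tensor) -> torch.Tensor:
        # history_tokens: (batch, num_history_tokens, d_model), already embedded.
        b = history_tokens.size(0)

        # Forecasting as masked reconstruction: every future position is filled
        # with the pre-trained mask token, extended to the forecast horizon ...
        future_tokens = self.mask_token.expand(b, self.horizon, -1)
        # ... and the trainable prompt tokens are added element-wise.
        future_tokens = future_tokens + self.prompt_tokens

        # The frozen backbone reconstructs the "masked" future from the history context.
        full_sequence = torch.cat([history_tokens, future_tokens], dim=1)
        reconstructed = self.backbone(full_sequence)

        # Keep only the reconstructed future positions as the forecast.
        return reconstructed[:, -self.horizon:, :]
```

Under this sketch, only prompt_tokens receives gradients, so fine-tuning updates a handful of vectors for the forecast horizon while the frozen backbone keeps solving the same masked-reconstruction objective it was pre-trained on.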
Related papers
- Continual Diffusion with STAMINA: STack-And-Mask INcremental Adapters [67.28751868277611]
Recent work has demonstrated the ability to customize text-to-image diffusion models to multiple, fine-grained concepts in a sequential manner.
We show that the capacity to learn new tasks reaches saturation over longer sequences.
We introduce a novel method, STack-And-Mask INcremental Adapters (STAMINA), which is composed of low-ranked attention-masked adapters and customized tokens.
arXiv Detail & Related papers (2023-11-30T18:04:21Z)
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training [59.923672191632065]
We propose a new self-supervised pre-training approach, named Masked and Permuted Vision Transformer (MaPeT).
MaPeT employs autoregressive and permuted predictions to capture intra-patch dependencies.
Our results demonstrate that MaPeT achieves competitive performance on ImageNet.
arXiv Detail & Related papers (2023-06-12T18:12:19Z)
- SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling [82.69579113377192]
SimMTM is a simple pre-training framework for Masked Time-series Modeling.
SimMTM recovers masked time points by the weighted aggregation of multiple neighbors outside the manifold.
SimMTM achieves state-of-the-art fine-tuning performance compared to the most advanced time series pre-training methods.
arXiv Detail & Related papers (2023-02-02T04:12:29Z)
- Ti-MAE: Self-Supervised Masked Time Series Autoencoders [16.98069693152999]
We propose a novel framework named Ti-MAE, in which the input time series are assumed to follow an integrated distribution.
Ti-MAE randomly masks out embedded time series data and learns an autoencoder to reconstruct them at the point-level.
Experiments on several public real-world datasets demonstrate that our framework of masked autoencoding could learn strong representations directly from the raw data.
arXiv Detail & Related papers (2023-01-21T03:20:23Z)
- Task Residual for Tuning Vision-Language Models [69.22958802711017]
We propose a new efficient tuning approach for vision-language models (VLMs), named Task Residual Tuning (TaskRes).
TaskRes explicitly decouples the prior knowledge of the pre-trained models and new knowledge regarding a target task.
The proposed TaskRes is simple yet effective, which significantly outperforms previous methods on 11 benchmark datasets.
arXiv Detail & Related papers (2022-11-18T15:09:03Z)
- Leveraging Time Irreversibility with Order-Contrastive Pre-training [3.1848820580333737]
We explore an "order-contrastive" method for self-supervised pre-training on longitudinal data.
We prove a finite-sample guarantee for the downstream error of a representation learned with order-contrastive pre-training.
Our results indicate that pre-training methods designed for particular classes of distributions and downstream tasks can improve the performance of self-supervised learning.
arXiv Detail & Related papers (2021-11-04T02:56:52Z)
- Bi-Granularity Contrastive Learning for Post-Training in Few-Shot Scene [10.822477939237459]
We propose contrastive masked language modeling (CMLM) for post-training to integrate both token-level and sequence-level contrastive learning.
CMLM surpasses several recent post-training methods in few-shot settings without the need for data augmentation.
arXiv Detail & Related papers (2021-06-04T08:17:48Z)
- Train No Evil: Selective Masking for Task-Guided Pre-Training [97.03615486457065]
We propose a three-stage framework by adding a task-guided pre-training stage with selective masking between general pre-training and fine-tuning.
We show that our method can achieve comparable or even better performance with less than 50% of the cost.
arXiv Detail & Related papers (2020-04-21T03:14:22Z)
- Ternary Feature Masks: zero-forgetting for task-incremental learning [68.34518408920661]
We propose an approach to continual learning for the task-aware regime without any forgetting.
By using ternary masks we can upgrade a model to new tasks, reusing knowledge from previous tasks while not forgetting anything about them.
Our method outperforms current state-of-the-art while reducing memory overhead in comparison to weight-based approaches.
arXiv Detail & Related papers (2020-01-23T18:08:37Z)