Examining the Effect of Pre-training on Time Series Classification
- URL: http://arxiv.org/abs/2309.05256v1
- Date: Mon, 11 Sep 2023 06:26:57 GMT
- Title: Examining the Effect of Pre-training on Time Series Classification
- Authors: Jiashu Pu, Shiwei Zhao, Ling Cheng, Yongzhu Chang, Runze Wu, Tangjie
Lv, Rongsheng Zhang
- Abstract summary: This study investigates the impact of unsupervised pre-training on the subsequent fine-tuning process.
We conducted a thorough examination of 150 classification datasets.
We find that pre-training can only help improve the optimization process for models that fit the data poorly.
Adding more pre-training data does not improve generalization, but it can strengthen the advantages pre-training already provides at the original data volume, such as faster convergence.
- Score: 21.38211396933795
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although the pre-training followed by fine-tuning paradigm is used
extensively in many fields, there is still some controversy surrounding the
impact of pre-training on the fine-tuning process. Currently, experimental
findings based on text and image data lack consensus. To delve deeper into the
unsupervised pre-training followed by fine-tuning paradigm, we have extended
previous research to a new modality: time series. In this study, we conducted a
thorough examination of 150 classification datasets derived from the Univariate
Time Series (UTS) and Multivariate Time Series (MTS) benchmarks. Our analysis
reveals several key conclusions. (i) Pre-training can only help improve the
optimization process for models that fit the data poorly, rather than those
that fit the data well. (ii) Pre-training does not exhibit the effect of
regularization when given sufficient training time. (iii) Pre-training can only
speed up convergence if the model has sufficient ability to fit the data. (iv)
Adding more pre-training data does not improve generalization, but it can
strengthen the advantage of pre-training on the original data volume, such as
faster convergence. (v) While both the pre-training task and the model
structure determine the effectiveness of the paradigm on a given dataset, the
model structure plays a more significant role.
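The paradigm examined in the abstract can be illustrated with a minimal sketch: an "encoder" weight is first learned with an unsupervised reconstruction objective, then a classifier head is fine-tuned on top of it. The toy data, single-weight model, and training loop below are illustrative assumptions for exposition only, not the paper's actual models or benchmarks.

```python
import math
import random

random.seed(0)

# Toy univariate "time series": class 0 fluctuates around 0.2, class 1 around 0.8.
def make_series(label, length=8):
    base = 0.2 if label == 0 else 0.8
    return [base + random.uniform(-0.1, 0.1) for _ in range(length)]

data = [(make_series(y), y) for y in [0, 1] * 20]

# A single shared weight plays the role of the pre-trained encoder.
def encode(series, w):
    return w * sum(series) / len(series)

# Stage 1: unsupervised pre-training (reconstruct the series mean).
w = 0.0
for _ in range(100):
    for series, _ in data:
        m = sum(series) / len(series)
        pred = encode(series, w)       # reconstruction of the mean
        grad = 2 * (pred - m) * m      # d/dw of (w*m - m)^2
        w -= 0.5 * grad                # gradient descent step

# Stage 2: supervised fine-tuning (logistic head on the frozen encoding).
v, b = 0.0, 0.0
for _ in range(200):
    for series, y in data:
        z = v * encode(series, w) + b
        p = 1 / (1 + math.exp(-z))
        dz = p - y                     # cross-entropy gradient w.r.t. z
        v -= 0.1 * dz * encode(series, w)
        b -= 0.1 * dz

correct = sum(
    ((1 / (1 + math.exp(-(v * encode(s, w) + b)))) > 0.5) == (y == 1)
    for s, y in data
)
accuracy = correct / len(data)
```

In this toy setting the reconstruction loss drives `w` toward 1, and the fine-tuned head then separates the two classes by their means; the paper's contribution is examining when such pre-training actually helps optimization, convergence, and generalization at scale.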
Related papers
- Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models [17.288865972774587]
We investigate the relationship between pre-training and fine-tuning by fine-tuning multiple intermediate pre-trained model checkpoints.
Our results on 18 datasets suggest that pre-training improves the model in a latent way that is revealed after fine-tuning.
arXiv Detail & Related papers (2024-08-13T06:28:43Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- A Supervised Contrastive Learning Pretrain-Finetune Approach for Time Series [15.218841180577135]
We introduce a novel pretraining procedure that leverages supervised contrastive learning to distinguish features within each pretraining dataset.
We then propose a fine-tuning procedure designed to enhance the accurate prediction of the target data by aligning it more closely with the learned dynamics of the pretraining datasets.
arXiv Detail & Related papers (2023-11-21T02:06:52Z)
- Efficient Grammatical Error Correction Via Multi-Task Training and Optimized Training Schedule [55.08778142798106]
We propose auxiliary tasks that exploit the alignment between the original and corrected sentences.
We formulate each task as a sequence-to-sequence problem and perform multi-task training.
We find that the order of datasets used for training and even individual instances within a dataset may have important effects on the final performance.
arXiv Detail & Related papers (2023-11-20T14:50:12Z)
- Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks [39.71761440499148]
Pre-training graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks from unlabeled data.
We propose a better-with-less framework for graph pre-training: fewer, but carefully chosen data are fed into a GNN model.
Experiment results show that the proposed APT is able to obtain an efficient pre-training model with fewer training data and better downstream performance.
arXiv Detail & Related papers (2023-11-02T07:09:59Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Self-Distillation for Further Pre-training of Transformers [83.84227016847096]
We propose self-distillation as a regularization for a further pre-training stage.
We empirically validate the efficacy of self-distillation on a variety of benchmark datasets for image and text classification tasks.
arXiv Detail & Related papers (2022-09-30T02:25:12Z)
- Dataset Pruning: Reducing Training Data by Examining Generalization Influence [30.30255670341501]
Do all training data contribute to a model's performance?
How can we construct the smallest subset of the training data as a proxy training set without significantly sacrificing the model's performance?
arXiv Detail & Related papers (2022-05-19T05:36:35Z)
- Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
arXiv Detail & Related papers (2021-11-24T06:18:32Z)
- Self-Supervised Pretraining Improves Self-Supervised Pretraining [83.1423204498361]
Self-supervised pretraining requires expensive and lengthy computation, large amounts of data, and is sensitive to data augmentation.
This paper explores Hierarchical PreTraining (HPT), which decreases convergence time and improves accuracy by initializing the pretraining process with an existing pretrained model.
We show HPT converges up to 80x faster, improves accuracy across tasks, and improves the robustness of the self-supervised pretraining process to changes in the image augmentation policy or amount of pretraining data.
arXiv Detail & Related papers (2021-03-23T17:37:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.