A Comparative Study of Pre-training and Self-training
- URL: http://arxiv.org/abs/2409.02751v1
- Date: Wed, 4 Sep 2024 14:30:13 GMT
- Title: A Comparative Study of Pre-training and Self-training
- Authors: Yiheng Wang, Jiayu Lin, Zuoquan Lin
- Abstract summary: We propose an ensemble method to empirically study all feasible training paradigms combining pre-training, self-training, and fine-tuning.
We conduct experiments on six datasets, four data augmentation methods, and imbalanced data for sentiment analysis and natural language inference tasks.
Our findings confirm that the pre-training and fine-tuning paradigm yields the best overall performance.
- Score: 0.40964539027092917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-training and self-training are two approaches to semi-supervised learning. The comparison between them has been explored before, but previous works led to confusing findings: self-training outperforms pre-training on some tasks in computer vision, while, contrarily, pre-training outperforms self-training on some tasks in natural language processing, under incomparable experimental settings. We propose an ensemble method to empirically and exhaustively study all feasible training paradigms combining pre-training, self-training, and fine-tuning within consistent foundational settings comparable to data augmentation. We conduct experiments on six datasets, four data augmentation methods, and imbalanced data for sentiment analysis and natural language inference tasks. Our findings confirm that the pre-training and fine-tuning paradigm yields the best overall performance. Moreover, self-training offers no additional benefits when combined with semi-supervised pre-training.
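For readers unfamiliar with the self-training paradigm compared in the abstract, one round of it can be sketched as pseudo-labeling: train on the labeled pool, label confident unlabeled examples with the model's own predictions, and retrain on the union. This is a minimal illustrative sketch with a toy dataset and a generic scikit-learn classifier, not the paper's actual models, data, or threshold.

```python
# One self-training (pseudo-labeling) round, sketched on toy 2-D data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Small labeled pool: two Gaussian clusters centered at (0,0) and (3,3).
X_labeled = rng.normal(0, 1, (40, 2)) + np.repeat([[0, 0], [3, 3]], 20, axis=0)
y_labeled = np.repeat([0, 1], 20)

# Larger unlabeled pool drawn from the same two clusters.
hidden = rng.integers(0, 2, 200)
X_unlabeled = rng.normal(0, 1, (200, 2)) + 3 * hidden[:, None]

# Step 1: fit on labeled data only.
clf = LogisticRegression().fit(X_labeled, y_labeled)

# Step 2: pseudo-label unlabeled examples the model is confident about
# (threshold 0.95 is illustrative), then retrain on the augmented set.
proba = clf.predict_proba(X_unlabeled)
confident = proba.max(axis=1) >= 0.95
X_aug = np.vstack([X_labeled, X_unlabeled[confident]])
y_aug = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])
clf = LogisticRegression().fit(X_aug, y_aug)
```

In practice this loop is repeated for several rounds; the paper's finding is that such rounds add no benefit once semi-supervised pre-training is already in place.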
Related papers
- Self-training Language Models for Arithmetic Reasoning [0.0]
We explore the potential of improving models' reasoning capabilities without new data.
We find that models can substantially improve in both single-round (offline) and online self-training.
arXiv Detail & Related papers (2024-07-11T11:06:05Z) - A Supervised Contrastive Learning Pretrain-Finetune Approach for Time Series [15.218841180577135]
We introduce a novel pretraining procedure that leverages supervised contrastive learning to distinguish features within each pretraining dataset.
We then propose a fine-tuning procedure designed to enhance the accurate prediction of the target data by aligning it more closely with the learned dynamics of the pretraining datasets.
arXiv Detail & Related papers (2023-11-21T02:06:52Z) - Improving In-Context Few-Shot Learning via Self-Supervised Training [48.801037246764935]
We propose to use self-supervision in an intermediate training stage between pretraining and downstream few-shot usage.
We find that the intermediate self-supervision stage produces models that outperform strong baselines.
arXiv Detail & Related papers (2022-05-03T18:01:07Z) - On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z) - Better Self-training for Image Classification through Self-supervision [3.492636597449942]
Self-supervision is learning without manual supervision by solving an automatically-generated pretext task.
This paper investigates three ways of incorporating self-supervision into self-training to improve accuracy in image classification.
arXiv Detail & Related papers (2021-09-02T08:24:41Z) - How Well Self-Supervised Pre-Training Performs with Streaming Data? [73.5362286533602]
In real-world scenarios where data are collected in a streaming fashion, the joint training scheme is usually storage-heavy and time-consuming.
It is unclear how well sequential self-supervised pre-training performs with streaming data.
We find sequential self-supervised learning exhibits almost the same performance as the joint training when the distribution shifts within streaming data are mild.
arXiv Detail & Related papers (2021-04-25T06:56:48Z) - Self-training Improves Pre-training for Natural Language Understanding [63.78927366363178]
We study self-training as another way to leverage unlabeled data through semi-supervised learning.
We introduce SentAugment, a data augmentation method which computes task-specific query embeddings from labeled data.
Our approach leads to scalable and effective self-training with improvements of up to 2.6% on standard text classification benchmarks.
arXiv Detail & Related papers (2020-10-05T17:52:25Z) - Multi-Stage Influence Function [97.19210942277354]
We develop a multi-stage influence function score to track predictions from a finetuned model all the way back to the pretraining data.
We study two different scenarios with the pretrained embeddings fixed or updated in the finetuning tasks.
arXiv Detail & Related papers (2020-07-17T16:03:11Z) - Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping [62.78338049381917]
Fine-tuning pretrained contextual word embedding models to supervised downstream tasks has become commonplace in natural language processing.
We experiment with four datasets from the GLUE benchmark, fine-tuning BERT hundreds of times on each while varying only the random seeds.
We find substantial performance increases compared to previously reported results, and we quantify how the performance of the best-found model varies as a function of the number of fine-tuning trials.
arXiv Detail & Related papers (2020-02-15T02:40:10Z)
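The seed-variance protocol in the last entry above can be sketched as: repeat training with only the random seed changed, then compare the best-found score against the average. Here a tiny scikit-learn MLP and synthetic data stand in for BERT fine-tuning on GLUE; the trial count, model, and data are all illustrative.

```python
# Repeat training while varying only the random seed, then compare
# best-found vs. mean test score, as in seed-variance studies.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = []
for seed in range(10):  # the paper runs hundreds of fine-tuning trials
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300,
                        random_state=seed).fit(X_tr, y_tr)
    scores.append(clf.score(X_te, y_te))

print(f"best={max(scores):.3f} mean={np.mean(scores):.3f}")
```

The gap between the best and mean scores is what such studies quantify: with only the seed varying, the best-of-N model can look substantially better than a typical run.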
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.