Neural Semi-supervised Learning for Text Classification Under
Large-Scale Pretraining
- URL: http://arxiv.org/abs/2011.08626v2
- Date: Thu, 19 Nov 2020 12:43:58 GMT
- Title: Neural Semi-supervised Learning for Text Classification Under
Large-Scale Pretraining
- Authors: Zijun Sun, Chun Fan, Xiaofei Sun, Yuxian Meng, Fei Wu and Jiwei Li
- Abstract summary: We conduct studies on semi-supervised learning for the task of text classification in the context of large-scale LM pretraining.
Our work marks an initial step toward understanding the behavior of semi-supervised learning models in the context of large-scale pretraining.
- Score: 51.19885385587916
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The goal of semi-supervised learning is to use the unlabeled, in-domain
dataset U to improve models trained on the labeled dataset D. In the context of
large-scale language-model (LM) pretraining, how best to use U is poorly
understood: Is semi-supervised learning still beneficial in the presence of
large-scale pretraining? Should U be used for in-domain LM pretraining or for
pseudo-label generation? How should a pseudo-label-based semi-supervised model
actually be implemented? How do different semi-supervised strategies affect
performance for D and U of different sizes? In this paper, we conduct
comprehensive studies on semi-supervised learning for text classification in the
context of large-scale LM pretraining. Our studies shed important light on the
behavior of semi-supervised learning methods: (1) with in-domain LM pretraining
on U, open-domain LM pretraining is unnecessary; (2) both the in-domain
pretraining strategy and the pseudo-label-based strategy yield significant
performance boosts, with the former performing better with larger U, the latter
performing better with smaller U, and their combination leading to the largest
boost; (3) self-training (pretraining first on the pseudo-labeled data D' and
then fine-tuning on D) yields better performance when D is small, while joint
training on the combination of the pseudo-labeled data D' and the original
dataset D yields better performance when D is large. Using these
semi-supervised learning strategies, we achieve around 93.8% accuracy with only
50 labeled training examples on the IMDB dataset, and a competitive 96.6% with
the full IMDB dataset. Our work marks an initial step toward understanding the
behavior of semi-supervised learning models in the context of large-scale
pretraining.
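To make the third finding concrete, the sketch below contrasts the two pseudo-label
strategies (self-training versus joint training) with a generic HuggingFace-style
classifier. It is a minimal illustration under stated assumptions, not the authors'
released code: the roberta-base backbone, the toy D/U examples, and the train /
pseudo_label helpers and hyperparameters are all placeholders.

```python
# Minimal sketch (assumed details, not the authors' code) of the two
# pseudo-label strategies from the abstract: self-training (train on the
# pseudo-labeled set D', then fine-tune on D) versus joint training on D' + D.
import torch
from torch.utils.data import DataLoader, TensorDataset, ConcatDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

BACKBONE = "roberta-base"  # assumed checkpoint; the paper's exact backbone may differ
tok = AutoTokenizer.from_pretrained(BACKBONE)

def encode(texts, labels):
    # Fixed-length padding so examples from D and D' can be batched together.
    enc = tok(texts, padding="max_length", truncation=True, max_length=64,
              return_tensors="pt")
    return TensorDataset(enc["input_ids"], enc["attention_mask"], torch.tensor(labels))

def train(model, dataset, epochs=3, lr=2e-5):
    loader = DataLoader(dataset, batch_size=16, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for input_ids, attn, labels in loader:
            loss = model(input_ids=input_ids, attention_mask=attn, labels=labels).loss
            loss.backward()
            optim.step()
            optim.zero_grad()
    return model

@torch.no_grad()
def pseudo_label(model, texts):
    model.eval()
    enc = tok(texts, padding="max_length", truncation=True, max_length=64,
              return_tensors="pt")
    return model(**enc).logits.argmax(dim=-1).tolist()

# D: small labeled set, U: unlabeled in-domain set (toy placeholders).
D_texts, D_labels = ["great movie", "terrible plot"], [1, 0]
U_texts = ["an instant classic", "a waste of two hours"]

# A teacher trained on D labels U to produce the pseudo-labeled set D'.
teacher = AutoModelForSequenceClassification.from_pretrained(BACKBONE, num_labels=2)
teacher = train(teacher, encode(D_texts, D_labels))
D_prime = encode(U_texts, pseudo_label(teacher, U_texts))

# Strategy A, self-training: pretrain the student on D', then fine-tune on D
# (reported to work better when D is small).
student = AutoModelForSequenceClassification.from_pretrained(BACKBONE, num_labels=2)
student = train(student, D_prime)
student = train(student, encode(D_texts, D_labels))

# Strategy B, joint training: train a fresh model on D' and D together
# (reported to work better when D is large).
joint = AutoModelForSequenceClassification.from_pretrained(BACKBONE, num_labels=2)
joint = train(joint, ConcatDataset([D_prime, encode(D_texts, D_labels)]))
```

In the paper's full recipe, both strategies would additionally start from an LM that
has already been further pretrained in-domain on U (a sketch of that step follows the
related-papers list below).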
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- A Large-scale Evaluation of Pretraining Paradigms for the Detection of Defects in Electroluminescence Solar Cell Images [3.729242965449096]
This work is a large-scale evaluation and benchmarking of various pretraining methods for Solar Cell Defect Detection.
We cover supervised training with semantic segmentation, semi-supervised learning, and two self-supervised techniques.
We achieve a new state-of-the-art for SCDD and demonstrate that certain pretraining schemes result in superior performance on underrepresented classes.
arXiv Detail & Related papers (2024-02-27T15:37:15Z)
- Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks [91.15120211190519]
This paper aims to understand the nature of noise in pre-training datasets and to mitigate its impact on downstream tasks.
We propose a light-weight black-box tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise.
arXiv Detail & Related papers (2023-09-29T06:18:15Z)
- Generation-driven Contrastive Self-training for Zero-shot Text Classification with Instruction-following LLM [31.25193238045053]
We introduce a novel method, namely GenCo, which leverages the strong generative power of large language models to assist in training a smaller language model.
In our method, an LLM plays an important role in the self-training loop of a smaller model in two ways.
It helps craft additional high-quality training pairs by rewriting input texts conditioned on predicted labels.
arXiv Detail & Related papers (2023-04-24T07:35:38Z)
- An Efficient Active Learning Pipeline for Legal Text Classification [2.462514989381979]
We propose a pipeline for effectively using active learning with pre-trained language models in the legal domain.
We use knowledge distillation to guide the model's embeddings to a semantically meaningful space.
Our experiments on Contract-NLI, adapted to the classification task, and LEDGAR benchmarks show that our approach outperforms standard AL strategies.
arXiv Detail & Related papers (2022-11-15T13:07:02Z)
- A semi-supervised Teacher-Student framework for surgical tool detection and localization [2.41710192205034]
We introduce a semi-supervised learning (SSL) framework for the surgical tool detection paradigm.
In the proposed work, we train a model with labeled data, which initialises the Teacher-Student joint learning.
Our results on the m2cai16-tool-locations dataset indicate the superiority of our approach across different supervised data settings.
arXiv Detail & Related papers (2022-08-21T17:21:31Z)
- Self-Supervised Pre-Training for Transformer-Based Person Re-Identification [54.55281692768765]
Transformer-based supervised pre-training achieves great performance in person re-identification (ReID).
Due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset to boost the performance.
This work aims to mitigate the gap between the pre-training and ReID datasets from the perspective of data and model structure.
arXiv Detail & Related papers (2021-11-23T18:59:08Z)
- On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets [74.11825654535895]
Pre-training language models (LMs) on large-scale unlabeled text data makes it much easier for the model to achieve exceptional downstream performance.
We study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to its counterparts trained from scratch on downstream tasks.
arXiv Detail & Related papers (2021-09-08T10:39:57Z)
- Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining; a minimal sketch of this continued-pretraining recipe follows the list.
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
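The domain-adaptive and task-adaptive pretraining described in the last entry, like the
in-domain pretraining on U studied in the main paper, amounts to a second phase of
masked-LM training on unlabeled in-domain text before task fine-tuning. Below is a
minimal sketch using the HuggingFace Trainer; the corpus file name, checkpoint,
output directory, and hyperparameters are placeholders, not settings from either paper.

```python
# Sketch of a second phase of in-domain masked-LM pretraining on unlabeled
# text U before task fine-tuning. Paths, model name, and hyperparameters are
# illustrative placeholders.
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

BACKBONE = "roberta-base"  # assumed open-domain checkpoint to continue pretraining from
tok = AutoTokenizer.from_pretrained(BACKBONE)
model = AutoModelForMaskedLM.from_pretrained(BACKBONE)

# U: unlabeled in-domain text, one example per line (hypothetical file).
raw = load_dataset("text", data_files={"train": "unlabeled_in_domain.txt"})
tokenized = raw["train"].map(
    lambda batch: tok(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"])

# Standard 15% token masking for the continued-pretraining objective.
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm=True, mlm_probability=0.15)
args = TrainingArguments(output_dir="in_domain_lm", num_train_epochs=1,
                         per_device_train_batch_size=16, learning_rate=5e-5)

Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator).train()
model.save_pretrained("in_domain_lm")  # later loaded for classification fine-tuning on D
tok.save_pretrained("in_domain_lm")
```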