Self-Evolution Learning for Discriminative Language Model Pretraining
- URL: http://arxiv.org/abs/2305.15275v1
- Date: Wed, 24 May 2023 16:00:54 GMT
- Title: Self-Evolution Learning for Discriminative Language Model Pretraining
- Authors: Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du and Dacheng Tao
- Abstract summary: Self-Evolution learning (SE) is a simple and effective token masking and learning method.
SE focuses on learning the informative yet under-explored tokens and adaptively regularizes the training by introducing a novel Token-specific Label Smoothing approach.
- Score: 103.57103957631067
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Masked language modeling, widely used in discriminative language model (e.g.,
BERT) pretraining, commonly adopts a random masking strategy. However, random
masking does not consider the importance of different words to the sentence's
meaning, even though some of them are more worth predicting than others. Therefore, various
masking strategies (e.g., entity-level masking) have been proposed, but most of them
require expensive prior knowledge and generally train from scratch without
reusing existing model weights. In this paper, we present Self-Evolution
learning (SE), a simple and effective token masking and learning method to
fully and wisely exploit the knowledge from data. SE focuses on learning the
informative yet under-explored tokens and adaptively regularizes the training
by introducing a novel Token-specific Label Smoothing approach. Experiments on
10 tasks show that our SE brings consistent and significant improvements
(+1.43~2.12 average scores) upon different PLMs. In-depth analyses demonstrate
that SE improves linguistic knowledge learning and generalization.
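The two components described in the abstract (masking the informative yet under-explored tokens, and token-specific label smoothing) can be pictured with a short PyTorch-style sketch. This is a minimal illustration under assumed details: the top-fraction selection rule, the fixed smoothing weight `alpha`, and the helper names are illustrative, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F


def select_underexplored_tokens(logits, token_ids, top_frac=0.15):
    """Pick the tokens the converged PLM still predicts poorly (highest loss)."""
    # Per-token cross-entropy under the current model: high loss = under-explored.
    losses = F.cross_entropy(
        logits.view(-1, logits.size(-1)), token_ids.view(-1), reduction="none"
    ).view_as(token_ids)
    k = max(1, int(token_ids.size(1) * top_frac))  # top_frac is an assumed budget
    _, top_idx = losses.topk(k, dim=1)
    mask = torch.zeros_like(token_ids, dtype=torch.bool)
    mask.scatter_(1, top_idx, torch.ones_like(top_idx, dtype=torch.bool))
    return mask  # True at positions to replace with [MASK]


def token_specific_label_smoothing(logits, targets, ref_probs, alpha=0.1):
    """Mix each one-hot target with the model's own reference distribution."""
    one_hot = F.one_hot(targets, num_classes=logits.size(-1)).float()
    soft_targets = (1.0 - alpha) * one_hot + alpha * ref_probs
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()
```

In the paper the smoothing is adaptive per token (hence "token-specific"); the fixed `alpha` here only marks where that adaptation would plug in.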
Related papers
- Enhancing Visual Continual Learning with Language-Guided Supervision [76.38481740848434]
Continual learning aims to empower models to learn new tasks without forgetting previously acquired knowledge.
We argue that the scarce semantic information conveyed by the one-hot labels hampers the effective knowledge transfer across tasks.
Specifically, we use PLMs to generate semantic targets for each class, which are frozen and serve as supervision signals.
arXiv Detail & Related papers (2024-03-24T12:41:58Z)
- Language Model Adaptation to Specialized Domains through Selective Masking based on Genre and Topical Characteristics [4.9639158834745745]
We introduce an innovative masking approach leveraging genre and topicality information to tailor language models to specialized domains.
Our method incorporates a ranking process that prioritizes words based on their significance, subsequently guiding the masking procedure.
Experiments conducted using continual pre-training within the legal domain have underscored the efficacy of our approach on the LegalGLUE benchmark in the English language.
arXiv Detail & Related papers (2024-02-19T10:43:27Z)
- Unsupervised Improvement of Factual Knowledge in Language Models [4.5788796239850225]
Masked language modeling plays a key role in pretraining large language models.
We propose an approach for influencing pretraining in a way that can improve language model performance on a variety of knowledge-intensive tasks.
arXiv Detail & Related papers (2023-04-04T07:37:06Z)
- Leveraging per Image-Token Consistency for Vision-Language Pre-training [52.825150269820696]
Cross-modal masked language modeling (CMLM) is insufficient for vision-language pre-training.
We propose EPIC (lEveraging Per Image-Token Consistency for vision-language pre-training).
The proposed EPIC method is easily combined with pre-training methods.
arXiv Detail & Related papers (2022-11-20T12:10:53Z)
- Improving Temporal Generalization of Pre-trained Language Models with Lexical Semantic Change [28.106524698188675]
Recent research has revealed that neural language models at scale suffer from poor temporal generalization capability.
We propose a simple yet effective lexical-level masking strategy to post-train a converged language model.
arXiv Detail & Related papers (2022-10-31T08:12:41Z)
- Retrieval Oriented Masking Pre-training Language Model for Dense Passage Retrieval [16.592276887533714]
Masked Language Modeling (MLM) is a major sub-task of the pre-training process.
The traditional random masking strategy tends to select many tokens that have limited effect on the passage retrieval task.
We propose an alternative retrieval-oriented masking (dubbed ROM) strategy in which more important tokens have a higher probability of being masked out; a sketch of this idea appears after this list.
arXiv Detail & Related papers (2022-10-27T02:43:48Z)
- Probing Simile Knowledge from Pre-trained Language Models [16.411859515803098]
Simile interpretation (SI) and simile generation (SG) are challenging tasks for NLP because models require adequate world knowledge to produce predictions.
In recent years, approaches based on pre-trained language models (PLMs) have become the de-facto standard in NLP.
In this paper, we probe simile knowledge from PLMs to solve the SI and SG tasks in the unified framework of simile triple completion for the first time.
arXiv Detail & Related papers (2022-04-27T09:55:40Z)
- Bridging the Gap between Language Models and Cross-Lingual Sequence Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite the great success, we draw an empirical observation that there is a training objective gap between pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of parallel input sequences.
arXiv Detail & Related papers (2022-04-11T15:55:20Z)
- Revisiting Self-Training for Few-Shot Learning of Language Model [61.173976954360334]
Unlabeled data carry rich task-relevant information and have proven useful for few-shot learning of language models.
In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.
arXiv Detail & Related papers (2021-10-04T08:51:36Z)
- Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation [63.195935452646815]
We propose a method to automatically generate domain- and task-adaptive maskings of the given text for self-supervised pre-training.
We present a novel reinforcement learning-based framework which learns the masking policy.
We validate our Neural Mask Generator (NMG) on several question answering and text classification datasets.
arXiv Detail & Related papers (2020-10-06T13:27:01Z)
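Several of the papers above (the genre/topic-based selective masking and the retrieval-oriented ROM strategy referenced earlier) share the same core move: replace uniform random masking with masking biased toward important tokens. Below is a minimal sketch of that general idea, assuming per-token importance scores are already available; the softmax sampling scheme, the temperature, and the 15% budget are illustrative choices, not any single paper's method.

```python
import torch


def importance_weighted_mask(importance, mask_rate=0.15, temperature=1.0):
    """Sample mask positions so higher-importance tokens are masked more often.

    `importance` is a (batch, seq_len) tensor of non-negative token scores,
    e.g. term weights for retrieval; how those scores are obtained is
    paper-specific and not modeled here.
    """
    # Turn scores into a per-sequence sampling distribution over positions.
    probs = torch.softmax(importance / temperature, dim=-1)
    num_to_mask = max(1, int(importance.size(1) * mask_rate))
    # Sample distinct positions, biased toward high-importance tokens.
    picked = torch.multinomial(probs, num_to_mask, replacement=False)
    mask = torch.zeros_like(importance, dtype=torch.bool)
    mask.scatter_(1, picked, torch.ones_like(picked, dtype=torch.bool))
    return mask  # True at positions to replace with [MASK]
```

Setting a uniform `importance` recovers plain random masking, which makes the contrast with the strategies above easy to test.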