Forging Multiple Training Objectives for Pre-trained Language Models via
Meta-Learning
- URL: http://arxiv.org/abs/2210.10293v1
- Date: Wed, 19 Oct 2022 04:38:26 GMT
- Title: Forging Multiple Training Objectives for Pre-trained Language Models via
Meta-Learning
- Authors: Hongqiu Wu, Ruixue Ding, Hai Zhao, Boli Chen, Pengjun Xie, Fei Huang,
Min Zhang
- Abstract summary: Multiple pre-training objectives fill the gaps in understanding capability left by single-objective language modeling.
We propose MOMETAS, a novel adaptive sampler based on meta-learning, which learns the latent sampling pattern on arbitrary pre-training objectives.
- Score: 97.28779163988833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multiple pre-training objectives fill the gaps in understanding
capability left by single-objective language modeling, serving the ultimate
purpose of pre-trained language models (PrLMs): generalizing well across a wide
range of scenarios. However, learning multiple training objectives in a single
model is challenging because of their unknown relative significance and the
potential conflicts between them. Empirical studies have shown that objective
sampling under an ad-hoc manual setting leaves the learned language
representation far from the desired optimum. Thus, we propose
\textit{MOMETAS}, a novel adaptive sampler based on meta-learning, which learns
the latent sampling pattern on arbitrary pre-training objectives. Such a design
is lightweight with negligible additional training overhead. To validate our
approach, we adopt five objectives and conduct continual pre-training with
BERT-base and BERT-large models, where MOMETAS demonstrates universal
performance gains over other rule-based sampling strategies on 14 natural
language processing tasks.
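To make the idea concrete, below is a minimal sketch of an adaptive objective sampler in the spirit described above: a distribution over pre-training objectives that is updated from a scalar reward measured during training. The softmax parameterization, the reward definition, the multiplicative update, and the objective names and hyperparameters are illustrative assumptions, not the authors' exact MOMETAS algorithm.

```python
# A minimal sketch of an adaptive objective sampler, assuming a softmax
# distribution over objectives updated from a scalar reward per meta step.
# Reward, update rule, names, and hyperparameters are illustrative only.
import math
import random


class AdaptiveObjectiveSampler:
    def __init__(self, objectives, lr=0.1, temperature=1.0):
        self.lr = lr                     # meta step size (assumed)
        self.temperature = temperature   # softmax temperature (assumed)
        self.logits = {name: 0.0 for name in objectives}

    def probabilities(self):
        # Softmax over per-objective logits yields the current sampling pattern.
        exps = {k: math.exp(v / self.temperature) for k, v in self.logits.items()}
        total = sum(exps.values())
        return {k: v / total for k, v in exps.items()}

    def sample(self):
        # Draw the objective to train on for the next batch (or block of batches).
        probs = self.probabilities()
        names, weights = zip(*probs.items())
        return random.choices(names, weights=weights, k=1)[0]

    def update(self, objective, reward):
        # Meta step: raise the logit of an objective whose recent batches
        # improved a held-out signal (e.g. validation-loss reduction).
        self.logits[objective] += self.lr * reward


# Usage sketch: interleave objective sampling, training, and meta updates.
sampler = AdaptiveObjectiveSampler(["mlm", "nsp", "sop", "sbo", "tlm"])
for step in range(100):
    obj = sampler.sample()
    reward = random.uniform(-1.0, 1.0)   # stand-in for a measured training signal
    sampler.update(obj, reward)
print(sampler.probabilities())
```

The design choice illustrated here is simply that the sampling distribution is learned from feedback rather than fixed by a hand-tuned schedule, which is the core contrast with the rule-based strategies mentioned in the abstract.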
Related papers
- Integrating Self-supervised Speech Model with Pseudo Word-level Targets
from Visually-grounded Speech Model [57.78191634042409]
We propose Pseudo-Word HuBERT (PW-HuBERT), a framework that integrates pseudo word-level targets into the training process.
Our experimental results on four spoken language understanding (SLU) benchmarks suggest the superiority of our model in capturing semantic information.
arXiv Detail & Related papers (2024-02-08T16:55:21Z)
- Language Model Pre-Training with Sparse Latent Typing [66.75786739499604]
We propose a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types.
Experimental results show that our model is able to learn interpretable latent type categories in a self-supervised manner without using any external knowledge.
arXiv Detail & Related papers (2022-10-23T00:37:08Z)
- Few-shot Subgoal Planning with Language Models [58.11102061150875]
We show that language priors encoded in pre-trained language models allow us to infer fine-grained subgoal sequences.
In contrast to recent methods which make strong assumptions about subgoal supervision, our experiments show that language models can infer detailed subgoal sequences without any fine-tuning.
arXiv Detail & Related papers (2022-05-28T01:03:30Z) - Frustratingly Simple Pretraining Alternatives to Masked Language
Modeling [10.732163031244651]
Masked language modeling (MLM) is widely used in natural language processing for learning text representations.
In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements for MLM.
arXiv Detail & Related papers (2021-09-04T08:52:37Z) - Task-specific Objectives of Pre-trained Language Models for Dialogue
Adaptation [79.0866650271659]
The common process of utilizing PrLMs is to first pre-train on large-scale general corpora with task-independent LM training objectives, then fine-tune on task datasets with task-specific training objectives.
We introduce task-specific pre-training on in-domain task-related corpora with task-specific objectives.
This procedure is placed between the original two stages to enhance the model's understanding of specific tasks.
arXiv Detail & Related papers (2020-09-10T16:46:46Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm which directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
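The connection this last entry draws between multi-task pre-training and model-agnostic meta-learning can be illustrated with a toy meta-train step. The linear model, squared-error loss, synthetic support/query batches, and step sizes below are assumptions for illustration only, not the paper's setup.

```python
# A toy MAML-style meta-train step: adapt shared parameters on each task's
# support batch, then update them from the query losses of the adapted
# parameters. Model, loss, data, and step sizes are illustrative only.
import torch

torch.manual_seed(0)
w = torch.zeros(3, requires_grad=True)   # shared (meta-learned) parameters


def task_loss(params, x, y):
    # Squared error of a linear predictor; stands in for a task objective.
    return ((x @ params - y) ** 2).mean()


def meta_train_step(w, tasks, inner_lr=0.1, outer_lr=0.01):
    meta_grad = torch.zeros_like(w)
    for (x_s, y_s), (x_q, y_q) in tasks:
        # Inner step: adapt the shared parameters on the support batch.
        g_inner = torch.autograd.grad(task_loss(w, x_s, y_s), w, create_graph=True)[0]
        w_adapted = w - inner_lr * g_inner
        # Outer contribution: gradient of the query loss w.r.t. the shared parameters.
        meta_grad += torch.autograd.grad(task_loss(w_adapted, x_q, y_q), w)[0]
    with torch.no_grad():
        w -= outer_lr * meta_grad / len(tasks)


# Usage sketch with two synthetic tasks, each a (support, query) batch pair.
tasks = [((torch.randn(8, 3), torch.randn(8)), (torch.randn(8, 3), torch.randn(8)))
         for _ in range(2)]
meta_train_step(w, tasks)
```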