Task-specific Objectives of Pre-trained Language Models for Dialogue
Adaptation
- URL: http://arxiv.org/abs/2009.04984v1
- Date: Thu, 10 Sep 2020 16:46:46 GMT
- Title: Task-specific Objectives of Pre-trained Language Models for Dialogue
Adaptation
- Authors: Junlong Li, Zhuosheng Zhang, Hai Zhao, Xi Zhou, Xiang Zhou
- Abstract summary: The common process of utilizing PrLMs is first pre-training on large-scale general corpora with task-independent LM training objectives, then fine-tuning on task datasets with task-specific training objectives.
We introduce task-specific pre-training on in-domain, task-related corpora with task-specific objectives.
This procedure is placed between the original two stages to enhance the model's understanding of specific tasks.
- Score: 79.0866650271659
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pre-trained Language Models (PrLMs) have been widely used as backbones in many Natural Language Processing (NLP) tasks. The common process of utilizing PrLMs is first pre-training on large-scale general corpora with task-independent LM training objectives, then fine-tuning on task datasets with task-specific training objectives. Pre-training in a task-independent way enables the models to learn language representations that are universal to some extent, but it fails to capture crucial task-specific features at the same time, leading to an incompatibility between pre-training and fine-tuning. To address this issue, we introduce task-specific pre-training on in-domain, task-related corpora with task-specific objectives. This procedure is placed between the original two stages to enhance the model's capacity to understand specific tasks. In this work, we focus on Dialogue-related Natural Language Processing (DrNLP) tasks and design a Dialogue-Adaptive Pre-training Objective (DAPO) based on important qualities for assessing dialogues that are usually ignored by general LM pre-training objectives. PrLMs trained with DAPO on a large in-domain dialogue corpus are then fine-tuned for downstream DrNLP tasks. Experimental results show that models with DAPO surpass those with general LM pre-training objectives and other strong baselines on downstream DrNLP tasks.
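A minimal sketch of this three-stage pipeline, assuming the Hugging Face transformers and datasets libraries: start from a generally pre-trained checkpoint, continue pre-training on an in-domain dialogue corpus, then fine-tune the adapted checkpoint on a downstream DrNLP task. The intermediate stage below uses a plain masked-LM objective as a stand-in, since the abstract does not specify DAPO's dialogue-quality-based objective; the corpus file name is hypothetical.

```python
# Sketch of: general pre-training -> dialogue-adaptive pre-training -> fine-tuning.
# The adaptive stage here is ordinary continued MLM on in-domain data, not the
# paper's actual DAPO objective (an assumption for illustration only).
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Stage 1: start from a generally pre-trained PrLM checkpoint.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Stage 2: task-specific (dialogue-adaptive) pre-training on an in-domain corpus.
# "dialogue_corpus.txt" is a hypothetical file with one utterance or dialogue per line.
corpus = load_dataset("text", data_files={"train": "dialogue_corpus.txt"})
tokenized = corpus["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapo-adapted", num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("dapo-adapted")

# Stage 3: fine-tune on a downstream DrNLP task (e.g., response selection) with a
# task-specific head, loading from "dapo-adapted" instead of the general checkpoint.
```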
Related papers
- TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining [34.93043212352875]
TapWeight is a task-adaptive pretraining framework which automatically determines the optimal importance of each pretraining objective.
We applied TapWeight to both molecular property prediction and natural language understanding tasks, significantly surpassing baseline methods.
arXiv Detail & Related papers (2024-10-13T20:56:13Z) - Forging Multiple Training Objectives for Pre-trained Language Models via
Meta-Learning [97.28779163988833]
Multiple pre-training objectives compensate for the limited understanding capability of single-objective language modeling.
We propose MOMETAS, a novel adaptive sampler based on meta-learning, which learns the latent sampling pattern over arbitrary pre-training objectives (a toy sketch of adaptive objective sampling appears after this list).
arXiv Detail & Related papers (2022-10-19T04:38:26Z) - Bridging the Gap between Language Models and Cross-Lingual Sequence
Labeling [101.74165219364264]
Large-scale cross-lingual pre-trained language models (xPLMs) have shown effectiveness in cross-lingual sequence labeling tasks.
Despite this success, we empirically observe a training objective gap between the pre-training and fine-tuning stages.
In this paper, we first design a pre-training task tailored for xSL named Cross-lingual Language Informative Span Masking (CLISM) to eliminate the objective gap.
Second, we present ContrAstive-Consistency Regularization (CACR), which utilizes contrastive learning to encourage consistency between the representations of parallel inputs.
arXiv Detail & Related papers (2022-04-11T15:55:20Z) - AdaPrompt: Adaptive Model Training for Prompt-based NLP [77.12071707955889]
We propose AdaPrompt, which adaptively retrieves external data for continual pretraining of PLMs.
Experimental results on five NLP benchmarks show that AdaPrompt can improve over standard PLMs in few-shot settings.
In zero-shot settings, our method outperforms standard prompt-based methods by up to 26.35% relative error reduction.
arXiv Detail & Related papers (2022-02-10T04:04:57Z) - Domain-Adaptive Pretraining Methods for Dialogue Understanding [42.83187765297047]
Language models like BERT and SpanBERT pretrained on open-domain data have obtained impressive gains on various NLP tasks.
In this paper, we probe the effectiveness of domain-adaptive pretraining objectives on downstream tasks.
arXiv Detail & Related papers (2021-05-28T08:25:27Z) - Structural Pre-training for Dialogue Comprehension [51.215629336320305]
We present SPIDER, Structural Pre-traIned DialoguE Reader, to capture dialogue-exclusive features.
To simulate the dialogue-like features, we propose two training objectives in addition to the original LM objectives.
Experimental results on widely used dialogue benchmarks verify the effectiveness of the newly introduced self-supervised tasks.
arXiv Detail & Related papers (2021-05-23T15:16:54Z) - Self-Supervised Meta-Learning for Few-Shot Natural Language
Classification Tasks [40.97125791174191]
We propose a self-supervised approach to generate a large, rich, meta-learning task distribution from unlabeled text.
We show that this meta-training leads to better few-shot generalization than language-model pre-training followed by finetuning.
arXiv Detail & Related papers (2020-09-17T17:53:59Z) - Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)
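As a loose illustration of the adaptive-objective-sampling idea mentioned in the MOMETAS entry above (not the actual MOMETAS algorithm, which is meta-learned), here is a toy sketch in which each pre-training objective's sampling weight follows a moving average of its recent reward; the objective names and the random reward are placeholders.

```python
# Toy adaptive objective sampler: objectives that recently yielded higher
# reward (e.g., larger loss improvement) are sampled more often.
import math
import random


class ObjectiveSampler:
    def __init__(self, objective_names, temperature=1.0, momentum=0.9):
        self.names = list(objective_names)
        self.rewards = {name: 0.0 for name in self.names}  # moving-average reward per objective
        self.temperature = temperature
        self.momentum = momentum

    def sample(self):
        # Softmax over moving-average rewards gives sampling probabilities.
        scores = [math.exp(self.rewards[n] / self.temperature) for n in self.names]
        total = sum(scores)
        probs = [s / total for s in scores]
        return random.choices(self.names, weights=probs, k=1)[0]

    def update(self, name, reward):
        # Blend the new reward into the moving average for this objective.
        self.rewards[name] = self.momentum * self.rewards[name] + (1 - self.momentum) * reward


# Usage: pick an objective per training step and feed back its reward.
sampler = ObjectiveSampler(["mlm", "sentence_order", "utterance_permutation"])
for step in range(3):
    objective = sampler.sample()
    reward = random.random()  # placeholder for, e.g., validation-loss improvement
    sampler.update(objective, reward)
    print(step, objective, round(reward, 3))
```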