Different Strokes for Different Folks: Investigating Appropriate Further
Pre-training Approaches for Diverse Dialogue Tasks
- URL: http://arxiv.org/abs/2109.06524v1
- Date: Tue, 14 Sep 2021 08:42:50 GMT
- Title: Different Strokes for Different Folks: Investigating Appropriate Further
Pre-training Approaches for Diverse Dialogue Tasks
- Authors: Yao Qiu, Jinchao Zhang, Jie Zhou
- Abstract summary: We show that different downstream tasks prefer different further pre-training tasks, and that these preferences have an intrinsic correlation.
Our investigation indicates that designing appropriate further pre-training tasks is both important and effective.
- Score: 18.375585982984845
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Loading models pre-trained on a large-scale general-domain corpus
and fine-tuning them on specific downstream tasks is gradually becoming a
standard paradigm in Natural Language Processing. Previous investigations show
that introducing a further pre-training phase between the pre-training and
fine-tuning phases, in which the model is adapted to domain-specific unlabeled
data, can bring positive effects. However, most of this further pre-training
work simply keeps running the conventional pre-training task, e.g., masked
language modeling, which can be regarded as domain adaptation that bridges the
data distribution gap. After observing diverse downstream tasks, we suggest
that different tasks may also need a further pre-training phase with
appropriate training tasks to bridge the task formulation gap. To investigate
this, we carry out a study on improving multiple task-oriented dialogue
downstream tasks by designing various training tasks for the further
pre-training phase. The experiments show that different downstream tasks
prefer different further pre-training tasks, that these preferences have an
intrinsic correlation, and that most further pre-training tasks significantly
improve certain target tasks rather than all of them. Our investigation
indicates that it is both important and effective to design appropriate
further pre-training tasks that model the specific information a downstream
task benefits from. In addition, we present several constructive empirical
conclusions for enhancing task-oriented dialogue.
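To make the pre-train, further pre-train, fine-tune pipeline described in the abstract concrete, here is a minimal sketch of domain-adaptive further pre-training with masked language modeling, assuming the Hugging Face transformers and datasets libraries. The bert-base-uncased checkpoint, the dialogue_corpus.txt file, the intent-classification target task, and all hyperparameters are illustrative placeholders, not the authors' setup or the task-specific further pre-training objectives studied in the paper.

```python
# Sketch: pre-trained model -> further pre-training (MLM on domain data) -> fine-tuning.
# All names and hyperparameters below are illustrative assumptions.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    AutoModelForSequenceClassification,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Stage 1: start from a model pre-trained on a general-domain corpus.
mlm_model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Stage 2: further pre-training with masked language modeling on unlabeled,
# domain-specific dialogue text (a hypothetical file of dialogue turns).
corpus = load_dataset("text", data_files={"train": "dialogue_corpus.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
Trainer(
    model=mlm_model,
    args=TrainingArguments(output_dir="further_pretrained", num_train_epochs=1),
    train_dataset=tokenized["train"],
    data_collator=collator,
).train()
mlm_model.save_pretrained("further_pretrained")
tokenizer.save_pretrained("further_pretrained")

# Stage 3: fine-tune the further pre-trained encoder on a labeled downstream
# dialogue task, e.g., intent classification with a placeholder label count.
clf_model = AutoModelForSequenceClassification.from_pretrained(
    "further_pretrained", num_labels=10
)
# ... fine-tune clf_model with another Trainer on the labeled task dataset.
```

In the paper's framing, Stage 2 is the place where the conventional MLM objective could be replaced by a further pre-training task chosen to match the downstream task's formulation.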
Related papers
- ULTRA-DP: Unifying Graph Pre-training with Multi-task Graph Dual Prompt [67.8934749027315]
We propose a unified framework for graph hybrid pre-training that injects task identification and position identification into GNNs.
We also propose a novel pre-training paradigm based on a group of $k$-nearest neighbors.
arXiv Detail & Related papers (2023-10-23T12:11:13Z)
- Contrastive Multi-Task Dense Prediction [11.227696986100447]
A core design objective is how to effectively model cross-task interactions to achieve comprehensive improvement across different tasks.
We introduce feature-wise contrastive consistency into modeling the cross-task interactions for multi-task dense prediction.
We propose a novel multi-task contrastive regularization method based on the consistency to effectively boost the representation learning of the different sub-tasks.
arXiv Detail & Related papers (2023-07-16T03:54:01Z)
- Understanding the Transferability of Representations via Task-Relatedness [8.425690424016986]
We propose a novel analysis of the transferability of pre-trained models' representations to downstream tasks in terms of their relatedness to a given reference task.
Our experiments using state-of-the-art pre-trained models show the effectiveness of task-relatedness in explaining transferability on various vision and language tasks.
arXiv Detail & Related papers (2023-07-03T08:06:22Z)
- Task Compass: Scaling Multi-task Pre-training with Task Prefix [122.49242976184617]
Existing studies show that multi-task learning with large-scale supervised tasks suffers from negative effects across tasks.
We propose a task prefix guided multi-task pre-training framework to explore the relationships among tasks.
Our model can not only serve as a strong foundation backbone for a wide range of tasks but also be used as a probing tool for analyzing task relationships.
arXiv Detail & Related papers (2022-10-12T15:02:04Z)
- Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z)
- Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative [88.11465517304515]
In general, the pre-training step relies on little to no direct knowledge of the task on which the model will be fine-tuned.
We show that multi-tasking the end-task and auxiliary objectives results in significantly better downstream task performance.
arXiv Detail & Related papers (2021-09-15T17:13:18Z)
- Task-specific Objectives of Pre-trained Language Models for Dialogue Adaptation [79.0866650271659]
The common process of utilizing PrLMs is to first pre-train on large-scale general corpora with task-independent LM training objectives, then fine-tune on task datasets with task-specific training objectives.
We introduce task-specific pre-training on in-domain task-related corpora with task-specific objectives.
This procedure is placed between the original two stages to enhance the model understanding capacity of specific tasks.
arXiv Detail & Related papers (2020-09-10T16:46:46Z)
- Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work? [44.88358841370665]
It is poorly understood when and why intermediate-task training is beneficial for a given target task.
We perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations.
We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best.
arXiv Detail & Related papers (2020-05-01T21:49:34Z)
- Pre-training Text Representations as Meta Learning [113.3361289756749]
We introduce a learning algorithm that directly optimizes the model's ability to learn text representations for effective learning of downstream tasks.
We show that there is an intrinsic connection between multi-task pre-training and model-agnostic meta-learning with a sequence of meta-train steps.
arXiv Detail & Related papers (2020-04-12T09:05:47Z)