TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining
- URL: http://arxiv.org/abs/2410.10006v1
- Date: Sun, 13 Oct 2024 20:56:13 GMT
- Title: TapWeight: Reweighting Pretraining Objectives for Task-Adaptive Pretraining
- Authors: Ruiyi Zhang, Sai Ashish Somayajula, Pengtao Xie
- Abstract summary: TapWeight is a task-adaptive pretraining framework which automatically determines the optimal importance of each pretraining objective.
We applied TapWeight to both molecular property prediction and natural language understanding tasks, significantly surpassing baseline methods.
- Score: 34.93043212352875
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale general domain pretraining followed by downstream-specific finetuning has become a predominant paradigm in machine learning. However, discrepancies between the pretraining and target domains can still lead to performance degradation in certain cases, underscoring the need for task-adaptive continued pretraining (TAP). TAP methods typically involve continued pretraining on task-specific unlabeled datasets or introducing additional unsupervised learning objectives to enhance model capabilities. While many TAP methods perform continued pretraining with multiple pretraining objectives, they often determine the tradeoff parameters between objectives manually, resulting in suboptimal outcomes and higher computational costs. In this paper, we propose TapWeight, a task-adaptive pretraining framework which automatically determines the optimal importance of each pretraining objective based on downstream feedback. TapWeight reweights each pretraining objective by solving a multi-level optimization problem. We applied TapWeight to both molecular property prediction and natural language understanding tasks, significantly surpassing baseline methods. Experimental results validate the effectiveness and generalizability of TapWeight.
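To make the reweighting idea concrete, here is a minimal sketch using a first-order heuristic: a pretraining objective receives more weight when its gradient on the shared parameters aligns with the downstream-task gradient. This is only an illustrative stand-in for the multi-level optimization described in the paper; the names and the update rule are assumptions.

```python
import torch

def reweight_objectives(model, pretrain_losses, downstream_loss, log_weights, lr_w=0.1):
    """Illustrative stand-in for TapWeight's multi-level optimization (not the
    authors' exact algorithm): raise the weight of a pretraining objective whose
    gradient aligns with the downstream-task gradient, lower it otherwise."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_down = torch.autograd.grad(downstream_loss, params,
                                 retain_graph=True, allow_unused=True)
    scores = []
    for loss in pretrain_losses:
        g_pre = torch.autograd.grad(loss, params,
                                    retain_graph=True, allow_unused=True)
        # alignment score: dot product between downstream and pretraining gradients
        score = torch.zeros(())
        for gd, gp in zip(g_down, g_pre):
            if gd is not None and gp is not None:
                score = score + (gd * gp).sum()
        scores.append(score)
    with torch.no_grad():
        log_weights += lr_w * torch.sign(torch.stack(scores))  # coarse weight update
    return torch.softmax(log_weights, dim=0)  # normalized objective importances
```

In the paper, the importances are instead obtained by solving the multi-level optimization end to end; the heuristic above only conveys the direction of the downstream feedback signal.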
Related papers
- Task-Oriented Pre-Training for Drivable Area Detection [5.57325257338134]
We propose a task-oriented pre-training method that begins with generating redundant segmentation proposals.
We then introduce a Specific Category Enhancement Fine-tuning (SCEF) strategy for fine-tuning the Contrastive Language-Image Pre-training (CLIP) model.
This approach can generate a large amount of coarse training data for pre-training models, which are then fine-tuned using manually annotated data.
arXiv Detail & Related papers (2024-09-30T10:25:47Z) - Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z) - Generalizable and Stable Finetuning of Pretrained Language Models on Low-Resource Texts [23.94064492903792]
We propose a regularization method based on attention-guided weight mixup for finetuning PLMs.
Our approach represents each network weight as a mixup of task-specific weight and pretrained weight, controlled by a learnable attention parameter.
We employ a bi-level optimization framework on two separate splits of the training dataset, improving generalization and combating overfitting.
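A minimal sketch of the weight-mixup idea for a single linear layer, assuming a scalar attention parameter per layer (the paper's granularity and the bi-level training procedure are not reproduced here); all names are illustrative.

```python
import torch
import torch.nn as nn

class MixupLinear(nn.Module):
    """Linear layer whose effective weight is a convex combination of the frozen
    pretrained weight and a trainable task-specific weight, gated by a learnable
    attention scalar (an illustrative reading of attention-guided weight mixup)."""
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.register_buffer("w_pre", pretrained.weight.detach().clone())
        self.w_task = nn.Parameter(pretrained.weight.detach().clone())
        self.bias = nn.Parameter(pretrained.bias.detach().clone())  # assumes bias=True
        self.alpha = nn.Parameter(torch.zeros(1))  # learnable attention parameter

    def forward(self, x):
        a = torch.sigmoid(self.alpha)  # mixup coefficient in (0, 1)
        weight = a * self.w_task + (1.0 - a) * self.w_pre
        return nn.functional.linear(x, weight, self.bias)
```

In the paper, the task-specific weights and the attention parameters are then optimized in a bi-level fashion on two separate splits of the training data.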
arXiv Detail & Related papers (2024-03-19T17:21:29Z) - Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving [103.745551954983]
In this paper, we investigate the transfer performance of various types of self-supervised methods, including MoCo and SimCLR, on three downstream tasks.
We find that their performance is sub-optimal or even lags far behind the single-task baseline.
We propose a simple yet effective pretrain-adapt-finetune paradigm for general multi-task training.
arXiv Detail & Related papers (2022-09-19T12:15:31Z) - Improved Fine-tuning by Leveraging Pre-training Data: Theory and Practice [52.11183787786718]
Fine-tuning a pre-trained model on the target data is widely used in many deep learning applications.
Recent studies have empirically shown that training from scratch can achieve final performance no worse than this pre-training strategy.
We propose a novel selection strategy to select a subset from pre-training data to help improve the generalization on the target task.
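The summary does not spell out the selection criterion, so the sketch below uses a generic feature-similarity heuristic purely for illustration: keep the pre-training examples whose encoder features lie closest to the centroid of the target-task features.

```python
import torch

def select_pretraining_subset(pretrain_feats: torch.Tensor,
                              target_feats: torch.Tensor, k: int) -> torch.Tensor:
    """pretrain_feats: (N, d) features of pre-training examples; target_feats: (M, d)
    features of target-task examples. Returns indices of the k pre-training examples
    most similar to the target-task centroid (a stand-in for the paper's strategy)."""
    centroid = torch.nn.functional.normalize(target_feats.mean(dim=0), dim=0)
    sims = torch.nn.functional.normalize(pretrain_feats, dim=1) @ centroid
    return torch.topk(sims, k=k).indices
```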
arXiv Detail & Related papers (2021-11-24T06:18:32Z) - Should We Be Pre-training? An Argument for End-task Aware Training as an Alternative [88.11465517304515]
In general, the pre-training step relies on little to no direct knowledge of the task on which the model will be fine-tuned.
We show that multi-tasking the end-task and auxiliary objectives results in significantly better downstream task performance.
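Multi-tasking the end task with auxiliary (pretraining-style) objectives amounts to optimizing a joint loss; the fixed convex weighting below is an assumption for clarity, not the paper's exact recipe.

```python
def end_task_aware_loss(end_task_loss, aux_losses, aux_weights):
    """Joint objective for end-task aware training: the supervised end-task loss plus
    a weighted sum of auxiliary objectives (e.g. masked-LM loss on task text)."""
    total = end_task_loss
    for w, loss in zip(aux_weights, aux_losses):
        total = total + w * loss
    return total
```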
arXiv Detail & Related papers (2021-09-15T17:13:18Z) - Back-Translated Task Adaptive Pretraining: Improving Accuracy and Robustness on Text Classification [5.420446976940825]
We propose a back-translated task-adaptive pretraining (BT-TAPT) method that increases the amount of task-specific data for LM re-pretraining.
The experimental results show that the proposed BT-TAPT yields improved classification accuracy on both low- and high-resource data and better robustness to noise than the conventional adaptive pretraining method.
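A rough sketch of the back-translation step, using publicly available MarianMT checkpoints as the translation models; the specific model names and the single pivot language are illustrative choices, not necessarily those used in the paper.

```python
from transformers import MarianMTModel, MarianTokenizer

def back_translate(texts, src_to_pivot="Helsinki-NLP/opus-mt-en-de",
                   pivot_to_src="Helsinki-NLP/opus-mt-de-en"):
    """Translate task texts to a pivot language and back to obtain paraphrases,
    which are then added to the unlabeled corpus used for LM re-pretraining."""
    fwd_tok = MarianTokenizer.from_pretrained(src_to_pivot)
    fwd = MarianMTModel.from_pretrained(src_to_pivot)
    bwd_tok = MarianTokenizer.from_pretrained(pivot_to_src)
    bwd = MarianMTModel.from_pretrained(pivot_to_src)

    pivot_ids = fwd.generate(**fwd_tok(texts, return_tensors="pt",
                                       padding=True, truncation=True))
    pivot = fwd_tok.batch_decode(pivot_ids, skip_special_tokens=True)
    back_ids = bwd.generate(**bwd_tok(pivot, return_tensors="pt",
                                      padding=True, truncation=True))
    return bwd_tok.batch_decode(back_ids, skip_special_tokens=True)
```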
arXiv Detail & Related papers (2021-07-22T06:27:35Z) - Task-specific Objectives of Pre-trained Language Models for Dialogue Adaptation [79.0866650271659]
The common process of utilizing PrLMs is to first pre-train on large-scale general corpora with task-independent LM training objectives, and then fine-tune on task datasets with task-specific training objectives.
We introduce task-specific pre-training on in-domain task-related corpora with task-specific objectives.
This procedure is placed between the original two stages to enhance the model's understanding of specific tasks.
arXiv Detail & Related papers (2020-09-10T16:46:46Z) - Don't Stop Pretraining: Adapt Language Models to Domains and Tasks [81.99843216550306]
We present a study across four domains (biomedical and computer science publications, news, and reviews) and eight classification tasks.
A second phase of pretraining in-domain (domain-adaptive pretraining) leads to performance gains.
Adapting to the task's unlabeled data (task-adaptive pretraining) improves performance even after domain-adaptive pretraining.
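A minimal sketch of this continued masked-LM pretraining on unlabeled in-domain or task data (DAPT/TAPT) with Hugging Face Transformers; the checkpoint, file path, and hyperparameters are placeholders, not those used in the paper.

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForMaskedLM.from_pretrained("roberta-base")

# Unlabeled domain/task text, one example per line (placeholder path).
raw = load_dataset("text", data_files={"train": "task_unlabeled.txt"})
tokenized = raw["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True, remove_columns=["text"])

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="tapt-checkpoint", num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
# The adapted checkpoint is then finetuned on the labeled end-task data.
```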
arXiv Detail & Related papers (2020-04-23T04:21:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.