Task-guided Disentangled Tuning for Pretrained Language Models
- URL: http://arxiv.org/abs/2203.11431v1
- Date: Tue, 22 Mar 2022 03:11:39 GMT
- Title: Task-guided Disentangled Tuning for Pretrained Language Models
- Authors: Jiali Zeng, Yufan Jiang, Shuangzhi Wu, Yongjing Yin, Mu Li
- Abstract summary: We propose Task-guided Disentangled Tuning (TDT) for pretrained language models (PLMs).
TDT enhances the generalization of representations by disentangling task-relevant signals from entangled representations.
Experimental results on GLUE and CLUE benchmarks show that TDT gives consistently better results than fine-tuning with different PLMs.
- Score: 16.429787408467703
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pretrained language models (PLMs) trained on large-scale unlabeled corpora are
typically fine-tuned on task-specific downstream datasets, a recipe that has produced
state-of-the-art results on various NLP tasks. However, discrepancies between
pretraining and downstream data in both domain and scale prevent fine-tuning from
efficiently capturing task-specific patterns, especially in the low-data regime. To address this
issue, we propose Task-guided Disentangled Tuning (TDT) for PLMs, which
enhances the generalization of representations by disentangling task-relevant
signals from the entangled representations. For a given task, we introduce a
learnable confidence model to detect indicative guidance from context, and
further propose a disentangled regularization to mitigate the over-reliance
problem. Experimental results on GLUE and CLUE benchmarks show that TDT gives
consistently better results than fine-tuning with different PLMs, and extensive
analysis demonstrates the effectiveness and robustness of our method. Code is
available at https://github.com/lemon0830/TDT.
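To make the abstract concrete, here is a minimal PyTorch sketch of the two ingredients it names: a learnable confidence model and a disentangled regularizer. The confidence MLP, the confidence-weighted pooling, and the uniform-KL penalty on the low-confidence remainder are our illustrative assumptions, not the authors' exact formulation (see the repository above for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TDTHead(nn.Module):
    """Sketch: task-guided confidence scoring plus a disentangled
    regularizer on top of any PLM encoder whose first output is
    hidden states of shape (batch, seq_len, dim)."""

    def __init__(self, encoder, dim, num_labels):
        super().__init__()
        self.encoder = encoder
        # Learnable confidence model: scores how indicative each token is.
        self.conf_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, 1))
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask)[0]
        conf = torch.sigmoid(self.conf_mlp(h)).squeeze(-1) * attention_mask
        # Task prediction from the confidence-weighted representation.
        pooled = (conf.unsqueeze(-1) * h).sum(1) \
            / conf.sum(1, keepdim=True).clamp(min=1e-6)
        logits = self.classifier(pooled)
        # Disentangled regularization (assumed form): the low-confidence
        # remainder should be uninformative for the task, discouraging
        # over-reliance on a few shortcut tokens.
        rem = (1.0 - conf) * attention_mask
        rest = (rem.unsqueeze(-1) * h).sum(1) \
            / rem.sum(1, keepdim=True).clamp(min=1e-6)
        rest_logits = self.classifier(rest)
        uniform = torch.full_like(rest_logits, 1.0 / rest_logits.size(-1))
        reg = F.kl_div(F.log_softmax(rest_logits, -1), uniform,
                       reduction="batchmean")
        return logits, reg  # total loss: CE(logits, labels) + lambda * reg
```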
Related papers
- SELF-GUIDE: Better Task-Specific Instruction Following via Self-Synthetic Finetuning [70.21358720599821]
Large language models (LLMs) hold the promise of solving diverse tasks when provided with appropriate natural language prompts.
We propose SELF-GUIDE, a multi-stage mechanism in which we synthesize task-specific input-output pairs from the student LLM.
We report an absolute improvement of approximately 15% for classification tasks and 18% for generation tasks in the benchmark's metrics.
arXiv Detail & Related papers (2024-07-16T04:41:58Z)
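A hypothetical sketch of the SELF-GUIDE multi-stage loop as we read the summary, assuming the student exposes a generate(prompt) -> str method; the prompt templates, the toy quality filter, and the finetune placeholder are ours, not the paper's.

```python
def quality_filter(pair):
    # Toy stand-in for the paper's quality checks: keep non-empty,
    # non-degenerate pairs only.
    x, y = pair
    return bool(x.strip()) and bool(y.strip()) and x.strip() != y.strip()

def self_guide(student, finetune, task_instruction, n_pairs=1000):
    """Multi-stage self-synthetic finetuning: the student LLM generates
    its own input-output pairs for the task, weak pairs are filtered
    out, and the same model is fine-tuned on the remainder.
    `finetune(model, pairs)` is a placeholder trainer."""
    # Stage 1: synthesize candidate task inputs from the instruction.
    inputs = [student.generate(f"{task_instruction}\nWrite one new input:")
              for _ in range(n_pairs)]
    # Stage 2: the student labels its own synthetic inputs.
    pairs = [(x, student.generate(f"{task_instruction}\nInput: {x}\nOutput:"))
             for x in inputs]
    # Stage 3: filter, then fine-tune the student on its own data.
    return finetune(student, [p for p in pairs if quality_filter(p)])
```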
- TAIA: Large Language Models are Out-of-Distribution Data Learners [30.57872423927015]
We propose an effective inference-time intervention method: Training All parameters but Inferring with only Attention (trainallInfAttn).
trainallInfAttn achieves superior improvements compared to both the fully fine-tuned model and the base model in most scenarios.
The high tolerance of trainallInfAttn to data mismatches makes it resistant to jailbreaking tuning and enhances specialized tasks using general data.
arXiv Detail & Related papers (2024-05-30T15:57:19Z)
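A minimal sketch of the trainallInfAttn recipe as the summary describes it: fine-tune everything, then at inference keep only the attention updates. Identifying attention parameters by a substring match on their names is our assumption about the wiring.

```python
def train_all_infer_attention(base_state, tuned_state):
    """Merge two state dicts: keep the fine-tuned weights for
    self-attention modules, revert everything else (FFN, embeddings,
    norms) to the base model."""
    merged = {}
    for name, base_w in base_state.items():
        if "attn" in name or "attention" in name:
            merged[name] = tuned_state[name]  # keep tuned attention
        else:
            merged[name] = base_w             # revert everything else
    return merged

# Assumed usage with two Hugging Face checkpoints:
# merged = train_all_infer_attention(base_model.state_dict(),
#                                    tuned_model.state_dict())
# base_model.load_state_dict(merged)
```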
- Self-Distillation Bridges Distribution Gap in Language Model Fine-Tuning [29.72531930169291]
Self-Distillation Fine-Tuning (SDFT) is a novel approach that bridges the distribution gap by guiding fine-tuning with a distilled dataset generated by the model itself to match its original distribution.
Experimental results on the Llama-2-chat model across various benchmarks demonstrate that SDFT effectively mitigates catastrophic forgetting.
arXiv Detail & Related papers (2024-02-21T10:06:08Z)
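A sketch of the SDFT distillation step as summarized: the model rewrites each reference response in its own words so that the fine-tuning targets match its original output distribution. The rewrite prompt and the generate(prompt) -> str interface are illustrative assumptions, not the paper's exact template.

```python
def self_distill_dataset(model, dataset):
    """Build a distilled dataset for self-distillation fine-tuning:
    each (instruction, reference) pair is replaced by the model's own
    paraphrase of the reference answer."""
    distilled = []
    for instruction, reference in dataset:
        prompt = (f"{instruction}\n"
                  f"Reference answer: {reference}\n"
                  f"Rewrite the reference answer in your own words:")
        distilled.append((instruction, model.generate(prompt)))
    return distilled  # then fine-tune on these (instruction, response) pairs
```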
- Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data.
For the first time, we reveal two major challenges hindering their practical deployment: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
- Instruction Position Matters in Sequence Generation with Large Language Models [67.87516654892343]
Large language models (LLMs) are capable of performing conditional sequence generation tasks, such as translation or summarization.
We propose enhancing the instruction-following capability of LLMs by moving the task instruction to a position after the input sentence.
arXiv Detail & Related papers (2023-08-23T12:36:57Z)
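The proposed change is easy to picture with a template; the exact wording below is our assumption, and only the ordering matters.

```python
def build_prompt(instruction, source, instruction_last=True):
    """Place the task instruction after the input (the paper's
    proposal) or before it (the conventional order)."""
    if instruction_last:
        return f"{source}\n{instruction}"  # input first, instruction last
    return f"{instruction}\n{source}"      # conventional order

# Proposed order: the instruction follows the sentence it acts on.
print(build_prompt("Translate the sentence above into German.",
                   "The weather is nice today."))
```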
- Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework named Layerwise Noise Stability Regularization (LNSR).
Specifically, we propose to inject standard Gaussian noise and regularize the hidden representations of the fine-tuned model.
We demonstrate the advantages of the proposed method over other state-of-the-art algorithms, including L2-SP, Mixout, and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
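A minimal sketch of the LNSR objective as the summary describes it: inject standard Gaussian noise into one layer's hidden representation and penalize the resulting change in predictions. The hook-based injection and the MSE stability term are our assumptions about the exact layerwise scheme.

```python
import torch
import torch.nn.functional as F

def noise_stability_loss(model, classifier_fn, inputs, layer_module,
                         noise_std=1.0):
    """Run the model twice, once clean and once with Gaussian noise
    added to `layer_module`'s output, and penalize the divergence of
    the two predictions. `classifier_fn(model, inputs) -> logits` is
    an assumed wrapper around the forward pass."""
    clean_logits = classifier_fn(model, inputs)

    def add_noise(module, inp, out):
        # Perturb the layer output with N(0, noise_std^2) noise;
        # returning a value from a forward hook replaces the output.
        if isinstance(out, tuple):
            return (out[0] + noise_std * torch.randn_like(out[0]),) + out[1:]
        return out + noise_std * torch.randn_like(out)

    handle = layer_module.register_forward_hook(add_noise)
    noisy_logits = classifier_fn(model, inputs)
    handle.remove()

    # Stability term: noisy predictions should match clean ones.
    return F.mse_loss(noisy_logits, clean_logits.detach())
```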
- Prompt Tuning for Discriminative Pre-trained Language Models [96.04765512463415]
Recent works have shown promising results of prompt tuning in stimulating pre-trained language models (PLMs) for natural language processing (NLP) tasks.
It is still unknown whether and how discriminative PLMs, e.g., ELECTRA, can be effectively prompt-tuned.
We present DPT, the first prompt tuning framework for discriminative PLMs, which reformulates NLP tasks into a discriminative language modeling problem.
arXiv Detail & Related papers (2022-05-23T10:11:50Z)
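A sketch of the discriminative reformulation as we read it, using ELECTRA's replaced-token-detection head to score candidate label words slotted into a prompt; the "It was {w}." template and the scoring rule are our assumptions, not DPT's exact design.

```python
import torch
from transformers import ElectraForPreTraining, ElectraTokenizer

tok = ElectraTokenizer.from_pretrained("google/electra-base-discriminator")
model = ElectraForPreTraining.from_pretrained(
    "google/electra-base-discriminator")

def classify(text, label_words):
    # Slot each candidate label word into the prompt; the word that the
    # discriminator judges least "replaced" at its position wins.
    scores = []
    for w in label_words:
        enc = tok(f"{text} It was {w}.", return_tensors="pt")
        with torch.no_grad():
            logits = model(**enc).logits[0]  # per-token replaced-logits
        ids = enc.input_ids[0].tolist()
        wid = tok.convert_tokens_to_ids(tok.tokenize(w)[0])
        pos = len(ids) - 1 - ids[::-1].index(wid)  # last occurrence of w
        scores.append(-logits[pos].item())  # lower "replaced" = better fit
    return label_words[max(range(len(scores)), key=scores.__getitem__)]

print(classify("The movie was a delight.", ["great", "terrible"]))
```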
- CSS-LM: A Contrastive Framework for Semi-supervised Fine-tuning of Pre-trained Language Models [59.49705076369856]
We introduce a novel framework to improve the fine-tuning phase of pre-trained language models (PLMs).
We retrieve positive and negative instances from large-scale unlabeled corpora according to their domain-level and class-level semantic relatedness to a task.
We then perform contrastive semi-supervised learning on both the retrieved unlabeled and original labeled instances to help PLMs capture crucial task-related semantic features.
arXiv Detail & Related papers (2021-02-07T09:27:26Z)
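A sketch of the contrastive term applied to retrieved instances. The InfoNCE form and the use of normalized sentence embeddings are our assumptions about the exact loss; retrieval itself (selecting positives and negatives from unlabeled corpora by semantic relatedness) is assumed to have happened upstream.

```python
import torch
import torch.nn.functional as F

def contrastive_semisup_loss(anchor, positives, negatives, tau=0.1):
    """InfoNCE-style loss over retrieved instances: pull task-related
    (positive) unlabeled sentences toward the anchor representation
    and push unrelated (negative) ones away.
    anchor: (d,); positives: (p, d); negatives: (n, d)."""
    a = F.normalize(anchor, dim=-1)
    pos = F.normalize(positives, dim=-1) @ a / tau  # sims to positives
    neg = F.normalize(negatives, dim=-1) @ a / tau  # sims to negatives
    denom = torch.logsumexp(torch.cat([pos, neg]), dim=0)
    return (denom - pos).mean()  # -log p(positive | all candidates)

# Toy usage with random embeddings of dimension 8:
loss = contrastive_semisup_loss(torch.randn(8), torch.randn(4, 8),
                                torch.randn(16, 8))
print(loss.item())
```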