Rethinking Why Intermediate-Task Fine-Tuning Works
- URL: http://arxiv.org/abs/2108.11696v1
- Date: Thu, 26 Aug 2021 10:34:37 GMT
- Title: Rethinking Why Intermediate-Task Fine-Tuning Works
- Authors: Ting-Yun Chang and Chi-Jen Lu
- Abstract summary: STILTs is able to further improve the performance of pretrained language models.
Previous research shows that those intermediate tasks involving complex inference, such as commonsense reasoning, work especially well for RoBERTa.
- Score: 4.294650528226682
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supplementary Training on Intermediate Labeled-data Tasks (STILTs) is a
widely applied technique, which first fine-tunes the pretrained language models
on an intermediate task before on the target task of interest. While STILTs is
able to further improve the performance of pretrained language models, it is
still unclear why and when it works. Previous research shows that those
intermediate tasks involving complex inference, such as commonsense reasoning,
work especially well for RoBERTa. In this paper, we discover that the
improvement from an intermediate task could be orthogonal to it containing
reasoning or other complex skills -- a simple real-fake discrimination task
synthesized by GPT2 can benefit diverse target tasks. We conduct extensive
experiments to study the impact of different factors on STILTs. These findings
suggest rethinking the role of intermediate fine-tuning in the STILTs pipeline.
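The STILTs recipe described in the abstract is simply two fine-tuning stages applied in sequence, reusing the weights from the first stage without any reset in between. A minimal sketch of that control flow is below; the `fine_tune` helper is a hypothetical stand-in that only records training order (in practice this would be gradient-based fine-tuning of a model such as RoBERTa), and the dataset names are illustrative:

```python
def fine_tune(model, dataset, epochs=1):
    """Stand-in for a real fine-tuning loop: it merely records which
    datasets the model was trained on, and in what order."""
    for _ in range(epochs):
        model["history"].append(dataset["name"])
    return model

def stilts(pretrained, intermediate_task, target_task):
    """STILTs pipeline: fine-tune on the intermediate task first, then
    continue fine-tuning the same weights on the target task."""
    model = fine_tune(pretrained, intermediate_task)
    return fine_tune(model, target_task)

model = stilts({"history": []},
               {"name": "real-fake-discrimination"},  # e.g. GPT2-synthesized intermediate task
               {"name": "target-task"})
print(model["history"])  # ['real-fake-discrimination', 'target-task']
```

The point of the sketch is structural: the intermediate stage is an ordinary fine-tuning pass, so any labeled task can be slotted in, which is what lets the paper swap a complex-reasoning task for a simple real-fake discrimination task.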
Related papers
- Mitigating Interference in the Knowledge Continuum through Attention-Guided Incremental Learning [17.236861687708096]
'Attention-Guided Incremental Learning' (AGILE) is a rehearsal-based CL approach that incorporates compact task attention to effectively reduce interference between tasks.
AGILE significantly improves generalization performance by mitigating task interference and outperforming rehearsal-based approaches in several CL scenarios.
arXiv Detail & Related papers (2024-05-22T20:29:15Z)
- Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection [73.31406286956535]
We introduce the Ladder-of-Thought (LoT) for the stance detection task.
LoT directs the small LMs to assimilate high-quality external knowledge, refining the intermediate rationales produced.
Our empirical evaluations underscore LoT's efficacy, marking a 16% improvement over GPT-3.5 and a 10% enhancement compared to GPT-3.5 with CoT on the stance detection task.
arXiv Detail & Related papers (2023-08-31T14:31:48Z)
- "It's a Match!" -- A Benchmark of Task Affinity Scores for Joint Learning [74.14961250042629]
While Multi-Task Learning (MTL) promises attractive benefits, characterizing the conditions of its success is still an open problem in Deep Learning.
Estimating task affinity for joint learning is a key endeavor.
Recent work suggests that the training conditions themselves have a significant impact on the outcomes of MTL.
Yet, the literature lacks a benchmark to assess the effectiveness of task affinity estimation techniques.
arXiv Detail & Related papers (2023-01-07T15:16:35Z)
- Learning to Perform Complex Tasks through Compositional Fine-Tuning of Language Models [20.173322408302134]
Compositional fine-tuning (CFT) is an approach based on explicitly decomposing a target task into component tasks.
We show that CFT outperforms end-to-end learning even with equal amounts of data.
arXiv Detail & Related papers (2022-10-23T03:22:34Z)
- Task-Agnostic Continual Reinforcement Learning: Gaining Insights and Overcoming Challenges [27.474011433615317]
Continual learning (CL) enables the development of models and agents that learn from a sequence of tasks.
We investigate the factors that contribute to the performance differences between task-agnostic CL and multi-task learning (MTL) agents.
arXiv Detail & Related papers (2022-05-28T17:59:00Z)
- SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities [76.97949110580703]
We introduce SUPERB-SG, a new benchmark to evaluate pre-trained models across various speech tasks.
We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain.
We also show that the task diversity of SUPERB-SG coupled with limited task supervision is an effective recipe for evaluating the generalizability of model representation.
arXiv Detail & Related papers (2022-03-14T04:26:40Z)
- Weighted Training for Cross-Task Learning [71.94908559469475]
We introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning.
We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees.
As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning.
arXiv Detail & Related papers (2021-05-28T20:27:02Z)
- Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
arXiv Detail & Related papers (2020-05-02T09:39:36Z)
- Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work? [44.88358841370665]
It is poorly understood when and why intermediate-task training is beneficial for a given target task.
We perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations.
We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best.
arXiv Detail & Related papers (2020-05-01T21:49:34Z)
- Task-Feature Collaborative Learning with Application to Personalized Attribute Prediction [166.87111665908333]
We propose a novel multi-task learning method called Task-Feature Collaborative Learning (TFCL).
Specifically, we first propose a base model with a heterogeneous block-diagonal structure regularizer to leverage the collaborative grouping of features and tasks.
As a practical extension, we extend the base model by allowing overlapping features and differentiating the hard tasks.
arXiv Detail & Related papers (2020-04-29T02:32:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.