Does the Order of Fine-tuning Matter and Why?
- URL: http://arxiv.org/abs/2410.02915v1
- Date: Thu, 3 Oct 2024 19:07:14 GMT
- Title: Does the Order of Fine-tuning Matter and Why?
- Authors: Qihong Chen, Jiawei Li, Hyunjae Suh, Lianghao Jiang, Zheng Zhou, Jingze Chen, Jiri Gesi, Iftekhar Ahmed
- Abstract summary: We study how fine-tuning on multiple intermediate tasks, and the order in which they are applied, affects target task performance.
Experimental results show that task ordering affects target task performance, yielding up to a 6% performance gain and up to a 4% performance loss.
- Score: 11.975836356680855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: To improve performance on a target task, researchers have fine-tuned language models on an intermediate task before the target task of interest. However, previous work has focused on pre-trained language models and downstream tasks in Natural Language Processing (NLP) and considered only a single intermediate task. The effect of fine-tuning on multiple intermediate tasks, and of their ordering, on target task performance has not been fully explored in Software Engineering. In this study, we perform the first empirical study analyzing the impact of task ordering on target task performance. Experimental results show that task ordering affects target task performance, yielding up to a 6% performance gain and up to a 4% performance loss. To explain this impact, we consider a variety of potential factors, including characteristics of the dataset (syntactic and semantic similarity, dataset size), the model (probing task and attention analysis), and the task (task affinity analysis). Our study provides Software Engineering researchers and practitioners with insights into the effect of task orderings and how to select an ordering that is cost-effective while achieving the best performance gain.
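To make the setup concrete, here is a minimal sketch of the kind of experiment the abstract describes: one model is fine-tuned on intermediate tasks in every possible order before the target task, and target accuracy is compared across orderings. The toy datasets, the tiny linear model, and the task names are illustrative placeholders, not the paper's actual pipeline.

```python
# Sketch only: sequential intermediate-task fine-tuning under every ordering.
from itertools import permutations

import torch
import torch.nn as nn

def make_toy_task(num_classes: int, n: int = 64, dim: int = 16):
    """Hypothetical stand-in for a real SE dataset (e.g., code classification)."""
    return torch.randn(n, dim), torch.randint(0, num_classes, (n,))

def fine_tune(model: nn.Module, task, epochs: int = 20) -> None:
    """One fine-tuning stage; the real study fine-tunes pretrained code models."""
    x, y = task
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

torch.manual_seed(0)
intermediate = {"task_a": make_toy_task(2), "task_b": make_toy_task(2)}
target = make_toy_task(2)

for order in permutations(intermediate):      # every intermediate-task ordering
    torch.manual_seed(0)
    model = nn.Linear(16, 2)                  # placeholder for a language model
    for name in order:
        fine_tune(model, intermediate[name])  # intermediate stages, in order
    fine_tune(model, target)                  # final stage: the target task
    x, y = target
    acc = (model(x).argmax(dim=1) == y).float().mean().item()
    print(" -> ".join(order), f"| target accuracy: {acc:.2f}")
```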
Related papers
- Instruction Matters: A Simple yet Effective Task Selection for Optimized Instruction Tuning of Specific Tasks [51.15473776489712]
We introduce a simple yet effective task selection method that leverages instruction information alone to identify relevant tasks.
Our method is significantly more efficient than traditional approaches, which require complex measurements of pairwise transferability between tasks or the creation of data samples for the target task.
Experimental results demonstrate that training on a small set of tasks, chosen solely based on their instructions, results in substantial performance improvements on benchmarks such as P3, Big-Bench, NIV2, and Big-Bench Hard.
arXiv Detail & Related papers (2024-04-25T08:49:47Z)
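As a rough illustration of instruction-only task selection, the sketch below ranks a candidate task pool by a crude bag-of-words similarity between each task's instruction and the target instruction. The task pool, the instructions, and the similarity measure are assumptions for illustration, not the paper's actual method.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    """Crude bag-of-words 'embedding' of an instruction string."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# hypothetical instruction pool and target-task instruction
pool = {
    "sentiment": "Classify the sentiment of the given review.",
    "summarize": "Summarize the given article in one sentence.",
    "topic": "Classify the topic of the given news article.",
}
target_instruction = "Classify the polarity of the given sentence."

# rank candidate tasks by instruction similarity alone -- no target data needed
scores = {name: cosine(embed(text), embed(target_instruction))
          for name, text in pool.items()}
selected = sorted(scores, key=scores.get, reverse=True)[:2]
print("train on:", selected)
```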
- Divergence-Based Domain Transferability for Zero-Shot Classification [78.55044112903148]
Transferring learned patterns from pretrained neural language models has been shown to significantly improve effectiveness across a variety of language-based tasks.
Further tuning on intermediate tasks has been demonstrated to provide additional performance benefits, provided the intermediate task is sufficiently related to the target task.
However, identifying related tasks is an open problem, and searching for effective task combinations by brute force is prohibitively expensive.
arXiv Detail & Related papers (2023-02-11T16:04:38Z)
- Composite Learning for Robust and Effective Dense Predictions [81.2055761433725]
Multi-task learning promises better model generalization on a target task by jointly optimizing it with an auxiliary task.
We find that jointly training a dense prediction (target) task with a self-supervised (auxiliary) task can consistently improve the performance of the target task, while eliminating the need for labeling auxiliary tasks.
arXiv Detail & Related papers (2022-10-13T17:59:16Z)
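A minimal sketch of this composite setup, assuming a shared encoder, a dense-prediction target head, and a rotation-prediction auxiliary head; the auxiliary task, architecture, and loss weight are illustrative assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

# shared encoder with a dense-prediction (target) head and a
# self-supervised rotation-prediction (auxiliary) head
encoder = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
target_head = nn.Conv2d(8, 5, 1)  # 5 classes per pixel
aux_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4))

params = (list(encoder.parameters()) + list(target_head.parameters())
          + list(aux_head.parameters()))
opt = torch.optim.Adam(params, lr=1e-3)
ce = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 32, 32)
pixel_labels = torch.randint(0, 5, (4, 32, 32))  # labeled target task
rot = torch.randint(0, 4, (4,))                  # auxiliary labels come for free
rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                       for img, k in zip(images, rot)])

opt.zero_grad()
loss_target = ce(target_head(encoder(images)), pixel_labels)  # supervised loss
loss_aux = ce(aux_head(encoder(rotated)), rot)                # self-supervised loss
(loss_target + 0.5 * loss_aux).backward()  # assumed auxiliary weight of 0.5
opt.step()
```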
- Task Formulation Matters When Learning Continually: A Case Study in Visual Question Answering [58.82325933356066]
Continual learning aims to train a model incrementally on a sequence of tasks without forgetting previous knowledge.
We present a detailed study of how different settings affect performance for Visual Question Answering.
arXiv Detail & Related papers (2022-09-30T19:12:58Z)
- The Effect of Task Ordering in Continual Learning [12.571389210876315]
We show that reordering tasks significantly affects the amount of catastrophic forgetting, and that this effect can be exploited to modify continual learning performance.
arXiv Detail & Related papers (2022-05-26T12:56:15Z)
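One way such studies quantify this effect is a per-task forgetting score: accuracy on a task right after learning it minus accuracy after the full sequence, compared across orderings. A small sketch with a made-up accuracy matrix:

```python
import numpy as np

# acc[i, j] = accuracy on task j after training on tasks 0..i (one ordering);
# the numbers below are invented for illustration
acc = np.array([
    [0.90, 0.10, 0.12],
    [0.70, 0.88, 0.15],
    [0.55, 0.74, 0.91],
])
n = acc.shape[0]
# forgetting: accuracy just after learning task j minus final accuracy on it
forgetting = [acc[j, j] - acc[n - 1, j] for j in range(n - 1)]
print("per-task forgetting:", forgetting)  # approx. [0.35, 0.14]
print("mean forgetting:", float(np.mean(forgetting)))
```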
- Weighted Training for Cross-Task Learning [71.94908559469475]
We introduce Target-Aware Weighted Training (TAWT), a weighted training algorithm for cross-task learning.
We show that TAWT is easy to implement, is computationally efficient, requires little hyperparameter tuning, and enjoys non-asymptotic learning-theoretic guarantees.
As a byproduct, the proposed representation-based task distance allows one to reason in a theoretically principled way about several critical aspects of cross-task learning.
arXiv Detail & Related papers (2021-05-28T20:27:02Z)
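The sketch below illustrates only the general idea of weighting source-task losses by closeness to the target task; the distances, the softmax weighting, and the loss values are placeholders, not TAWT's actual update rule.

```python
import torch

# hypothetical representation-based distances from each source task to the target
task_distance = torch.tensor([0.2, 0.9, 0.5])
weights = torch.softmax(-task_distance, dim=0)  # closer tasks get larger weights

# per-source-task losses for the current batch (placeholder values)
source_losses = torch.tensor([0.7, 0.4, 0.6], requires_grad=True)
weighted_loss = (weights * source_losses).sum()
weighted_loss.backward()  # in a real run this would update the shared model
print("task weights:", [round(w, 3) for w in weights.tolist()])
```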
- AiR: Attention with Reasoning Capability [31.3104693230952]
We propose an Attention with Reasoning capability (AiR) framework that uses attention to understand and improve the process leading to task outcomes.
We first define an evaluation metric based on a sequence of atomic reasoning operations, enabling quantitative measurement of attention that considers the reasoning process.
We then collect human eye-tracking and answer correctness data, and analyze various machine and human attentions on their reasoning capability and how they impact task performance.
arXiv Detail & Related papers (2020-07-28T18:09:45Z)
- Exploring and Predicting Transferability across NLP Tasks [115.6278033699853]
We study the transferability between 33 NLP tasks across three broad classes of problems.
Our results show that transfer learning is more beneficial than previously thought.
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task.
arXiv Detail & Related papers (2020-05-02T09:39:36Z)
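As a sketch of how task embeddings might be used to rank source tasks for a given target, assuming cosine similarity as the transferability proxy; the embedding vectors here are random placeholders, whereas the paper derives them from task data and model behavior:

```python
import numpy as np

rng = np.random.default_rng(0)
# placeholder task embeddings; real ones would be learned per task
source_embeddings = {name: rng.normal(size=8) for name in ("nli", "qa", "ner")}
target_embedding = rng.normal(size=8)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# rank candidate source tasks by predicted transferability to the target
ranked = sorted(source_embeddings,
                key=lambda t: cosine(source_embeddings[t], target_embedding),
                reverse=True)
print("predicted best source tasks:", ranked)
```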