Transfer Learning for Finetuning Large Language Models
- URL: http://arxiv.org/abs/2411.01195v1
- Date: Sat, 02 Nov 2024 09:43:12 GMT
- Title: Transfer Learning for Finetuning Large Language Models
- Authors: Tobias Strangmann, Lennart Purucker, Jörg K. H. Franke, Ivo Rapant, Fabio Ferreira, Frank Hutter
- Abstract summary: We investigate transfer learning for finetuning large language models.
We transfer-learn finetuning by meta-learning performance and cost surrogate models for grey-box meta-optimization from a new meta-dataset.
Our results demonstrate the transferability of finetuning to adapt large language models more effectively.
- Score: 36.047470973893155
- License:
- Abstract: As the landscape of large language models expands, efficiently finetuning for specific tasks becomes increasingly crucial. At the same time, the landscape of parameter-efficient finetuning methods rapidly expands. Consequently, practitioners face a multitude of complex choices when searching for an optimal finetuning pipeline for large language models. To reduce the complexity for practitioners, we investigate transfer learning for finetuning large language models and aim to transfer knowledge about configurations from related finetuning tasks to a new task. In this work, we transfer learn finetuning by meta-learning performance and cost surrogate models for grey-box meta-optimization from a new meta-dataset. Counter-intuitively, we propose to rely only on transfer learning for new datasets. Thus, we do not use task-specific Bayesian optimization but prioritize knowledge transferred from related tasks over task-specific feedback. We evaluate our method on eight synthetic question-answer datasets and a meta-dataset consisting of 1,800 runs of finetuning Microsoft's Phi-3. Our transfer learning is superior to zero-shot, default finetuning, and meta-optimization baselines. Our results demonstrate the transferability of finetuning to adapt large language models more effectively.
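The abstract describes the approach only at a high level. Purely as an illustration of the general idea (ranking finetuning configurations for an unseen task with performance and cost surrogates meta-learned on runs from related tasks, with no task-specific Bayesian optimization), here is a minimal sketch. The random placeholder data, the gradient-boosting surrogates, and names such as `rank_configs` are assumptions made for this example and do not reflect the paper's actual code, features, or configuration space.
```python
# Hypothetical sketch (not the authors' code): rank finetuning configurations
# for a new task using performance and cost surrogates that were meta-learned
# on finetuning runs from related tasks. No task-specific Bayesian optimization
# is used; the ranking relies purely on transferred knowledge.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Placeholder meta-dataset: each row encodes a finetuning configuration
# (e.g. a PEFT rank, learning rate, epochs) in the first 3 columns and
# simple task descriptors in the last 3 columns.
X_meta = rng.random((1800, 6))
y_perf = rng.random(1800)         # observed validation performance
y_cost = rng.random(1800) * 10.0  # observed finetuning cost (e.g. GPU hours)

perf_surrogate = GradientBoostingRegressor().fit(X_meta, y_perf)
cost_surrogate = GradientBoostingRegressor().fit(X_meta, y_cost)

def rank_configs(candidate_configs, task_descriptor, cost_budget):
    """Score candidate configurations for an unseen task and keep those
    predicted to fit the cost budget, best predicted performance first."""
    feats = np.hstack([candidate_configs,
                       np.tile(task_descriptor, (len(candidate_configs), 1))])
    pred_perf = perf_surrogate.predict(feats)
    pred_cost = cost_surrogate.predict(feats)
    feasible = pred_cost <= cost_budget
    order = np.argsort(-pred_perf[feasible])
    return candidate_configs[feasible][order]

# Example: 32 random candidate configurations for a new task.
candidates = rng.random((32, 3))
best_first = rank_configs(candidates, task_descriptor=rng.random(3), cost_budget=5.0)
```
Because the ranking uses only the meta-learned surrogates, no finetuning runs on the new task are needed before choosing a configuration, mirroring the transfer-only setting described in the abstract.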
Related papers
- Less is More: Parameter-Efficient Selection of Intermediate Tasks for Transfer Learning [5.119396962985841]
Intermediate task transfer learning can greatly improve model performance.
We conduct the largest study on NLP task transferability and task selection with 12k source-target pairs.
Applying ESMs to a prior method reduces execution time and disk space usage by factors of 10 and 278, respectively.
arXiv Detail & Related papers (2024-10-19T16:22:04Z) - Learning Semantic Proxies from Visual Prompts for Parameter-Efficient Fine-Tuning in Deep Metric Learning [13.964106147449051]
Existing solutions concentrate on fine-tuning the pre-trained models on conventional image datasets.
We propose a novel and effective framework based on learning visual prompts (VPT) in pre-trained Vision Transformers (ViT).
We demonstrate that our new approximations with semantic information are superior in representative capability.
arXiv Detail & Related papers (2024-02-04T04:42:05Z) - Two-stage LLM Fine-tuning with Less Specialization and More Generalization [93.12197594813378]
We propose Prompt Tuning with MOdel Tuning (ProMoT) to reduce format specialization and improve generalization.
ProMoT offloads task-specific format learning into additional and removable parameters by first doing prompt tuning and then fine-tuning the model itself with this soft prompt; a generic sketch of this two-stage procedure appears after this list.
ProMoT can even enhance generalization on in-context learning tasks that are semantically related to the fine-tuned task.
arXiv Detail & Related papers (2022-11-01T17:56:57Z) - Model-Agnostic Multitask Fine-tuning for Few-shot Vision-Language Transfer Learning [59.38343286807997]
We propose Model-Agnostic Multitask Fine-tuning (MAMF) for vision-language models on unseen tasks.
Compared with model-agnostic meta-learning (MAML), MAMF discards the bi-level optimization and uses only first-order gradients.
We show that MAMF consistently outperforms the classical fine-tuning method for few-shot transfer learning on five benchmark datasets.
arXiv Detail & Related papers (2022-03-09T17:26:53Z) - Towards a Unified View of Parameter-Efficient Transfer Learning [108.94786930869473]
Fine-tuning large pre-trained language models on downstream tasks has become the de facto learning paradigm in NLP.
Recent work has proposed a variety of parameter-efficient transfer learning methods that only fine-tune a small number of (extra) parameters to attain strong performance.
We break down the design of state-of-the-art parameter-efficient transfer learning methods and present a unified framework that establishes connections between them.
arXiv Detail & Related papers (2021-10-08T20:22:26Z) - Selecting Informative Contexts Improves Language Model Finetuning [66.26521454263343]
We present a general fine-tuning method that we call information gain filtration.
During fine-tuning, a secondary learner selects informative examples and skips uninformative ones.
We show that our method yields consistent improvements across datasets, fine-tuning tasks, and language model architectures.
arXiv Detail & Related papers (2020-05-01T02:01:18Z) - Investigating Transferability in Pretrained Language Models [8.83046338075119]
We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance.
This technique reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks.
arXiv Detail & Related papers (2020-04-30T17:23:19Z) - Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation [111.44445634272235]
In this paper, we develop a parameter-efficient transfer learning architecture, termed PeterRec.
PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks.
We perform extensive experimental ablation to show the effectiveness of the learned user representation in five downstream tasks.
arXiv Detail & Related papers (2020-01-13T14:09:54Z)
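The ProMoT entry above describes a two-stage procedure: prompt tuning with the model frozen, then fine-tuning the model with the soft prompt attached. The following is a generic, self-contained sketch of that idea on a toy PyTorch model; it is an illustration under assumed shapes and hyperparameters, not the ProMoT authors' implementation, and names such as `ToyLM` and `soft_prompt` are invented for this example.
```python
# Generic sketch of the two-stage ProMoT idea (not the authors' code):
# stage 1 trains only a soft prompt with the model frozen; stage 2 freezes
# the soft prompt and fine-tunes the model with that prompt attached.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToyLM(nn.Module):
    """Stand-in for a pretrained language model operating on embeddings."""
    def __init__(self, dim=32, vocab=100):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.body = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.head = nn.Linear(dim, vocab)

    def forward(self, token_ids, soft_prompt=None):
        x = self.embed(token_ids)
        if soft_prompt is not None:
            prompt = soft_prompt.unsqueeze(0).expand(x.size(0), -1, -1)
            x = torch.cat([prompt, x], dim=1)  # prepend soft prompt embeddings
        return self.head(self.body(x))[:, -token_ids.size(1):]  # drop prompt positions

model = ToyLM()
soft_prompt = nn.Parameter(torch.randn(8, 32) * 0.02)  # 8 learnable prompt vectors
tokens = torch.randint(0, 100, (4, 16))                # toy batch
targets = torch.randint(0, 100, (4, 16))
loss_fn = nn.CrossEntropyLoss()

# Stage 1: prompt tuning -- model frozen, only the soft prompt is trained.
for p in model.parameters():
    p.requires_grad_(False)
opt = torch.optim.Adam([soft_prompt], lr=1e-3)
for _ in range(10):
    logits = model(tokens, soft_prompt)
    loss = loss_fn(logits.reshape(-1, 100), targets.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: fine-tune the model itself with the (now frozen) soft prompt attached.
soft_prompt.requires_grad_(False)
for p in model.parameters():
    p.requires_grad_(True)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for _ in range(10):
    logits = model(tokens, soft_prompt)
    loss = loss_fn(logits.reshape(-1, 100), targets.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```
In a real setup the `ToyLM` stand-in would be replaced by a pretrained language model, with the soft prompt prepended to its input embeddings in the same way.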
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information above (including all generated summaries) and is not responsible for any consequences of its use.