Derivative Free Weight-space Ensembling
- URL: http://arxiv.org/abs/2307.03506v2
- Date: Wed, 26 Jul 2023 09:06:22 GMT
- Title: Derivative Free Weight-space Ensembling
- Authors: Dean Ninalga
- Abstract summary: We introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue.
We finetune each of the expert models on the target task, approaching it from several distinct knowledge bases.
We linearly interpolate between the model weights using a gradient-free optimization algorithm to efficiently find a good weighting.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work suggests that interpolating between the weights of two
specialized language models can transfer knowledge between tasks in a way that
multi-task learning cannot. However, very few have explored interpolation
between more than two models, where each has a distinct knowledge base. In this
paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new
few-sample task transfer approach for open-domain dialogue. Our framework
creates a set of diverse expert language models trained using a predefined set
of source tasks. Next, we finetune each of the expert models on the target
task, approaching the target task from several distinct knowledge bases.
Finally, we linearly interpolate between the model weights using a
gradient-free optimization algorithm to efficiently find a good interpolation
weighting. We demonstrate the effectiveness of the method on FETA-Friends,
outperforming the standard pretrain-finetune approach.
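The abstract describes three steps: training diverse experts on source tasks, finetuning each on the target task, and searching for interpolation weights with a gradient-free optimizer. As a rough illustration of the final step only, the sketch below interpolates a set of expert checkpoints and tunes the mixing weights with Nelder-Mead; the `evaluate` function, the softmax reparameterization of the weights, and the choice of Nelder-Mead are assumptions of this sketch, not details taken from the paper.

```python
# Minimal sketch of the interpolation + gradient-free search step (not the
# authors' implementation). `evaluate` and the expert checkpoints are assumed
# placeholders; Nelder-Mead stands in for an unspecified gradient-free optimizer.
import copy
import numpy as np
import torch
from scipy.optimize import minimize


def interpolate_state_dicts(state_dicts, weights):
    """Convex combination sum_i w_i * theta_i over matching parameter tensors."""
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(float(w) * sd[key].float()
                          for w, sd in zip(weights, state_dicts))
    return merged


def dfwe_search(base_model, state_dicts, evaluate, max_iter=100):
    """Search interpolation weights with a derivative-free optimizer."""
    def to_simplex(raw):
        # Softmax reparameterization keeps weights positive and summing to one.
        e = np.exp(raw - raw.max())
        return e / e.sum()

    def objective(raw):
        weights = to_simplex(raw)
        model = copy.deepcopy(base_model)
        model.load_state_dict(interpolate_state_dicts(state_dicts, weights))
        return -evaluate(model)  # minimize the negative dev-set score

    result = minimize(objective, x0=np.zeros(len(state_dicts)),
                      method="Nelder-Mead", options={"maxiter": max_iter})
    return to_simplex(result.x)
```

Because each candidate weighting is scored with forward passes only, a derivative-free search over a handful of interpolation coefficients stays cheap relative to further finetuning.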
Related papers
- Modeling Multi-Task Model Merging as Adaptive Projective Gradient Descent [74.02034188307857]
Merging multiple expert models offers a promising approach for performing multi-task learning without accessing their original data.
We find existing methods inevitably discard task-specific information that, while causing conflicts, is crucial for performance.
Our approach consistently outperforms previous methods, achieving state-of-the-art results across diverse architectures and tasks in both vision and NLP domains.
arXiv Detail & Related papers (2025-01-02T12:45:21Z)
- Tint Your Models Task-wise for Improved Multi-task Model Merging [17.496018757317824]
We propose Model Tinting, a test-time approach that introduces a single task-specific layer for each task as trainable adjustments.
Our method jointly trains merging coefficients and task-specific layers, which effectively reduces task conflicts with minimal additional costs.
Our method achieves state-of-the-art performance across both computer vision and natural language processing tasks.
arXiv Detail & Related papers (2024-12-26T07:42:06Z)
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling us to have better control over the trade-off between the objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge.
We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task.
Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
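As a rough, generic illustration of fusing logits from several task-specific experts with per-example weights, the sketch below weights each expert by its prediction confidence (negative entropy); the actual dynamic weighting and the weak-to-strong transfer mechanism in the paper may differ.

```python
# Generic sketch of dynamic logit fusion across task-specific experts (an
# illustration of the idea only). Confidence-based weighting is an assumption
# made here, not the paper's exact scheme.
import torch
import torch.nn.functional as F


def fuse_logits(expert_logits: list) -> torch.Tensor:
    """expert_logits: list of [batch, vocab] tensors, one per task-specific expert.
    Returns fused [batch, vocab] logits with per-example weights."""
    stacked = torch.stack(expert_logits)                        # [n_experts, batch, vocab]
    probs = F.softmax(stacked, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)   # [n_experts, batch]
    # Dynamic weights: lower-entropy (more confident) experts get larger weight.
    weights = F.softmax(-entropy, dim=0).unsqueeze(-1)          # [n_experts, batch, 1]
    return (weights * stacked).sum(dim=0)                       # [batch, vocab]
```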
arXiv Detail & Related papers (2024-06-17T03:07:41Z)
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to track the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- Scalarization for Multi-Task and Multi-Domain Learning at Scale [15.545810422759295]
Training a single model on multiple input domains and/or output tasks allows for compressing information from multiple sources into a unified backbone.
However, optimizing such networks is a challenge due to discrepancies between the different tasks or domains.
arXiv Detail & Related papers (2023-10-13T07:31:04Z)
- Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution.
We propose Pareto Manifold Learning, an ensembling method in weight space.
arXiv Detail & Related papers (2022-10-18T11:20:54Z)
- Interval Bound Interpolation for Few-shot Learning with Few Tasks [15.85259386116784]
Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks to unseen tasks with a limited amount of labeled data.
We introduce the notion of interval bounds from the provably robust training literature to few-shot learning.
We then use a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds.
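A rough sketch of the task-interpolation idea described above, assuming tasks are represented by feature tensors and that interval bounds take the simple form [x - eps, x + eps]; the paper derives its bounds via provably robust training, so this is only an illustration.

```python
# Illustrative sketch only: form an artificial task by interpolating between a
# task's features and one of its interval bounds [x - eps, x + eps].
# `eps`, the feature-level view, and the random mixing factor are assumptions.
from typing import Optional

import torch


def interpolate_with_bounds(task_features: torch.Tensor, eps: float,
                            lam: Optional[float] = None) -> torch.Tensor:
    """task_features: [n_examples, dim] features representing one task."""
    if lam is None:
        lam = torch.rand(1).item()                           # mixing factor in [0, 1)
    sign = 1.0 if torch.rand(1).item() < 0.5 else -1.0       # upper or lower bound
    bound = task_features + sign * eps
    return (1.0 - lam) * task_features + lam * bound
```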
arXiv Detail & Related papers (2022-04-07T15:29:27Z)
- Learning from demonstration using products of experts: applications to manipulation and task prioritization [12.378784643460474]
We show that the fusion of models in different task spaces can be expressed as a product of experts (PoE); a generic PoE sketch is given below.
Multiple experiments are presented to show that learning the different models jointly in the PoE framework significantly improves the quality of the model.
arXiv Detail & Related papers (2020-10-07T16:24:41Z)
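The product-of-experts fusion mentioned in the last entry has a standard closed form when each expert is a Gaussian: the product is again Gaussian with a precision-weighted mean. The sketch below shows only that generic fusion rule; the task-space projections and the joint learning procedure used in the paper are omitted, and the example numbers are made up.

```python
# Generic product-of-Gaussian-experts fusion (standard PoE identity, shown as
# an illustration; not the paper's full learning-from-demonstration pipeline).
import numpy as np


def product_of_gaussian_experts(means, covariances):
    """Fuse Gaussian experts N(mu_i, Sigma_i) into the Gaussian proportional to
    their product: Sigma = (sum_i Sigma_i^-1)^-1, mu = Sigma * sum_i Sigma_i^-1 mu_i."""
    precisions = [np.linalg.inv(c) for c in covariances]
    fused_cov = np.linalg.inv(sum(precisions))
    fused_mean = fused_cov @ sum(p @ m for p, m in zip(precisions, means))
    return fused_mean, fused_cov


# Example: two experts over the same 2-D space; the fused estimate is pulled
# toward the more certain (lower-variance) expert in each dimension.
mu1, cov1 = np.array([1.0, 0.0]), np.diag([0.1, 1.0])
mu2, cov2 = np.array([0.0, 1.0]), np.diag([1.0, 0.1])
print(product_of_gaussian_experts([mu1, mu2], [cov1, cov2]))
```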
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.