Derivative Free Weight-space Ensembling
- URL: http://arxiv.org/abs/2307.03506v2
- Date: Wed, 26 Jul 2023 09:06:22 GMT
- Title: Derivative Free Weight-space Ensembling
- Authors: Dean Ninalga
- Abstract summary: We introduce Derivative Free Weight-space Ensembling (DFWE), a new few-sample task transfer approach for open-domain dialogue.
We finetune each of the expert models on the target task, approaching the target task from several distinct knowledge bases.
We linearly interpolate between the model weights using a gradient-free optimization algorithm to efficiently find a good weighting.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work suggests that interpolating between the weights of two
specialized language models can transfer knowledge between tasks in a way that
multi-task learning cannot. However, very few have explored interpolation
between more than two models, where each has a distinct knowledge base. In this
paper, we introduce Derivative Free Weight-space Ensembling (DFWE), a new
few-sample task transfer approach for open-domain dialogue. Our framework
creates a set of diverse expert language models trained using a predefined set
of source tasks. Next, we finetune each of the expert models on the target
task, approaching the target task from several distinct knowledge bases.
Finally, we linearly interpolate between the model weights using a
gradient-free optimization algorithm to efficiently find a good interpolation
weighting. We demonstrate the effectiveness of the method on FETA-Friends,
outperforming the standard pretrain-finetune approach.
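As a concrete illustration, here is a minimal sketch of the weight-space ensembling step: several expert models finetuned on the target task are merged by linear interpolation of their parameters, and the interpolation weighting is chosen by a derivative-free search against a validation loss. The use of PyTorch, SciPy's Nelder-Mead optimizer, a softmax reparameterization of the simplex, and the tiny synthetic models and data are all assumptions for illustration; the paper does not prescribe these specific choices.

```python
# A hedged sketch of DFWE-style weight-space ensembling (not the paper's
# exact implementation): Nelder-Mead over softmax-normalized mixing logits
# is an assumption, as are the toy models and synthetic validation data.
import copy
import numpy as np
import torch
import torch.nn as nn
from scipy.optimize import minimize

torch.manual_seed(0)

# Stand-ins for K expert models finetuned on the target task.
K = 3
experts = [nn.Linear(16, 2) for _ in range(K)]

# Tiny synthetic validation set standing in for target-task dev data.
x_val = torch.randn(64, 16)
y_val = torch.randint(0, 2, (64,))

def merge(weights):
    """Linearly interpolate expert parameters with the given convex weights."""
    merged = copy.deepcopy(experts[0])
    state = {name: sum(w * e.state_dict()[name] for w, e in zip(weights, experts))
             for name in merged.state_dict()}
    merged.load_state_dict(state)
    return merged

def val_loss(logits_):
    """Validation loss of the merged model; the objective for the search."""
    w = torch.softmax(torch.tensor(logits_), dim=0).tolist()  # simplex constraint
    model = merge(w)
    with torch.no_grad():
        return nn.functional.cross_entropy(model(x_val), y_val).item()

# Derivative-free search over the interpolation weighting.
res = minimize(val_loss, x0=np.zeros(K), method="Nelder-Mead")
best_w = torch.softmax(torch.tensor(res.x), dim=0)
print("interpolation weights:", best_w.tolist())
```

In practice the experts would be full language models and the objective a dev-set metric on the target dialogue task; the search itself stays the same because it only ever queries the merged model's score, never its gradients.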
Related papers
- Unified Generative and Discriminative Training for Multi-modal Large Language Models [88.84491005030316]
Generative training has enabled Vision-Language Models (VLMs) to tackle various complex tasks.
Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval.
This paper proposes a unified approach that integrates the strengths of both paradigms.
arXiv Detail & Related papers (2024-11-01T01:51:31Z)
- Unlearning as multi-task optimization: A normalized gradient difference approach with an adaptive learning rate [105.86576388991713]
We introduce a normalized gradient difference (NGDiff) algorithm, enabling better control over the trade-off between the unlearning objectives.
We provide a theoretical analysis and empirically demonstrate the superior performance of NGDiff among state-of-the-art unlearning methods on the TOFU and MUSE datasets.
arXiv Detail & Related papers (2024-10-29T14:41:44Z)
- On Giant's Shoulders: Effortless Weak to Strong by Dynamic Logits Fusion [23.63688816017186]
Existing weak-to-strong methods often employ a static knowledge transfer ratio and a single small model for transferring complex knowledge.
We propose a dynamic logit fusion approach that works with a series of task-specific small models, each specialized in a different task (a toy logit-fusion sketch appears after this list).
Our method closes the performance gap by 96.4% in single-task scenarios and by 86.3% in multi-task scenarios.
arXiv Detail & Related papers (2024-06-17T03:07:41Z)
- Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion [53.33473557562837]
Solving multi-objective optimization problems for large deep neural networks is a challenging task due to the complexity of the loss landscape and the expensive computational cost.
We propose a practical and scalable approach to solve this problem via mixture of experts (MoE) based model fusion.
By ensembling the weights of specialized single-task models, the MoE module can effectively capture the trade-offs between multiple objectives.
arXiv Detail & Related papers (2024-06-14T07:16:18Z)
- Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion [86.6191592951269]
Merging models fine-tuned from a common, extensively pretrained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks.
We propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to tackle the interference problem without sacrificing performance.
arXiv Detail & Related papers (2023-12-11T07:24:54Z)
- Scalarization for Multi-Task and Multi-Domain Learning at Scale [15.545810422759295]
Training a single model on multiple input domains and/or output tasks allows for compressing information from multiple sources into a unified backbone.
However, optimizing such networks is a challenge due to discrepancies between the different tasks or domains.
arXiv Detail & Related papers (2023-10-13T07:31:04Z)
- Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models [50.33956216274694]
In Multi-Task Learning (MTL), tasks may compete and limit the performance achieved on each other, rather than guiding the optimization to a solution.
We propose Pareto Manifold Learning, an ensembling method in weight space.
arXiv Detail & Related papers (2022-10-18T11:20:54Z)
- Meta-Learning via Classifier(-free) Guidance [5.812784742024491]
State-of-the-art meta-learning techniques do not optimize for zero-shot adaptation to unseen tasks.
We propose meta-learning techniques that use natural language guidance to achieve higher zero-shot performance.
arXiv Detail & Related papers (2022-10-17T11:09:35Z)
- Set-based Meta-Interpolation for Few-Task Meta-Learning [79.4236527774689]
We propose a novel domain-agnostic task augmentation method, Meta-Interpolation, to densify the meta-training task distribution.
We empirically validate the efficacy of Meta-Interpolation on eight datasets spanning various domains.
arXiv Detail & Related papers (2022-05-20T06:53:03Z)
- Interval Bound Interpolation for Few-shot Learning with Few Tasks [15.85259386116784]
Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks to unseen tasks with a limited amount of labeled data.
We introduce the notion of interval bounds from the provably robust training literature to few-shot learning.
We then use a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds.
arXiv Detail & Related papers (2022-04-07T15:29:27Z)
- Learning from demonstration using products of experts: applications to manipulation and task prioritization [12.378784643460474]
We show that the fusion of models in different task spaces can be expressed as a product of experts (PoE); a minimal Gaussian-PoE sketch appears after this list.
Multiple experiments are presented to show that learning the different models jointly in the PoE framework significantly improves the quality of the model.
arXiv Detail & Related papers (2020-10-07T16:24:41Z)
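Two of the related papers describe fusion mechanisms concrete enough to sketch. First, for the Dynamic Logits Fusion entry above: a hedged, static toy version in which a strong model's next-token logits are shifted by weighted offsets from task-specific small experts. The offset form (expert minus small base model) and the fixed alphas are assumptions made here for illustration; the paper's actual transfer ratios are dynamic.

```python
import torch

def fuse_logits(strong, weak_base, weak_experts, alphas):
    """Fuse a strong model's next-token logits with task-specific offsets.

    Each offset (expert - base) approximates the knowledge a small expert
    gained from task finetuning; alphas set the per-expert transfer ratio.
    """
    fused = strong.clone()
    for alpha, expert in zip(alphas, weak_experts):
        fused += alpha * (expert - weak_base)
    return fused

# Toy example over a 10-token vocabulary.
vocab = 10
strong = torch.randn(vocab)
weak_base = torch.randn(vocab)
weak_experts = [torch.randn(vocab) for _ in range(2)]
probs = torch.softmax(fuse_logits(strong, weak_base, weak_experts, [0.5, 0.3]), dim=-1)
print(probs)
```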
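Second, for the products-of-experts entry above: a minimal Gaussian PoE. The product of Gaussian expert densities is itself Gaussian, with precision equal to the sum of the expert precisions and a precision-weighted mean. This is the standard closed form, not the paper's specific task-space parameterization.

```python
import torch

def gaussian_poe(means, covs):
    """Product of Gaussian experts: fuse estimates from several task spaces.

    The fused precision is the sum of expert precisions; the fused mean is
    precision-weighted, so each expert dominates where it is most certain.
    """
    precisions = [torch.linalg.inv(c) for c in covs]
    fused_cov = torch.linalg.inv(sum(precisions))
    fused_mean = fused_cov @ sum(p @ m for p, m in zip(precisions, means))
    return fused_mean, fused_cov

# Two 2-D experts: one confident along x, the other along y.
m1, c1 = torch.tensor([1.0, 0.0]), torch.diag(torch.tensor([0.1, 10.0]))
m2, c2 = torch.tensor([0.0, 1.0]), torch.diag(torch.tensor([10.0, 0.1]))
mean, cov = gaussian_poe([m1, m2], [c1, c2])
print(mean)  # close to (1, 1): each expert wins where it is precise
```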