Do the Frankenstein, or how to achieve better out-of-distribution
performance with manifold mixing model soup
- URL: http://arxiv.org/abs/2309.08610v1
- Date: Mon, 28 Aug 2023 06:13:32 GMT
- Title: Do the Frankenstein, or how to achieve better out-of-distribution
performance with manifold mixing model soup
- Authors: Hannes Fassold
- Abstract summary: We show that the fused model gives significantly better out-of-distribution performance when finetuning a CLIP model for image classification.
It also provides better accuracy on the original dataset on which the finetuning was done.
- Score: 1.0878040851637998
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The standard recipe applied in transfer learning is to finetune a pretrained
model on the task-specific dataset with different hyperparameter settings and
pick the model with the highest accuracy on the validation dataset.
Unfortunately, this leads to models which do not perform well under
distribution shifts, e.g. when the model is given graphical sketches of the
object as input instead of photos. In order to address this, we propose the
manifold mixing model soup, an algorithm which mixes together the latent space
manifolds of multiple finetuned models in an optimal way in order to generate a
fused model. We show that the fused model gives significantly better
out-of-distribution performance (+3.5% compared to the best individual model) when
finetuning a CLIP model for image classification. In addition, it also provides
better accuracy on the original dataset on which the finetuning was done.
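To illustrate the general idea, here is a minimal sketch of fusing several fine-tuned checkpoints by interpolating their parameters with per-module mixing coefficients. The module grouping and the coefficients are assumptions for illustration only; the abstract does not specify how the paper actually searches for the optimal mixing over the latent-space manifolds.

```python
import copy
import torch

def mix_model_soup(models, coeffs_per_module=None):
    """Fuse fine-tuned models of identical architecture by interpolating
    their parameters with per-module mixing coefficients.

    ``coeffs_per_module`` maps a top-level module name to a list of
    coefficients (one per model, summing to 1). Hedged sketch: the paper
    optimizes the mixing per latent-space manifold, and the exact search
    procedure is not described in the abstract.
    """
    n = len(models)
    fused = copy.deepcopy(models[0])
    param_tables = [dict(m.named_parameters()) for m in models]
    with torch.no_grad():
        for name, param in fused.named_parameters():
            module = name.split(".")[0]                    # crude module grouping
            coeffs = (coeffs_per_module or {}).get(module, [1.0 / n] * n)
            mixed = sum(c * table[name] for c, table in zip(coeffs, param_tables))
            param.copy_(mixed)
    return fused
```

A fused model built this way can then be evaluated both on the original validation set and on distribution-shifted data (e.g. sketches) to check the in-distribution versus out-of-distribution trade-off described in the abstract.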
Related papers
- Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation [36.45488536471859]
The similarity component refines the image-image similarity by using unlabeled images.
The weight component introduces a precision matrix into the weight function to adequately model the relation between training samples.
To reduce the high complexity of GPs, we propose a group-based learning strategy.
arXiv Detail & Related papers (2024-10-11T15:12:30Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) delivers outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, requiring no data availability or additional training, while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - Model Stock: All we need is just a few fine-tuned models [34.449901046895185]
This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance.
We employ significantly fewer models to obtain the final weights, yet yield superior accuracy.
We demonstrate the efficacy of Model Stock with fine-tuned models based upon pre-trained CLIP architectures.
arXiv Detail & Related papers (2024-03-28T15:57:20Z) - Diffusion-TTA: Test-time Adaptation of Discriminative Models via
Generative Feedback [97.0874638345205]
Generative models can be great test-time adapters for discriminative models.
Our method, Diffusion-TTA, adapts pre-trained discriminative models to each unlabelled example in the test set.
We show Diffusion-TTA significantly enhances the accuracy of various large-scale pre-trained discriminative models.
arXiv Detail & Related papers (2023-11-27T18:59:53Z) - Knowledge is a Region in Weight Space for Fine-tuned Language Models [48.589822853418404]
We study how the weight space and the underlying loss landscape of different models are interconnected.
We show that language models that have been finetuned on the same dataset form a tight cluster in the weight space, while models finetuned on different datasets from the same underlying task form a looser cluster.
arXiv Detail & Related papers (2023-02-09T18:59:18Z) - Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
Often the fine-tuned models are readily available but their training data is not, which creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z) - Model ensemble instead of prompt fusion: a sample-specific knowledge
transfer method for few-shot prompt tuning [85.55727213502402]
We focus on improving the few-shot performance of prompt tuning by transferring knowledge from soft prompts of source tasks.
We propose Sample-specific Ensemble of Source Models (SESoM).
SESoM learns to adjust the contribution of each source model for each target sample separately when ensembling source model outputs.
arXiv Detail & Related papers (2022-10-23T01:33:16Z) - Model soups: averaging weights of multiple fine-tuned models improves
accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness (see the sketch after this entry).
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
arXiv Detail & Related papers (2022-03-10T17:03:49Z)
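For context, a minimal sketch of the weight-averaging idea behind model soups, assuming all checkpoints share one architecture. The greedy variant shown here (keep a checkpoint in the average only if held-out accuracy does not drop) is one of the recipes from that paper; the `evaluate` callable is a hypothetical stand-in for a validation-accuracy routine.

```python
import copy
import torch

def average_weights(members):
    """Uniformly average the parameters of same-architecture models."""
    avg = copy.deepcopy(members[0])
    tables = [dict(m.named_parameters()) for m in members]
    with torch.no_grad():
        for name, param in avg.named_parameters():
            param.copy_(torch.stack([t[name] for t in tables]).mean(dim=0))
    return avg

def greedy_soup(models, evaluate):
    """Greedy model soup: add fine-tuned checkpoints to a running weight
    average only if held-out accuracy of the averaged model does not drop.

    ``evaluate`` is a hypothetical callable returning validation accuracy.
    """
    ranked = sorted(models, key=evaluate, reverse=True)   # best checkpoint first
    ingredients = [ranked[0]]
    best_acc = evaluate(ranked[0])
    for candidate in ranked[1:]:
        trial = average_weights(ingredients + [candidate])
        acc = evaluate(trial)
        if acc >= best_acc:                               # keep only if it helps
            ingredients.append(candidate)
            best_acc = acc
    return average_weights(ingredients)
```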
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.