Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2212.10445v3
- Date: Wed, 9 Aug 2023 14:02:55 GMT
- Title: Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
- Authors: Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz
- Abstract summary: Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions.
We propose model ratatouille, a new strategy to recycle the multiple fine-tunings of the same foundation model on diverse auxiliary tasks.
- Score: 99.6826401545377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models are redefining how AI systems are built. Practitioners now
follow a standard procedure to build their machine learning solutions: from a
pre-trained foundation model, they fine-tune the weights on the target task of
interest. So, the Internet is swarmed by a handful of foundation models
fine-tuned on many diverse tasks: these individual fine-tunings exist in
isolation without benefiting from each other. In our opinion, this is a missed
opportunity, as these specialized models contain rich and diverse features. In
this paper, we thus propose model ratatouille, a new strategy to recycle the
multiple fine-tunings of the same foundation model on diverse auxiliary tasks.
Specifically, we repurpose these auxiliary weights as initializations for
multiple parallel fine-tunings on the target task; then, we average all
fine-tuned weights to obtain the final model. This recycling strategy aims at
maximizing the diversity in weights by leveraging the diversity in auxiliary
tasks. Empirically, it improves the state of the art on the reference DomainBed
benchmark for out-of-distribution generalization. Looking forward, this work
contributes to the emerging paradigm of updatable machine learning where, akin
to open-source software development, the community collaborates to reliably
update machine learning models. Our code is released:
https://github.com/facebookresearch/ModelRatatouille.
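As a rough illustration of the recipe described in the abstract, here is a minimal PyTorch-style sketch; it is not the released implementation (see the repository linked above for the actual DomainBed code), and the `finetune` routine, the auxiliary state dicts, and the target-data handle are assumed placeholders.

```python
import copy
import torch

def model_ratatouille(pretrained_model, auxiliary_state_dicts, finetune, target_data):
    """Minimal sketch of the recycling recipe from the abstract.

    pretrained_model: the shared foundation model (a torch.nn.Module).
    auxiliary_state_dicts: weights of the same backbone fine-tuned on diverse
        auxiliary tasks (list of state dicts) -- placeholder inputs.
    finetune: assumed user-supplied routine that fine-tunes a model on
        target_data and returns its state_dict.
    """
    finetuned = []
    for aux_weights in auxiliary_state_dicts:
        # 1) Repurpose each auxiliary fine-tuning as an initialization.
        model = copy.deepcopy(pretrained_model)
        model.load_state_dict(aux_weights)
        # 2) Fine-tune on the target task (in parallel in the paper; sequentially here).
        finetuned.append(finetune(model, target_data))

    # 3) Average all fine-tuned weights to obtain the final model.
    averaged = {}
    for key, ref in finetuned[0].items():
        if ref.is_floating_point():
            averaged[key] = torch.stack([sd[key] for sd in finetuned]).mean(dim=0)
        else:
            # Integer buffers (e.g. BatchNorm counters) are copied, not averaged.
            averaged[key] = ref.clone()

    final_model = copy.deepcopy(pretrained_model)
    final_model.load_state_dict(averaged)
    return final_model
```

The averaging step only makes sense because every run starts from the same foundation model, which keeps the fine-tuned weights close enough in weight space to be combined; that is the diversity-versus-averageability trade-off the paper exploits.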
Related papers
- 360Brew: A Decoder-only Foundation Model for Personalized Ranking and Recommendation [15.922317310616952]
We introduce our research pre-production model, 360Brew V1.0, a 150B parameter, decoder-only model that has been trained and fine-tuned on LinkedIn's data and tasks.
This model is capable of solving over 30 predictive tasks across various segments of the LinkedIn platform, achieving performance levels comparable to or exceeding those of current production systems.
arXiv Detail & Related papers (2025-01-27T19:14:52Z)
- Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains [114.76612918465948]
Large language models (LLMs) have achieved remarkable performance in recent years but are fundamentally limited by the underlying training data.
We propose a complementary approach towards self-improvement where finetuning is applied to a multiagent society of language models.
arXiv Detail & Related papers (2025-01-10T04:35:46Z)
- RADIOv2.5: Improved Baselines for Agglomerative Vision Foundation Models [60.596005921295806]
Agglomerative models have emerged as a powerful approach to training vision foundation models.
We identify critical challenges including resolution mode shifts, teacher imbalance, idiosyncratic teacher artifacts, and an excessive number of output tokens.
We propose several novel solutions: multi-resolution training, mosaic augmentation, and improved balancing of teacher loss functions.
arXiv Detail & Related papers (2024-12-10T17:06:41Z)
- EnsIR: An Ensemble Algorithm for Image Restoration via Gaussian Mixture Models [70.60381055741391]
Image restoration is challenged by ill-posed problems, which lead to deviations between single-model predictions and ground truths.
Ensemble learning aims to address these deviations by combining the predictions of multiple base models.
We employ an expectation-maximization (EM)-based algorithm to estimate ensemble weights for prediction candidates.
Our algorithm is model-agnostic and training-free, allowing seamless integration and enhancement of various pre-trained image restoration models.
arXiv Detail & Related papers (2024-10-30T12:16:35Z)
- Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks [12.146530928616386]
A common approach for targeted problems involves fine-tuning pre-trained foundation models for specific target tasks.
This work focuses on the problem of merging multiple fine-tunings of the same foundation model derived from a spectrum of auxiliary tasks.
We introduce a new simple method, Model Breadcrumbs, which consists of a sparsely defined weight set that guides model adaptation within the weight space of a pre-trained model (see the weight-space merging sketch after this list).
arXiv Detail & Related papers (2023-12-11T19:10:55Z)
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse [59.500060790983994]
This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend.
ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with PTM, tuning target model with PTM, and PTM-based inference.
arXiv Detail & Related papers (2023-08-17T19:12:13Z) - UnIVAL: Unified Model for Image, Video, Audio and Language Tasks [105.77733287326308]
The UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model.
Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.
Thanks to the unified model, we propose a novel study on multimodal model merging via weight generalization.
arXiv Detail & Related papers (2023-07-30T09:48:36Z) - TaCA: Upgrading Your Visual Foundation Model with Task-agnostic
Compatible Adapter [21.41170708560114]
A growing number of applications based on visual foundation models are emerging.
In situations involving system upgrades, it becomes essential to re-train all downstream modules to adapt to the new foundation model.
We introduce a parameter-efficient and task-agnostic adapter, dubbed TaCA, that facilitates compatibility across distinct foundation models.
arXiv Detail & Related papers (2023-06-22T03:00:24Z)
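Several entries above, notably Model Breadcrumbs and the merging study in UnIVAL, likewise combine multiple fine-tunings of one pre-trained model directly in weight space. The sketch below shows a generic task-vector version of that idea with a simple top-k sparsity mask; the `keep_fraction` threshold and function names are illustrative assumptions and do not reproduce any single paper's exact masking rule.

```python
import torch

def sparse_task_vector(pretrained_sd, finetuned_sd, keep_fraction=0.1):
    """Hypothetical sketch: keep only the largest-magnitude weight changes.

    The masking rule is an assumption for illustration; Model Breadcrumbs
    defines its own sparsification in the cited paper.
    """
    masked = {}
    for key, w0 in pretrained_sd.items():
        if not w0.is_floating_point():
            continue
        delta = finetuned_sd[key] - w0           # task vector for this tensor
        k = max(1, int(keep_fraction * delta.numel()))
        # Threshold at the k-th largest absolute change.
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        masked[key] = torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))
    return masked

def merge(pretrained_sd, task_vectors, scale=1.0):
    """Add the (sparse) task vectors back onto the pre-trained weights."""
    merged = {k: v.clone() for k, v in pretrained_sd.items()}
    for tv in task_vectors:
        for key, delta in tv.items():
            merged[key] = merged[key] + scale * delta / len(task_vectors)
    return merged
```

Applying `merge` to the sparse task vectors of several fine-tunings gives a single set of weights without further training, which is the kind of training-free, weight-space merging these related papers study.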