Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
- URL: http://arxiv.org/abs/2212.10445v3
- Date: Wed, 9 Aug 2023 14:02:55 GMT
- Title: Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
- Authors: Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz
- Abstract summary: Foundation models are redefining how AI systems are built. Practitioners now follow a standard procedure to build their machine learning solutions.
We propose model ratatouille, a new strategy to recycle the multiple fine-tunings of the same foundation model on diverse auxiliary tasks.
- Score: 99.6826401545377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Foundation models are redefining how AI systems are built. Practitioners now
follow a standard procedure to build their machine learning solutions: from a
pre-trained foundation model, they fine-tune the weights on the target task of
interest. So, the Internet is swarmed by a handful of foundation models
fine-tuned on many diverse tasks: these individual fine-tunings exist in
isolation without benefiting from each other. In our opinion, this is a missed
opportunity, as these specialized models contain rich and diverse features. In
this paper, we thus propose model ratatouille, a new strategy to recycle the
multiple fine-tunings of the same foundation model on diverse auxiliary tasks.
Specifically, we repurpose these auxiliary weights as initializations for
multiple parallel fine-tunings on the target task; then, we average all
fine-tuned weights to obtain the final model. This recycling strategy aims at
maximizing the diversity in weights by leveraging the diversity in auxiliary
tasks. Empirically, it improves the state of the art on the reference DomainBed
benchmark for out-of-distribution generalization. Looking forward, this work
contributes to the emerging paradigm of updatable machine learning where, akin
to open-source software development, the community collaborates to reliably
update machine learning models. Our code is released:
https://github.com/facebookresearch/ModelRatatouille.
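The recycling recipe described in the abstract (fine-tune on auxiliary tasks, reuse those weights as initializations for parallel fine-tunings on the target task, then average the results) can be sketched in a few lines. This is a toy illustration of the control flow only, not the released code: the linear model, the synthetic tasks, plain gradient descent as "fine-tuning", uniform averaging, and every helper name (`make_task`, `fine_tune`, `average`, `ratatouille`, `mse`) are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple

Weights = List[float]
Dataset = List[Tuple[List[float], float]]  # (features, label) pairs


def make_task(true_w: Sequence[float], n: int, seed: int) -> Dataset:
    """Hypothetical synthetic linear-regression task standing in for a real dataset."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in true_w]
        y = sum(wi * xi for wi, xi in zip(true_w, x)) + rng.gauss(0.0, 0.05)
        data.append((x, y))
    return data


def fine_tune(init: Weights, data: Dataset, lr: float = 0.1, epochs: int = 50) -> Weights:
    """Full-batch gradient descent on mean squared error, starting from `init`."""
    w = list(init)
    n = len(data)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def average(weights: Sequence[Weights]) -> Weights:
    """Uniform element-wise average of several weight vectors (assumed uniform)."""
    return [sum(ws) / len(weights) for ws in zip(*weights)]


def mse(w: Weights, data: Dataset) -> float:
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in data) / len(data)


def ratatouille(pretrained: Weights, aux_tasks: Sequence[Dataset], target: Dataset) -> Weights:
    # 1) Auxiliary fine-tunings from the shared pre-trained weights. In the paper's
    #    setting these are recycled from existing fine-tunings rather than trained here.
    aux_weights = [fine_tune(pretrained, task) for task in aux_tasks]
    # 2) Each auxiliary model initializes its own fine-tuning on the target task.
    target_runs = [fine_tune(w_aux, target) for w_aux in aux_weights]
    # 3) Average all target fine-tuned weights into the final model.
    return average(target_runs)


if __name__ == "__main__":
    pretrained = [0.0, 0.0, 0.0]  # stand-in for foundation-model weights
    aux_tasks = [make_task(w, 200, seed=i)
                 for i, w in enumerate([[1.0, 0.0, 0.5], [0.0, 1.0, -0.5], [0.5, 0.5, 0.0]])]
    target = make_task([0.8, 0.3, 0.2], 100, seed=42)
    final = ratatouille(pretrained, aux_tasks, target)
    print("target MSE before:", mse(pretrained, target), "after:", mse(final, target))
```

Because the toy model is linear with a squared loss, each gradient step is an affine map of the weights, so averaging the target runs gives the same result as a single fine-tuning from the averaged auxiliary weights. Diversity among the initializations therefore has no effect beyond their mean here; any benefit from weight diversity can only appear in non-linear models such as the deep networks the paper works with.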
Related papers
- EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared with existing merging methods.
EMR-Merging is tuning-free: it requires neither data nor additional training, while still performing strongly.
Date: 2024-05-23T05:25:45Z
- Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent [2.3967405016776384]
Jack of All Trades (JAT) is a transformer-based model with a unique design optimized for handling sequential decision-making tasks.
JAT is the first model of its kind to be fully open-sourced at https://huggingface.co/jat-project/jat, including a pioneering general-purpose dataset.
Date: 2024-02-15T10:01:55Z
- Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks [14.349517221831364]
A common approach for targeted problems involves fine-tuning pre-trained foundation models for specific target tasks.
We introduce Model Breadcrumbs, a simple new method consisting of a sparsely defined set of weights that carve out a trajectory within the weight space of a pre-trained model.
Our experiments demonstrate the effectiveness of Model Breadcrumbs to simultaneously improve performance across multiple tasks.
Date: 2023-12-11T19:10:55Z
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse [59.500060790983994]
This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse built on PyTorch.
ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with pre-trained models (PTMs), tuning of the target model with PTMs, and PTM-based inference.
Date: 2023-08-17T19:12:13Z
- UnIVAL: Unified Model for Image, Video, Audio and Language Tasks [105.77733287326308]
The UnIVAL model goes beyond two modalities and unifies text, images, video, and audio into a single model.
Our model is efficiently pretrained on many tasks, based on task balancing and multimodal curriculum learning.
Thanks to the unified model, we present a novel study of multimodal model merging via weight interpolation.
Date: 2023-07-30T09:48:36Z
- TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter [21.41170708560114]
A growing number of applications based on visual foundation models are emerging.
When a system is upgraded to a new foundation model, all downstream modules must be retrained to adapt to it.
We introduce a parameter-efficient and task-agnostic adapter, dubbed TaCA, that facilitates compatibility across distinct foundation models.
Date: 2023-06-22T03:00:24Z
- Towards Mode Balancing of Generative Models via Diversity Weights [1.2354076490479513]
We present diversity weights, a training scheme that increases a model's output diversity by balancing the modes in the training dataset.
We discuss connections of our approach to diversity, equity, and inclusion in generative machine learning more generally, and computational creativity specifically.
Date: 2023-04-24T09:55:17Z
- Towards Efficient Task-Driven Model Reprogramming with Foundation Models [52.411508216448716]
Vision foundation models exhibit impressive power, benefiting from the extremely large model capacity and broad training data.
However, in practice, downstream scenarios may only support a small model due to the limited computational resources or efficiency considerations.
This poses a critical challenge for the real-world application of foundation models: the knowledge of a foundation model has to be transferred to the downstream task.
Date: 2023-04-05T07:28:33Z
- Model Reuse with Reduced Kernel Mean Embedding Specification [70.044322798187]
We present a two-phase framework for finding helpful models for a current application.
In the upload phase, when a model is uploaded to the pool, we construct a reduced kernel mean embedding (RKME) as a specification for the model.
Then, in the deployment phase, the relatedness of the current task to each pre-trained model is measured using its RKME specification.
Date: 2020-01-20T15:15:07Z
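The two-phase RKME workflow summarized in the last entry can be illustrated with a small sketch. It is a simplified stand-in, not the authors' algorithm: the Gaussian kernel, picking the reduced points as a random subset of the training data (instead of optimizing them), the ridge term `reg`, and all function names (`build_rkme`, `mmd2_to_rkme`, `select_model`) are assumptions. With the reduced points z fixed, the weights β that best approximate the empirical kernel mean embedding (1/n) Σ_i k(x_i, ·) solve the regularized system (K_zz + reg·I) β = r, where r_j = (1/n) Σ_i k(z_j, x_i); deployment then ranks models by the squared RKHS distance (MMD²) between the user's data and each RKME.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = List[float]
RKME = Tuple[List[Point], List[float]]  # (reduced points z, weights beta)


def rbf(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) (assumed kernel choice)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def build_rkme(data: Sequence[Point], k: int, reg: float = 1e-3, seed: int = 0) -> RKME:
    """Upload phase (simplified): fix k reduced points, then fit beta by ridge least squares."""
    z = random.Random(seed).sample(list(data), k)
    n = len(data)
    K_zz = [[rbf(zi, zj) + (reg if i == j else 0.0) for j, zj in enumerate(z)]
            for i, zi in enumerate(z)]
    r = [sum(rbf(zi, x) for x in data) / n for zi in z]
    return z, solve(K_zz, r)


def mmd2_to_rkme(data: Sequence[Point], rkme: RKME) -> float:
    """Squared RKHS distance between the user's empirical KME and an RKME."""
    z, beta = rkme
    m = len(data)
    xx = sum(rbf(a, b) for a in data for b in data) / m ** 2
    xz = sum(bj * rbf(x, zj) for x in data for zj, bj in zip(z, beta)) / m
    zz = sum(bi * bj * rbf(zi, zj) for zi, bi in zip(z, beta) for zj, bj in zip(z, beta))
    return xx - 2.0 * xz + zz


def select_model(user_data: Sequence[Point], specs: Sequence[RKME]) -> int:
    """Deployment phase (simplified): index of the model whose RKME is closest."""
    return min(range(len(specs)), key=lambda i: mmd2_to_rkme(user_data, specs[i]))
```

In this sketch, `select_model` needs only each model's reduced points and weights, not its full training set.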