Composing Ensembles of Pre-trained Models via Iterative Consensus
- URL: http://arxiv.org/abs/2210.11522v1
- Date: Thu, 20 Oct 2022 18:46:31 GMT
- Title: Composing Ensembles of Pre-trained Models via Iterative Consensus
- Authors: Shuang Li, Yilun Du, Joshua B. Tenenbaum, Antonio Torralba, Igor
Mordatch
- Abstract summary: We propose a unified framework for composing ensembles of different pre-trained models.
We use pre-trained models as "generators" or "scorers" and compose them via closed-loop iterative consensus optimization.
We demonstrate that consensus achieved by an ensemble of scorers outperforms the feedback of a single scorer.
- Score: 95.10641301155232
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large pre-trained models exhibit distinct and complementary capabilities
dependent on the data they are trained on. Language models such as GPT-3 are
capable of textual reasoning but cannot understand visual information, while
vision models such as DALL-E can generate photorealistic images but fail to
understand complex language descriptions. In this work, we propose a unified
framework for composing ensembles of different pre-trained models -- combining
the strengths of each individual model to solve various multimodal problems in
a zero-shot manner. We use pre-trained models as "generators" or "scorers" and
compose them via closed-loop iterative consensus optimization. The generator
constructs proposals and the scorers iteratively provide feedback to refine the
generated result. Such closed-loop communication enables models to correct
errors caused by other models, significantly boosting performance on downstream
tasks, e.g. improving accuracy on grade school math problems by 7.5%, without
requiring any model finetuning. We demonstrate that consensus achieved by an
ensemble of scorers outperforms the feedback of a single scorer, by leveraging
the strengths of each expert model. Results show that the proposed method can
be used as a general purpose framework for a wide range of zero-shot multimodal
tasks, such as image generation, video question answering, mathematical
reasoning, and robotic manipulation. Project page:
https://energy-based-model.github.io/composing-pretrained-models.
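The closed-loop procedure described above amounts to: the generator proposes candidates, an ensemble of scorers rates them, and the current best proposal is fed back to the generator for the next round. A minimal sketch of that loop follows, assuming hypothetical `generate` and scorer callables; the actual framework uses task-specific pre-trained generators and scorers and, in some settings, gradient-based refinement rather than the simple resample-and-rescore shown here.

```python
# Minimal sketch of closed-loop iterative consensus optimization.
# The `generate`, `scorers`, and `weights` interfaces are assumptions for
# illustration, not the paper's actual implementation.

from typing import Callable, List, Sequence, Tuple


def iterative_consensus(
    generate: Callable[[object, object], List[object]],   # proposals given (input, feedback)
    scorers: Sequence[Callable[[object, object], float]], # each maps (input, proposal) -> score
    weights: Sequence[float],                             # per-scorer weights for the consensus score
    x: object,                                            # task input (prompt, image, question, ...)
    num_iters: int = 10,
) -> Tuple[object, float]:
    """Refine generator proposals with ensemble feedback over several rounds."""
    best, best_score = None, float("-inf")
    feedback = None
    for _ in range(num_iters):
        # Generator constructs candidate solutions, conditioned on prior feedback.
        proposals = generate(x, feedback)
        for p in proposals:
            # Consensus score: weighted combination of all scorers' feedback.
            score = sum(w * s(x, p) for w, s in zip(weights, scorers))
            if score > best_score:
                best, best_score = p, score
        # Feed the current best proposal back to the generator (closed loop).
        feedback = best
    return best, best_score
```

The weighted sum over scorers mirrors the abstract's claim that consensus from an ensemble of scorers provides stronger feedback than any single scorer.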
Related papers
- Enabling Small Models for Zero-Shot Classification through Model Label Learning [50.68074833512999]
We introduce a novel paradigm, Model Label Learning (MLL), which bridges the gap between models and their functionalities.
Experiments on seven real-world datasets validate the effectiveness and efficiency of MLL.
arXiv Detail & Related papers (2024-08-21T09:08:26Z)
- Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach [25.927323251675386]
We leverage the meta-features associated with each entity as a source of worldly knowledge and employ entity representations from the models.
We propose using the consistency between these representations and the meta-features as a metric for evaluating pre-trained models.
Our method's effectiveness is demonstrated across various domains, including models with relational datasets, large language models and image models.
arXiv Detail & Related papers (2024-01-02T17:08:26Z)
- Machine Learning Model Attribution Challenge [2.6532805035238747]
Fine-tuned machine learning models may derive from other trained models without obvious attribution characteristics.
In this challenge, participants identify the publicly-available base models that underlie a set of anonymous, fine-tuned large language models.
arXiv Detail & Related papers (2023-02-13T22:05:27Z)
- Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models.
This creates a barrier to fusing knowledge across individual models to yield a better single model.
We propose a dataless knowledge fusion method that merges models in their parameter space (a minimal parameter-averaging sketch appears after this list).
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
- Multimodal Knowledge Alignment with Reinforcement Learning [103.68816413817372]
ESPER extends language-only zero-shot models to unseen multimodal tasks, like image and audio captioning.
Our key novelty is to use reinforcement learning to align multimodal inputs to language model generations without direct supervision.
Experiments demonstrate that ESPER outperforms baselines and prior work on a variety of zero-shot tasks.
arXiv Detail & Related papers (2022-05-25T10:12:17Z)
- What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? [50.84738303888189]
We present a large-scale evaluation of modeling choices and their impact on zero-shot generalization.
We train models with over 5 billion parameters for more than 170 billion tokens.
We find that pretrained causal decoder models can be efficiently adapted into non-causal decoder models.
arXiv Detail & Related papers (2022-04-12T14:19:49Z)
- MEGA: Model Stealing via Collaborative Generator-Substitute Networks [4.065949099860426]
Recent data-free model stealing methods have been shown to be effective at extracting the knowledge of the target model without using real query examples.
We propose a data-free model stealing framework, MEGA, which is based on collaborative generator-substitute networks.
Our results show that the accuracy of our trained substitute model and the adversarial attack success rate over it can be up to 33% and 40% higher than state-of-the-art data-free black-box attacks.
arXiv Detail & Related papers (2022-01-31T09:34:28Z)
- What do we expect from Multiple-choice QA Systems? [70.86513724662302]
We consider a top performing model on several Multiple Choice Question Answering (MCQA) datasets.
We evaluate it against a set of expectations one might have from such a model, using a series of zero-information perturbations of the model's inputs.
arXiv Detail & Related papers (2020-11-20T21:27:10Z)
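The "Dataless Knowledge Fusion by Merging Weights of Language Models" entry above describes merging fine-tuned models in parameter space. As a point of reference only, the sketch below shows the simplest such merge, element-wise parameter averaging of two PyTorch models that share an architecture; the cited paper proposes a more sophisticated dataless merging scheme, so this is a hypothetical baseline, not that method.

```python
# Illustrative baseline only: element-wise averaging of two fine-tuned models'
# parameters. The models are assumed to share the same base architecture.

import torch


def average_state_dicts(model_a: torch.nn.Module, model_b: torch.nn.Module) -> dict:
    """Return a state dict whose floating-point tensors are the mean of both models'."""
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    assert sd_a.keys() == sd_b.keys(), "models must share an architecture"
    return {
        # Average floating-point parameters/buffers; copy integer buffers as-is.
        k: (sd_a[k] + sd_b[k]) / 2 if sd_a[k].is_floating_point() else sd_a[k]
        for k in sd_a
    }


# Usage (hypothetical models fine-tuned from the same base on different tasks):
# merged = SomeSharedArchitecture()
# merged.load_state_dict(average_state_dicts(model_task1, model_task2))
```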