CAMERO: Consistency Regularized Ensemble of Perturbed Language Models
with Weight Sharing
- URL: http://arxiv.org/abs/2204.06625v1
- Date: Wed, 13 Apr 2022 19:54:51 GMT
- Title: CAMERO: Consistency Regularized Ensemble of Perturbed Language Models
with Weight Sharing
- Authors: Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao
- Abstract summary: We propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.
Specifically, we share the weights of bottom layers across all models and apply different perturbations to the hidden representations for different models, which can effectively promote the model diversity.
Our experiments using large language models demonstrate that CAMERO significantly improves the generalization performance of the ensemble model.
- Score: 83.63107444454938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Model ensemble is a popular approach to produce a low-variance and
well-generalized model. However, it induces large memory and inference costs,
which are often not affordable for real-world deployment. Existing work has
resorted to sharing weights among models. However, when increasing the
proportion of the shared weights, the resulting models tend to be similar, and
the benefits of using model ensemble diminish. To retain ensemble benefits
while maintaining a low memory cost, we propose a consistency-regularized
ensemble learning approach based on perturbed models, named CAMERO.
Specifically, we share the weights of bottom layers across all models and apply
different perturbations to the hidden representations for different models,
which can effectively promote the model diversity. Meanwhile, we apply a
prediction consistency regularizer across the perturbed models to control the
variance due to the model diversity. Our experiments using large language
models demonstrate that CAMERO significantly improves the generalization
performance of the ensemble model. Specifically, CAMERO outperforms the
standard ensemble of 8 BERT-base models on the GLUE benchmark by 0.7 with a
significantly smaller model size (114.2M vs. 880.6M).
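The scheme the abstract describes can be illustrated with a minimal numpy sketch: one set of shared bottom-layer weights, a different perturbation of the hidden representation for each ensemble member, and a consistency regularizer that penalizes disagreement across the members' predictions. All function names, the additive-Gaussian perturbation, and the mean-squared consistency penalty are illustrative assumptions here, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_bottom(x, W):
    """Shared bottom layers: one weight matrix used by every ensemble member."""
    return np.tanh(x @ W)

def perturb(h, noise_scale, rng):
    """Member-specific perturbation of the hidden representation
    (additive Gaussian noise as a stand-in; the paper considers its own choices)."""
    return h + noise_scale * rng.standard_normal(h.shape)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def camero_losses(x, y_onehot, W_shared, heads, noise_scales, lam, rng):
    """Return (task loss, consistency loss): average cross-entropy over members,
    plus a penalty pulling each member's prediction toward the ensemble mean."""
    h = shared_bottom(x, W_shared)
    probs = [softmax(perturb(h, s, rng) @ Wk)
             for Wk, s in zip(heads, noise_scales)]
    task = -np.mean([np.sum(y_onehot * np.log(p + 1e-9), axis=-1).mean()
                     for p in probs])
    mean_p = np.mean(probs, axis=0)
    consistency = np.mean([np.mean((p - mean_p) ** 2) for p in probs])
    return task, lam * consistency

# Toy forward pass: 4 members sharing one bottom layer, 3-class task.
x = rng.standard_normal((8, 16))            # batch of 8, input dim 16
y = np.eye(3)[rng.integers(0, 3, size=8)]   # one-hot labels
W_shared = 0.1 * rng.standard_normal((16, 32))
heads = [0.1 * rng.standard_normal((32, 3)) for _ in range(4)]
task_loss, cons_loss = camero_losses(
    x, y, W_shared, heads,
    noise_scales=[0.0, 0.1, 0.2, 0.3], lam=1.0, rng=rng)
print(task_loss, cons_loss)
```

Because every member reads the same shared hidden state, memory grows only with the small per-member heads (matching the 114.2M vs. 880.6M comparison in spirit), while the per-member perturbations supply the diversity the regularizer then keeps in check.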
Related papers
- The Law of Multi-Model Collaboration: Scaling Limits of Model Ensembling for Large Language Models [54.51795784459866]
We propose a theoretical framework of performance scaling for multi-model collaboration. We show that multi-model systems follow a power-law scaling with respect to the total parameter count, and that ensembles of heterogeneous model families achieve better performance scaling than those formed within a single model family.
arXiv Detail & Related papers (2025-12-29T09:55:12Z) - Towards Reversible Model Merging For Low-rank Weights [5.100622189286672]
Model merging aims to combine multiple fine-tuned models into a single set of weights that performs well across all source tasks. We show that applying conventional merging methods to low-rank weights leads to severe performance degradation in the merged model. We propose a fundamentally different approach: instead of collapsing all adapters into one set of weights, we construct a compact basis. This reframes merging as generating a reconstruction-capable model space rather than producing a single merged model.
arXiv Detail & Related papers (2025-10-15T23:22:38Z) - A Collaborative Ensemble Framework for CTR Prediction [73.59868761656317]
We propose a novel framework, Collaborative Ensemble Training Network (CETNet), to leverage multiple distinct models.
Unlike naive model scaling, our approach emphasizes diversity and collaboration through collaborative learning.
We validate our framework on three public datasets and a large-scale industrial dataset from Meta.
arXiv Detail & Related papers (2024-11-20T20:38:56Z) - What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model.
Previous studies have primarily focused on merging a few small models.
This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z) - EMR-Merging: Tuning-Free High-Performance Model Merging [55.03509900949149]
We show that Elect, Mask & Rescale-Merging (EMR-Merging) achieves outstanding performance compared to existing merging methods.
EMR-Merging is tuning-free, thus requiring no data availability or any additional training while showing impressive performance.
arXiv Detail & Related papers (2024-05-23T05:25:45Z) - Continuous Language Model Interpolation for Dynamic and Controllable Text Generation [7.535219325248997]
We focus on the challenging case where the model must dynamically adapt to diverse -- and often changing -- user preferences.
We leverage adaptation methods based on linear weight interpolation, casting them as continuous multi-domain interpolators.
We show that varying the weights yields predictable and consistent change in the model outputs.
arXiv Detail & Related papers (2024-04-10T15:55:07Z) - Model soups: averaging weights of multiple fine-tuned models improves
accuracy without increasing inference time [69.7693300927423]
We show that averaging the weights of multiple models fine-tuned with different hyperparameter configurations improves accuracy and robustness.
We show that the model soup approach extends to multiple image classification and natural language processing tasks.
arXiv Detail & Related papers (2022-03-10T17:03:49Z) - When Ensembling Smaller Models is More Efficient than Single Large
Models [52.38997176317532]
We show that ensembles can outperform single models, achieving higher accuracy while requiring fewer total FLOPs to compute.
This suggests that output diversity in ensembling can often be more efficient than training larger models.
arXiv Detail & Related papers (2020-05-01T18:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.