Related papers: Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion


URL: http://arxiv.org/abs/2311.07682v2
Date: Thu, 10 Oct 2024 01:13:20 GMT
Title: Fuse to Forget: Bias Reduction and Selective Memorization through Model Fusion
Authors: Kerem Zaman, Leshem Choshen, Shashank Srivastava
Abstract summary: We study whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data.
Score: 21.853861315322824
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Model fusion research aims to aggregate the knowledge of multiple individual models to enhance performance by combining their weights. In this work, we study the inverse problem: investigating whether model fusion can be used to reduce unwanted knowledge. We investigate the effects of model fusion in three scenarios: the learning of shortcuts, social biases, and memorization of training data in fine-tuned language models. Through experiments covering classification and generation tasks, our analysis highlights that shared knowledge among models is enhanced during model fusion, while unshared knowledge is usually forgotten. Based on this observation, we demonstrate the potential of model fusion as a debiasing tool and showcase its efficacy in addressing privacy concerns associated with language models.
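Under the simplest fusion scheme, uniform weight averaging, the observation that shared knowledge survives while unshared knowledge is usually forgotten has a rough intuition. The toy sketch below assumes all models are fine-tuned from the same base and uses made-up three-parameter "models". It is not the paper's experimental setup, and forgetting in real networks is not simply linear dilution.

```python
# Toy sketch with hypothetical numbers: uniform weight averaging of models
# fine-tuned from a shared base, viewed through their task vectors.

def average(models):
    """Element-wise mean of equally sized parameter vectors."""
    n = len(models)
    return [sum(params) / n for params in zip(*models)]

def task_vector(model, base):
    """Fine-tuning update of a model relative to the shared base."""
    return [m - b for m, b in zip(model, base)]

base = [0.0, 0.0, 0.0]
# Coordinate 0: knowledge learned by both models (shared).
# Coordinate 1: learned only by model A (e.g., a shortcut or a memorized example).
# Coordinate 2: learned only by model B.
model_a = [1.0, 1.0, 0.0]
model_b = [1.0, 0.0, 1.0]

fused = average([model_a, model_b])
fused_update = task_vector(fused, base)
print(fused_update)  # [1.0, 0.5, 0.5]

# With n models of which only one carries an update, that update is scaled by 1/n.
others = [[1.0, 0.0, 0.0] for _ in range(3)]
fused_many = average([model_a] + others)
print(fused_many)  # [1.0, 0.25, 0.0]
```

The shared coordinate keeps its full magnitude, while each model-specific coordinate shrinks as more models are fused. This simplified picture is one way to read why fusion can dampen a bias or a memorized example held by only one model.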

Related papers

Too Big to Think: Capacity, Memorization, and Generalization in Pre-Trained Transformers [0.0]
We investigate the relationship between memorization and generalization in large language models. Small models extrapolate to unseen arithmetic cases but fail to memorize facts, while larger models memorize but fail to extrapolate. Findings suggest that pre-training may intrinsically favor one learning mode over the other.
arXiv Detail & Related papers (2025-06-10T14:49:33Z)
Rethinking Weight-Averaged Model-merging [15.2881959315021]
Model-merging has emerged as a powerful approach in deep learning, capable of enhancing model performance without any training. We investigate this technique from three novel perspectives to provide deeper insights into why and how weight-averaged model-merging works. Our findings shed light on the "black box" of weight-averaged model-merging, offering valuable insights and practical recommendations.
arXiv Detail & Related papers (2024-11-14T08:02:14Z)
What Matters for Model Merging at Scale? [94.26607564817786]
Model merging aims to combine multiple expert models into a more capable single model. Previous studies have primarily focused on merging a few small models. This study systematically evaluates the utility of model merging at scale.
arXiv Detail & Related papers (2024-10-04T17:17:19Z)
SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models [85.67096251281191]
We present an innovative approach to model fusion called zero-shot Sparse MIxture of Low-rank Experts (SMILE) construction. SMILE allows for the upscaling of source models into an MoE model without extra data or further training. We conduct extensive experiments across diverse scenarios, such as image classification and text generation tasks, using full fine-tuning and LoRA fine-tuning.
arXiv Detail & Related papers (2024-08-19T17:32:15Z)
FusionBench: A Comprehensive Benchmark of Deep Model Fusion [78.80920533793595]
Deep model fusion is a technique that unifies the predictions or parameters of several deep neural networks into a single model. FusionBench is the first comprehensive benchmark dedicated to deep model fusion.
arXiv Detail & Related papers (2024-06-05T13:54:28Z)
Fantastic Gains and Where to Find Them: On the Existence and Prospect of General Knowledge Transfer between Any Pretrained Model [74.62272538148245]
We show that for arbitrary pairings of pretrained models, one model extracts significant data context unavailable in the other. We investigate if it is possible to transfer such "complementary" knowledge from one model to another without performance degradation.
arXiv Detail & Related papers (2023-10-26T17:59:46Z)
Deep Model Fusion: A Survey [37.39100741978586]
Deep model fusion/merging is an emerging technique that merges the parameters or predictions of multiple deep learning models into a single one. It faces several challenges, including high computational cost, high-dimensional parameter space, interference between different heterogeneous models, etc.
arXiv Detail & Related papers (2023-09-27T14:40:12Z)
Scaling Vision-Language Models with Sparse Mixture of Experts [128.0882767889029]
We show that mixture-of-experts (MoE) techniques can achieve state-of-the-art performance on a range of benchmarks over dense models of equivalent computational cost. Our research offers valuable insights into stabilizing the training of MoE models, understanding the impact of MoE on model interpretability, and balancing the trade-offs between compute and performance when scaling vision-language models.
arXiv Detail & Related papers (2023-03-13T16:00:31Z)
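A sparse MoE layer keeps compute close to a dense model of equal cost by sending each input to only a few experts. The sketch below shows generic top-k gating under stated assumptions: scalar inputs, toy lambda "experts", and router logits passed in directly (a learned router would normally compute them from the input). It is not the configuration used in the listed paper.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_moe(x, experts, gate_logits, k=2):
    """Route x to the k experts with the largest gate logits.

    Only the selected experts are evaluated; their outputs are combined
    with softmax weights renormalized over the selected logits.
    Returns (output, indices of selected experts).
    """
    # Sort expert indices by logit, largest first (ties keep original order).
    ranked = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)
    top = ranked[:k]
    weights = softmax([gate_logits[i] for i in top])
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top

# Hypothetical experts and router logits for illustration only.
experts = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x, lambda x: 0.0]
gate_logits = [1.0, 1.0, -2.0, 0.5]

out, top = top_k_moe(3.0, experts, gate_logits, k=2)
print(top, out)  # [0, 1] 5.0
```

Experts 2 and 3 are never called for this input, which is where the compute saving comes from. Real MoE layers add load-balancing terms and capacity limits that this sketch leaves out.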
Dataless Knowledge Fusion by Merging Weights of Language Models [51.8162883997512]
Fine-tuning pre-trained language models has become the prevalent paradigm for building downstream NLP models. Fine-tuned models are often readily available while their training data is not, which creates a barrier to fusing knowledge across individual models to yield a better single model. We propose a dataless knowledge fusion method that merges models in their parameter space.
arXiv Detail & Related papers (2022-12-19T20:46:43Z)
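For a linear layer computing X W, merging "in parameter space" without data can be posed as a regression: find W minimizing the sum over models i of ||X_i W − X_i W_i||², where X_i are the layer's inputs on model i's data. The closed-form solution is W = (Σ_i X_iᵀX_i)⁻¹ Σ_i X_iᵀX_i W_i, so only the Gram matrices X_iᵀX_i are needed, not the raw examples. The pure-Python sketch below implements just this basic closed form with hypothetical 2×1 layers. It omits any regularization the paper may apply to the Gram matrices, and the helper names are illustrative.

```python
def matmul(a, b):
    """Product of matrices given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("summed Gram matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def regmean_merge(weights, inputs):
    """weights[i]: d_in x d_out matrix W_i; inputs[i]: n_i x d_in matrix X_i."""
    grams = [matmul(transpose(x), x) for x in inputs]  # G_i = X_i^T X_i
    gram_sum = grams[0]
    rhs = matmul(grams[0], weights[0])
    for g, w in zip(grams[1:], weights[1:]):
        gram_sum = mat_add(gram_sum, g)
        rhs = mat_add(rhs, matmul(g, w))
    return solve(gram_sum, rhs)  # (sum G_i)^-1 (sum G_i W_i)

# Hypothetical layers: model A only ever sees feature 0, model B only feature 1,
# so each model's weight for the feature it never saw is arbitrary.
w_a, x_a = [[3.0], [100.0]], [[1.0, 0.0]]
w_b, x_b = [[-50.0], [4.0]], [[0.0, 1.0]]

merged = regmean_merge([w_a, w_b], [x_a, x_b])
print(merged)  # [[3.0], [4.0]]
```

Plain averaging would give [[-23.5], [52.0]] here, letting each model's unused weights corrupt the other's behavior; the Gram-weighted solution keeps each model's weights where its inputs actually lie.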

This list is automatically generated from the titles and abstracts of the papers on this site.