Related papers: Scalable Model Editing via Customized Expert Networks

URL: http://arxiv.org/abs/2404.02699v2
Date: Thu, 8 Aug 2024 13:10:50 GMT
Title: Scalable Model Editing via Customized Expert Networks
Authors: Zihan Yao, Yu He, Tianyu Qi, Ming Li
Abstract summary: We introduce Scalable Model Editing via Customized Expert Networks (SCEN). In the first stage, we train lightweight expert networks individually for each piece of knowledge that needs to be updated. In the second stage, we train a corresponding indexing neuron for each expert to control the activation state of that expert.
Score: 10.211286961377942
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Addressing the issues of hallucinations and outdated knowledge in large language models is critical for their reliable application. Model Editing presents a promising avenue for mitigating these challenges in a cost-effective manner. However, existing methods often suffer from unsatisfactory generalization and unintended effects on non-edited samples. To overcome these limitations, we introduce a novel approach: Scalable Model Editing via Customized Expert Networks (SCEN), which is a two-stage continuous training paradigm. Specifically, in the first stage, we train lightweight expert networks individually for each piece of knowledge that needs to be updated. Subsequently, we train a corresponding indexing neuron for each expert to control the activation state of that expert. We conducted a series of experiments on the ZsRE and Hallucination benchmarks by tuning the advanced open-source LLM, Llama2, achieving state-of-the-art results compared to current mainstream methods. Our code is available at https://github.com/TAL-auroraX/SCEN.
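
The two-stage idea in the abstract (one lightweight expert per edited fact, plus an indexing neuron that decides whether that expert is active) can be illustrated with a toy sketch. This is not the authors' implementation: the class name GatedExpert, the function route, the sigmoid gate, the 0.5 threshold, the "strongest gate wins" rule, and the hand-set weights are all assumptions made for illustration, and no training is shown. The actual method is in the repository linked above.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


@dataclass
class GatedExpert:
    """Hypothetical pairing of one expert with one indexing neuron."""
    index_weights: Vector         # stage 2 stand-in: indexing neuron weights
    index_bias: float             # stage 2 stand-in: indexing neuron bias
    expert_matrix: List[Vector]   # stage 1 stand-in: expert as a single linear map

    def gate(self, h: Vector) -> float:
        # Activation of the indexing neuron for hidden state h.
        return sigmoid(dot(self.index_weights, h) + self.index_bias)

    def apply(self, h: Vector) -> Vector:
        # Output of the lightweight expert for hidden state h.
        return [dot(row, h) for row in self.expert_matrix]


def route(h: Vector, experts: List[GatedExpert], threshold: float = 0.5) -> Vector:
    """Use the most strongly activated expert if its gate exceeds the
    threshold (an assumed rule); otherwise pass h through unchanged."""
    best = max(experts, key=lambda e: e.gate(h), default=None)
    if best is None or best.gate(h) <= threshold:
        return h
    return best.apply(h)


# Hand-set weights: the indexing neuron fires only when the first
# coordinate of h is large, standing in for "this input matches the edit".
experts = [GatedExpert(index_weights=[4.0, 0.0], index_bias=-2.0,
                       expert_matrix=[[2.0, 0.0], [0.0, 2.0]])]

edited_input = route([1.0, 0.0], experts)     # gate is above 0.5, expert is used
unrelated_input = route([0.0, 1.0], experts)  # gate is below 0.5, h is unchanged
```

In this sketch an input that does not trigger any indexing neuron leaves the hidden state untouched, which mirrors the abstract's stated goal of avoiding unintended effects on non-edited samples.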

Related papers

Model Merging for Knowledge Editing [53.799891745131724]
Large Language Models (LLMs) require continuous updates to maintain accurate and current knowledge as the world evolves. Existing knowledge editing approaches offer various solutions for knowledge updating, but they often struggle with sequential editing scenarios. This paper proposes a two-stage framework combining robust supervised fine-tuning (R-SFT) with model merging for knowledge editing.
arXiv Detail & Related papers (2025-06-14T07:42:39Z)
Epinet for Content Cold Start [14.018820788546535]
Epinets enable efficient approximations of Thompson sampling even when the learning model is a complex neural network. Our experiments demonstrate improvements in both user traffic and engagement efficiency on the Facebook Reels online video platform.
arXiv Detail & Related papers (2024-11-20T19:43:27Z)
Diffusing States and Matching Scores: A New Framework for Imitation Learning [16.941612670582522]
Adversarial Imitation Learning is traditionally framed as a two-player zero-sum game between a learner and an adversarially chosen cost function. In recent years, diffusion models have emerged as a non-adversarial alternative to GANs. We show our approach outperforms GAN-style imitation learning baselines across various continuous control problems.
arXiv Detail & Related papers (2024-10-17T17:59:25Z)
ATLAS: Adapter-Based Multi-Modal Continual Learning with a Two-Stage Learning Strategy [12.150065431702055]
We propose a multi-modal continual learning scheme that consists of experience-based learning and novel knowledge expansion. Our method is proficient for continual learning. It expands the distribution of representation upstream while also minimizing the negative impact of forgetting previous tasks.
arXiv Detail & Related papers (2024-10-14T13:29:42Z)
A Practitioner's Guide to Continual Multimodal Pretraining [83.63894495064855]
Multimodal foundation models serve numerous applications at the intersection of vision and language. To keep models updated, research into continual pretraining mainly explores scenarios with either infrequent, indiscriminate updates on large-scale new data, or frequent, sample-level updates. We introduce FoMo-in-Flux, a continual multimodal pretraining benchmark with realistic compute constraints and practical deployment requirements.
arXiv Detail & Related papers (2024-08-26T17:59:01Z)
MPN: Leveraging Multilingual Patch Neuron for Cross-lingual Model Editing [10.81072864833299]
We propose a simple yet effective method that trains multilingual patch neurons to store cross-lingual knowledge. It can be easily adapted to existing approaches to enhance their cross-lingual editing capabilities.
arXiv Detail & Related papers (2024-01-06T10:40:24Z)
Fine-tuning Language Models for Factuality [96.5203774943198]
Large pre-trained language models (LLMs) are widely used, sometimes even as a replacement for traditional search engines. Yet language models are prone to making convincing but factually inaccurate claims, often referred to as 'hallucinations'. In this work, we fine-tune language models to be more factual, without human labeling.
arXiv Detail & Related papers (2023-11-14T18:59:15Z)
An Emulator for Fine-Tuning Large Language Models using Small Language Models [91.02498576056057]
We introduce emulated fine-tuning (EFT), a principled and practical method for sampling from a distribution that approximates the result of pre-training and fine-tuning at different scales. We show that EFT enables test-time adjustment of competing behavioral traits like helpfulness and harmlessness without additional training. Finally, a special case of emulated fine-tuning, which we call LM up-scaling, avoids resource-intensive fine-tuning of large pre-trained models by ensembling them with small fine-tuned models.
arXiv Detail & Related papers (2023-10-19T17:57:16Z)
Learning without Forgetting for Vision-Language Models [65.49600786387106]
Class-Incremental Learning (CIL) or continual learning is a desired capability in the real world. Recent advances in Vision-Language Models (VLM) have shown promising capabilities in learning generalizable representations. We propose PROjectiOn Fusion (PROOF) that enables VLMs to learn without forgetting.
arXiv Detail & Related papers (2023-05-30T17:59:32Z)
InitialGAN: A Language GAN with Completely Random Initialization [7.642043456676739]
Generative Adversarial Networks (GANs) are shown to have potential to tackle the notorious exposure bias problem. Existing language GANs adopt estimators like REINFORCE or continuous relaxations to model word probabilities. In this work, we present two techniques to tackle these problems: dropout sampling and fully normalized LSTM.
arXiv Detail & Related papers (2022-08-04T08:56:04Z)
Improving Pre-trained Language Model Fine-tuning with Noise Stability Regularization [94.4409074435894]
We propose a novel and effective fine-tuning framework, named Layerwise Noise Stability Regularization (LNSR). Specifically, we propose to inject standard Gaussian noise and regularize hidden representations of the fine-tuned model. We demonstrate the advantages of the proposed method over other state-of-the-art algorithms including L2-SP, Mixout and SMART.
arXiv Detail & Related papers (2022-06-12T04:42:49Z)
Adaptive Memory Networks with Self-supervised Learning for Unsupervised Anomaly Detection [54.76993389109327]
Unsupervised anomaly detection aims to build models to detect unseen anomalies by only training on the normal data. We propose a novel approach called Adaptive Memory Network with Self-supervised Learning (AMSL) to address these challenges. AMSL incorporates a self-supervised learning module to learn general normal patterns and an adaptive memory fusion module to learn rich feature representations.
arXiv Detail & Related papers (2022-01-03T03:40:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.