Disentangling Task Interference within Neurons: Model Merging in Alignment with Neuronal Mechanisms
- URL: http://arxiv.org/abs/2503.05320v1
- Date: Fri, 07 Mar 2025 11:00:24 GMT
- Title: Disentangling Task Interference within Neurons: Model Merging in Alignment with Neuronal Mechanisms
- Authors: Zitao Fang, Guodong DU, Shuyang Yu, Yifei Guo, Yiwei Zhang, Jing Li, Ho-Kin Tang, Sim Kuan Goh
- Abstract summary: We present the first study on the impact of neuronal alignment in model merging. We introduce NeuroMerging, a novel merging framework developed to mitigate task interference within neuronal subspaces.
- Score: 9.230323472193918
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Fine-tuning pre-trained models on targeted datasets enhances task-specific performance but often comes at the expense of generalization. Model merging techniques, which integrate multiple fine-tuned models into a single multi-task model through task arithmetic at various levels: model, layer, or parameter, offer a promising solution. However, task interference remains a fundamental challenge, leading to performance degradation and suboptimal merged models. Existing approaches largely overlook the fundamental role of individual neurons and their connectivity, resulting in a lack of interpretability in both the merging process and the merged models. In this work, we present the first study on the impact of neuronal alignment in model merging. We decompose task-specific representations into two complementary neuronal subspaces that regulate neuron sensitivity and input adaptability. Leveraging this decomposition, we introduce NeuroMerging, a novel merging framework developed to mitigate task interference within neuronal subspaces, enabling training-free model fusion across diverse tasks. Through extensive experiments, we demonstrate that NeuroMerging achieves superior performance compared to existing methods on multi-task benchmarks across both vision and natural language domains. Our findings highlight the importance of aligning neuronal mechanisms in model merging, offering new insights into mitigating task interference and improving knowledge fusion.
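The abstract leaves the exact subspace decomposition unspecified, so the following is only a schematic sketch of neuron-level task arithmetic, assuming each neuron's task vector (fine-tuned minus pre-trained weights) is split into a component along the neuron's pre-trained weight direction and an orthogonal remainder, with the two subspaces merged under separate coefficients. The function and parameter names (`merge_neuronwise`, `alpha_par`, `alpha_orth`) are ours, not the paper's.

```python
import torch

def merge_neuronwise(pretrained, finetuned, alpha_par=0.6, alpha_orth=0.3):
    """Hypothetical neuron-level merge: split each task vector into a component
    along the neuron's pre-trained weight direction and an orthogonal remainder,
    then combine the two subspaces with separate coefficients."""
    task_vectors = [{k: ft[k] - pretrained[k] for k in pretrained} for ft in finetuned]
    merged = {}
    for name, w0 in pretrained.items():
        stacked = torch.stack([tv[name] for tv in task_vectors])   # (tasks, ...)
        if w0.ndim != 2:                    # biases etc.: plain task-vector averaging
            merged[name] = w0 + alpha_orth * stacked.mean(0)
            continue
        u = w0 / (w0.norm(dim=1, keepdim=True) + 1e-8)  # unit direction per neuron row
        coeff = (stacked * u).sum(dim=-1, keepdim=True) # projection coefficients
        par = coeff * u                                 # component along the neuron
        orth = stacked - par                            # input-adaptation remainder
        merged[name] = w0 + alpha_par * par.mean(0) + alpha_orth * orth.mean(0)
    return merged

# usage with two toy fine-tuned checkpoints
base = {"fc.weight": torch.randn(4, 8), "fc.bias": torch.zeros(4)}
ft_a = {k: v + 0.1 * torch.randn_like(v) for k, v in base.items()}
ft_b = {k: v + 0.1 * torch.randn_like(v) for k, v in base.items()}
merged = merge_neuronwise(base, [ft_a, ft_b])
```

Treating the two subspaces with separate coefficients is what makes this more than plain weight averaging: setting `alpha_par = alpha_orth` recovers ordinary task-vector averaging.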
Related papers
- Single-neuron deep generative model uncovers underlying physics of neuronal activity in Ca imaging data [0.0]
We propose a novel framework for single-neuron representation learning using autoregressive variational autoencoders (AVAEs).
Our approach embeds individual neurons' signals into a reduced-dimensional space without the need for spike inference algorithms.
The AVAE excels over traditional linear methods by generating more informative and discriminative latent representations.
arXiv Detail & Related papers (2025-01-24T16:33:52Z)
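As a toy illustration of the AVAE idea above (not the authors' implementation): a GRU encoder compresses a single neuron's 1-D fluorescence trace into a low-dimensional latent code, and a GRU decoder reconstructs the trace autoregressively, one sample at a time, conditioned on that latent. All sizes and the KL weight are arbitrary.

```python
import torch
import torch.nn as nn

class AVAE(nn.Module):
    """Toy autoregressive VAE for a 1-D calcium/fluorescence trace."""
    def __init__(self, latent_dim=8, hidden=64):
        super().__init__()
        self.enc = nn.GRU(1, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.GRU(1 + latent_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                                     # x: (batch, time, 1)
        _, h = self.enc(x)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        # autoregressive decoding with teacher forcing: previous sample + latent
        prev = torch.cat([torch.zeros_like(x[:, :1]), x[:, :-1]], dim=1)
        zrep = z.unsqueeze(1).expand(-1, x.size(1), -1)
        recon = self.out(self.dec(torch.cat([prev, zrep], dim=-1))[0])
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return ((recon - x) ** 2).mean() + 1e-3 * kl          # ELBO-style loss

loss = AVAE()(torch.randn(4, 100, 1))   # 4 traces, 100 time steps each
loss.backward()
```

Here `mu` serves as the per-neuron embedding, and no spike inference is run at any point, matching the summary's claim.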
- Artificial Kuramoto Oscillatory Neurons [65.16453738828672]
It has long been known in both neuroscience and AI that "binding" between neurons leads to a form of competitive learning. We introduce Artificial Kuramoto Oscillatory Neurons, which can be combined with arbitrary connectivity designs such as fully connected, convolutional, or attentive mechanisms. We show that this idea provides performance improvements across a wide spectrum of tasks such as unsupervised object discovery, adversarial robustness, uncertainty quantification, and reasoning.
arXiv Detail & Related papers (2024-10-17T17:47:54Z)
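AKOrN replaces threshold units with coupled oscillators; a hypothetical scalar Kuramoto step (the paper's neurons are more general) shows the binding-through-synchronization dynamics:

```python
import numpy as np

def kuramoto_step(theta, omega, W, dt=0.1):
    """One Euler step of d(theta_i)/dt = omega_i + sum_j W_ij * sin(theta_j - theta_i).
    Strongly coupled oscillators pull their phases together (binding)."""
    phase_diff = theta[None, :] - theta[:, None]   # entry (i, j) = theta_j - theta_i
    return theta + dt * (omega + (W * np.sin(phase_diff)).sum(axis=1))

rng = np.random.default_rng(0)
n = 16
theta = rng.uniform(0, 2 * np.pi, n)    # oscillator phases
omega = rng.normal(0.0, 0.1, n)         # natural frequencies
W = np.full((n, n), 0.5 / n)            # fully connected coupling, one possible design
for _ in range(200):
    theta = kuramoto_step(theta, omega, W)
print(f"synchrony r = {abs(np.exp(1j * theta).mean()):.2f}")   # r -> 1 when phase-locked
```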
- BLEND: Behavior-guided Neural Population Dynamics Modeling via Privileged Knowledge Distillation [6.3559178227943764]
We propose BLEND, a behavior-guided neural population dynamics modeling framework via privileged knowledge distillation. By considering behavior as privileged information, we train a teacher model that takes both behavior observations (privileged features) and neural activities (regular features) as inputs. A student model is then distilled using only neural activity.
arXiv Detail & Related papers (2024-10-02T12:45:59Z)
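A minimal sketch of the privileged-distillation recipe described above, assuming for brevity a toy classification target (BLEND itself models population dynamics): the teacher consumes neural activity plus behavior, the student consumes neural activity alone and matches the teacher's softened outputs. Shapes, the temperature `T`, and the loss weighting are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128 + 4, 64), nn.ReLU(), nn.Linear(64, 10))  # neural + behavior
student = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))      # neural only

def distill_step(neural, behavior, target, opt, T=2.0):
    """Student mimics the teacher's softened predictions without seeing behavior."""
    with torch.no_grad():
        t_logits = teacher(torch.cat([neural, behavior], dim=-1))
    s_logits = student(neural)
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1), reduction="batchmean") * T * T
    loss = soft + F.cross_entropy(s_logits, target)   # soft + hard targets
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
neural, behavior = torch.randn(32, 128), torch.randn(32, 4)
distill_step(neural, behavior, torch.randint(0, 10, (32,)), opt)
```

At test time only `student` is used, so behavior remains a training-time privilege.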
- Modularity in Transformers: Investigating Neuron Separability & Specialization [0.0]
Transformer models are increasingly prevalent in various applications, yet our understanding of their internal workings remains limited.
This paper investigates the modularity and task specialization of neurons within transformer architectures, focusing on both vision (ViT) and language (Mistral 7B) models.
Using a combination of selective pruning and MoEfication clustering techniques, we analyze the overlap and specialization of neurons across different tasks and data subsets.
arXiv Detail & Related papers (2024-08-30T14:35:01Z)
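Selective pruning and MoEfication clustering are the paper's actual tools; a far simpler proxy for the same question, how much task-important neurons overlap, is to rank a layer's neurons by activation on each task and compare the top sets:

```python
import torch

def top_neurons(acts, k=50):
    """Indices of the k neurons with the largest mean |activation| on a task."""
    return set(acts.abs().mean(dim=0).topk(k).indices.tolist())

def jaccard(a, b):
    return len(a & b) / len(a | b)

# hypothetical activations of one MLP layer (samples x neurons) on two tasks
acts_task1 = torch.randn(1000, 3072)
acts_task2 = torch.randn(1000, 3072)
overlap = jaccard(top_neurons(acts_task1), top_neurons(acts_task2))
print(f"top-neuron overlap between tasks: {overlap:.2f}")
# low overlap suggests task-specialized neurons; high overlap, shared circuitry
```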
- Neuroformer: Multimodal and Multitask Generative Pretraining for Brain Data [3.46029409929709]
State-of-the-art systems neuroscience experiments yield large-scale multimodal data, and these data sets require new tools for analysis.
Inspired by the success of large pretrained models in vision and language domains, we reframe the analysis of large-scale, cellular-resolution neuronal spiking data into an autoregressive generation problem.
We first trained Neuroformer on simulated datasets, and found that it both accurately predicted intrinsically simulated neuronal circuit activity, and also inferred the underlying neural circuit connectivity, including direction.
arXiv Detail & Related papers (2023-10-31T20:17:32Z)
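Stripped of its multimodal machinery, the core reframing is next-token prediction over discretized neural activity. A hypothetical minimal version, with binned spike counts as the vocabulary:

```python
import torch
import torch.nn as nn

class SpikeLM(nn.Module):
    """Minimal causal Transformer over tokenized spike counts."""
    def __init__(self, vocab=8, d=64, heads=4, layers=2, max_len=512):
        super().__init__()
        self.tok = nn.Embedding(vocab, d)
        self.pos = nn.Embedding(max_len, d)
        layer = nn.TransformerEncoderLayer(d, heads, 4 * d, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(d, vocab)

    def forward(self, tokens):                              # tokens: (batch, seq)
        T = tokens.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)  # causal
        h = self.tok(tokens) + self.pos(torch.arange(T, device=tokens.device))
        return self.head(self.enc(h, mask=mask))

spikes = torch.randint(0, 8, (2, 128))       # binned spike counts as tokens
logits = SpikeLM()(spikes[:, :-1])           # predict each next bin
loss = nn.functional.cross_entropy(logits.reshape(-1, 8), spikes[:, 1:].reshape(-1))
```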
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale, consistent plan for the whole activity and (2) the small-scale child interactions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
- A Generic Shared Attention Mechanism for Various Backbone Neural Networks [53.36677373145012]
Self-attention modules (SAMs) produce strongly correlated attention maps across different layers.
Dense-and-Implicit Attention (DIA) shares SAMs across layers and employs a long short-term memory module.
Our simple yet effective DIA can consistently enhance various network backbones.
arXiv Detail & Related papers (2022-10-27T13:24:08Z)
- Understanding Neural Coding on Latent Manifolds by Sharing Features and Dividing Ensembles [3.625425081454343]
Systems neuroscience relies on two complementary views of neural data, characterized by single neuron tuning curves and analysis of population activity.
These two perspectives combine elegantly in neural latent variable models that constrain the relationship between latent variables and neural activity.
We propose feature sharing across neural tuning curves, which significantly improves performance and leads to better-behaved optimization.
arXiv Detail & Related papers (2022-10-06T18:37:49Z)
- EINNs: Epidemiologically-Informed Neural Networks [75.34199997857341]
We introduce a new class of physics-informed neural networks, EINNs, crafted for epidemic forecasting.
We investigate how to leverage both the theoretical flexibility provided by mechanistic models and the data-driven expressibility afforded by AI models.
arXiv Detail & Related papers (2022-02-21T18:59:03Z)
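EINNs couple a mechanistic epidemic model with a neural network; the generic physics-informed pattern can be sketched with an SIR model, fitting observed infections while penalizing violations of the ODEs at collocation points. The rates `beta` and `gamma` and the network size are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 3), nn.Softplus())

def einn_loss(t_obs, i_obs, t_col, beta=0.3, gamma=0.1):
    """Data term on observed infections + residuals of
    dS/dt = -beta*S*I and dI/dt = beta*S*I - gamma*I."""
    data_loss = ((net(t_obs)[:, 1:2] - i_obs) ** 2).mean()
    t = t_col.requires_grad_(True)
    S, I, _ = net(t).unbind(dim=1)               # S, I, R trajectories
    dS = torch.autograd.grad(S.sum(), t, create_graph=True)[0].squeeze(1)
    dI = torch.autograd.grad(I.sum(), t, create_graph=True)[0].squeeze(1)
    physics = ((dS + beta * S * I) ** 2 + (dI - beta * S * I + gamma * I) ** 2).mean()
    return data_loss + physics

t_obs = torch.linspace(0, 1, 20).unsqueeze(1)
i_obs = 0.1 * torch.rand(20, 1)                  # placeholder case counts
t_col = torch.linspace(0, 1, 100).unsqueeze(1)   # collocation points
einn_loss(t_obs, i_obs, t_col).backward()
```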
- Overcoming the Domain Gap in Contrastive Learning of Neural Action Representations [60.47807856873544]
A fundamental goal in neuroscience is to understand the relationship between neural activity and behavior.
We generated a new multimodal dataset consisting of the spontaneous behaviors generated by fruit flies.
This dataset and our new set of augmentations promise to accelerate the application of self-supervised learning methods in neuroscience.
arXiv Detail & Related papers (2021-11-29T15:27:51Z)
- Learning Neural Causal Models with Active Interventions [83.44636110899742]
We introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process.
Our method significantly reduces the required number of interactions compared with random intervention targeting.
We demonstrate superior performance on multiple benchmarks from simulated to real-world data.
arXiv Detail & Related papers (2021-09-06T13:10:37Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these networks using gradient descent.
For the first time, we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
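A hedged sketch of the min-max formulation on a toy instrumental-variable problem: the structural function `f` and an adversarial witness `g` are both small networks, trained by alternating gradient steps on a moment-condition game. The paper's operator equation and convergence analysis are more general; the objective below is one standard instance.

```python
import torch
import torch.nn as nn

# min_f max_g  E[(y - f(x)) * g(z)] - 0.5 * E[g(z)^2]
f = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))   # structural function
g = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))   # adversarial witness
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)
opt_g = torch.optim.Adam(g.parameters(), lr=1e-3)

for step in range(2000):
    z = torch.randn(256, 1)                    # instrument
    x = z + 0.3 * torch.randn(256, 1)          # endogenous regressor
    y = 2.0 * x + 0.3 * torch.randn(256, 1)    # outcome (true slope = 2)
    # ascent step on the witness g (f held fixed)
    game = ((y - f(x)).detach() * g(z)).mean() - 0.5 * (g(z) ** 2).mean()
    opt_g.zero_grad(); (-game).backward(); opt_g.step()
    # descent step on the structural function f (g held fixed)
    loss_f = ((y - f(x)) * g(z).detach()).mean()
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
```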
- Rethinking Generalization of Neural Models: A Named Entity Recognition Case Study [81.11161697133095]
We take the NER task as a testbed to analyze the generalization behavior of existing models from different perspectives.
Experiments with in-depth analyses diagnose the bottleneck of existing neural NER models.
As a by-product of this paper, we have open-sourced a project that involves a comprehensive summary of recent NER papers.
arXiv Detail & Related papers (2020-01-12T04:33:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.