SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model's Parameter-Efficient Fine-Tuning
- URL: http://arxiv.org/abs/2409.06706v3
- Date: Wed, 26 Feb 2025 07:07:05 GMT
- Title: SAN: Hypothesizing Long-Term Synaptic Development and Neural Engram Mechanism in Scalable Model's Parameter-Efficient Fine-Tuning
- Authors: Gaole Dai, Chun-Kai Fan, Yiming Tang, Zhi Zhang, Yuan Zhang, Yulu Gan, Qizhe Zhang, Cheng-Ching Tseng, Shanghang Zhang, Tiejun Huang,
- Abstract summary: We bridged the performance gap with Full Fine-Tuning (FFT) through sophisticated analysis of pre-trained parameter spaces.<n>We propose a method, Synapse and Neuron (SAN), which decomposes scaling components from anterior to posterior weight matrices.<n>Our approach is theoretically grounded in Long-Term Potentiation/Depression ( Neural/D) phenomena, which govern synapse development through neurotransmitter release.
- Score: 39.04674956382538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Advances in Parameter-Efficient Fine-Tuning (PEFT) bridged the performance gap with Full Fine-Tuning (FFT) through sophisticated analysis of pre-trained parameter spaces. Starting from drawing insights from Neural Engrams (NE) in Biological Neural Networks (BNNs), we establish a connection between the low-rank property observed during PEFT's parameter space shifting and neurobiological mechanisms. This observation leads to our proposed method, Synapse and Neuron (SAN), which decomposes and propagates scaling components from anterior feature adjusting vectors towards posterior weight matrices. Our approach is theoretically grounded in Long-Term Potentiation/Depression (LTP/D) phenomena, which govern synapse development through neurotransmitter release modulation. Extensive experiments demonstrate its effectiveness: on \textbf{vision tasks} across VTAB, FGVC, and GIC (25 datasets) using ViT, SwinT and ConvNeXt, SAN outperforms FFT up to 8.7% and LoRA by 3.2%; on language tasks using Commonsense Reasoning (8 datasets) with LLaMA models (all generations), surpassing ChatGPT up to 8.5% and LoRA by 4.7%; on visual-language tasks using Mixed Visual Instruction (7 datasets) with LLaVA models, it exceeds FFT up to 2.4% and LoRA by 1.9%. Our code and W&B log will be released.
Related papers
- Function-Space Decoupled Diffusion for Forward and Inverse Modeling in Carbon Capture and Storage [65.51149575007149]
We present Fun-DDPS, a generative framework that combines function-space diffusion models with differentiable neural operator surrogates for both forward and inverse modeling.<n>Fun-DDPS produces physically consistent realizations free from the high-frequency artifacts observed in joint-state baselines.
arXiv Detail & Related papers (2026-02-12T18:58:12Z) - Dynamic Meta-Ensemble Framework for Efficient and Accurate Deep Learning in Plant Leaf Disease Detection on Resource-Constrained Edge Devices [0.0]
We introduce a novel Dynamic Meta-Enemble Framework (DMEF) for high-accuracy plant disease diagnosis under resource constraints.<n>DMEF employs an adaptive weighting mechanism that dynamically combines the predictions of three lightweight convolutional neural networks.<n>Experiments on benchmark datasets for potato and maize diseases demonstrate state-of-the-art classification accuracies of 99.53% and 96.61%, respectively.
arXiv Detail & Related papers (2026-01-24T03:57:49Z) - Transformers vs. Recurrent Models for Estimating Forest Gross Primary Production [32.81344785160551]
Remote sensing offers an alternative to single-sensor spectral indices and statistical models.<n>Recent advances in deep learning (DL) and data fusion offer new opportunities to better represent the temporal dynamics of vegetation processes.<n>Here, we explore the performance of two representative models for predicting Gross Primary Production.
arXiv Detail & Related papers (2025-11-14T21:18:01Z) - Langevin Flows for Modeling Neural Latent Dynamics [81.81271685018284]
We introduce LangevinFlow, a sequential Variational Auto-Encoder where the time evolution of latent variables is governed by the underdamped Langevin equation.<n>Our approach incorporates physical priors -- such as inertia, damping, a learned potential function, and forces -- to represent both autonomous and non-autonomous processes in neural systems.<n>Our method outperforms state-of-the-art baselines on synthetic neural populations generated by a Lorenz attractor.
arXiv Detail & Related papers (2025-07-15T17:57:48Z) - AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy [48.30596996677882]
We investigate the synergy between supervised fine-tuning (SFT) and reinforcement learning (RL) in developing strong reasoning models.<n> scaling strategies yield notable improvements in reasoning performance.<n>Our AceReason-Nemotron-1.1 7B model significantly outperforms AceReason-Nemotron-1.0 and new state-of-the-art performance among Qwen2.5-7B-based reasoning models.
arXiv Detail & Related papers (2025-06-16T09:27:48Z) - AbsoluteNet: A Deep Learning Neural Network to Classify Cerebral Hemodynamic Responses of Auditory Processing [7.243563999211656]
This work introduces AbsoluteNet, a novel deep learning architecture designed to classify auditory event-related responses using fNIRS.<n>The network is built upon principles of convolution and customized activation functions.<n>Results showed that AbsoluteNet outperforms existing models, reaching 87.0% accuracy, 84.8% sensitivity, and 89.2% specificity in binary classification.
arXiv Detail & Related papers (2025-05-27T19:21:17Z) - TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting [27.91825785119938]
Spiking Neural Networks (SNNs) offer a promising, biologically inspired approach for processing data for time series forecasting.
We introduce the Temporal Leaky Segment Integrate-and-Fire model, featuring a dual-compartment architecture.
Experimental results show that TS-LIF outperforms traditional SNNs in time series forecasting.
arXiv Detail & Related papers (2025-03-07T03:06:21Z) - Learning Parameter Sharing with Tensor Decompositions and Sparsity [5.73573685846194]
We introduce Fine-grained Singular Sharing (FiPS) to compress large-scale Vision Transformers (ViTs) and Large Language Models (LLMs)
FiPS employs a shared base and sparse factors to represent neurons across multi-layer perceptron (MLP) modules.
Experimental results show that FiPS reduces the parameter budget by 50-75% for DeiT-B and Swin-L and by 40-50% for various Gemma-2 and Llama-3 models.
arXiv Detail & Related papers (2024-11-14T21:29:58Z) - Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning [0.0]
We show that a new ANN architecture incorporates the structured connectivity and restricted sampling properties of biological dendrites.
We find that dendritic ANNs are more robust to overfitting and outperform traditional ANNs on several image classification tasks.
arXiv Detail & Related papers (2024-04-04T11:22:58Z) - Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model [43.107778640669544]
Large Language Models (LLMs) are composed of neurons that exhibit various behaviors and roles.
Recent studies have revealed that not all neurons are active across different datasets.
We introduce Neuron-Level Fine-Tuning (NeFT), a novel approach that refines the granularity of parameter training down to the individual neuron.
arXiv Detail & Related papers (2024-03-18T09:55:01Z) - Fine-Tuning Surrogate Gradient Learning for Optimal Hardware Performance
in Spiking Neural Networks [1.52292571922932]
Spiking Neural Networks (SNNs) can provide tremendous energy efficiency benefits when carefully exploited in hardware.
This work reveals novel insights into the impacts of training on hardware performance.
arXiv Detail & Related papers (2024-02-09T06:38:12Z) - Co-learning synaptic delays, weights and adaptation in spiking neural
networks [0.0]
Spiking neural networks (SNN) distinguish themselves from artificial neural networks (ANN) because of their inherent temporal processing and spike-based computations.
We show that data processing with spiking neurons can be enhanced by co-learning the connection weights with two other biologically inspired neuronal features.
arXiv Detail & Related papers (2023-09-12T09:13:26Z) - The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
arXiv Detail & Related papers (2023-06-14T13:34:13Z) - MSAT: Biologically Inspired Multi-Stage Adaptive Threshold for
Conversion of Spiking Neural Networks [11.392893261073594]
Spiking Neural Networks (SNNs) can do inference with low power consumption due to their spike sparsity.
ANN-SNN conversion is an efficient way to achieve deep SNNs by converting well-trained Artificial Neural Networks (ANNs)
Existing methods commonly use constant threshold for conversion, which prevents neurons from rapidly delivering spikes to deeper layers.
arXiv Detail & Related papers (2023-03-23T07:18:08Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - Knowledge Enhanced Neural Networks for relational domains [83.9217787335878]
We focus on a specific method, KENN, a Neural-Symbolic architecture that injects prior logical knowledge into a neural network.
In this paper, we propose an extension of KENN for relational data.
arXiv Detail & Related papers (2022-05-31T13:00:34Z) - Training High-Performance Low-Latency Spiking Neural Networks by
Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on
the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - BackEISNN: A Deep Spiking Neural Network with Adaptive Self-Feedback and
Balanced Excitatory-Inhibitory Neurons [8.956708722109415]
Spiking neural networks (SNNs) transmit information through discrete spikes, which performs well in processing spatial-temporal information.
We propose a deep spiking neural network with adaptive self-feedback and balanced excitatory and inhibitory neurons (BackEISNN)
For the MNIST, FashionMNIST, and N-MNIST datasets, our model has achieved state-of-the-art performance.
arXiv Detail & Related papers (2021-05-27T08:38:31Z) - Skip-Connected Self-Recurrent Spiking Neural Networks with Joint
Intrinsic Parameter and Synaptic Weight Training [14.992756670960008]
We propose a new type of RSNN called Skip-Connected Self-Recurrent SNNs (ScSr-SNNs)
ScSr-SNNs can boost performance by up to 2.55% compared with other types of RSNNs trained by state-of-the-art BP methods.
arXiv Detail & Related papers (2020-10-23T22:27:13Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner
Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model based on RNN-Transducer, together with improved beam search, reaches quality by only 3.8% WER abs. worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z) - Flexible Transmitter Network [84.90891046882213]
Current neural networks are mostly built upon the MP model, which usually formulates the neuron as executing an activation function on the real-valued weighted aggregation of signals received from other neurons.
We propose the Flexible Transmitter (FT) model, a novel bio-plausible neuron model with flexible synaptic plasticity.
We present the Flexible Transmitter Network (FTNet), which is built on the most common fully-connected feed-forward architecture.
arXiv Detail & Related papers (2020-04-08T06:55:12Z) - Rectified Linear Postsynaptic Potential Function for Backpropagation in
Deep Spiking Neural Networks [55.0627904986664]
Spiking Neural Networks (SNNs) usetemporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation.
This paper investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective to design of future DeepSNNs and neuromorphic hardware systems.
arXiv Detail & Related papers (2020-03-26T11:13:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.