Discovering Long-Term Effects on Parameter Efficient Fine-tuning
- URL: http://arxiv.org/abs/2409.06706v1
- Date: Sat, 24 Aug 2024 03:27:29 GMT
- Title: Discovering Long-Term Effects on Parameter Efficient Fine-tuning
- Authors: Gaole Dai, Yiming Tang, Chunkai Fan, Qizhe Zhang, Zhi Zhang, Yulu Gan, Chengqing Zeng, Shanghang Zhang, Tiejun Huang,
- Abstract summary: Pre-trained Artificial Neural Networks (Annns) exhibit robust pattern recognition capabilities.
Annns and BNNs share extensive similarities with the human brain, specifically Biological Neural Networks (BNNs)
Annns can acquire new knowledge through fine-tuning.
- Score: 36.83255498301937
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Pre-trained Artificial Neural Networks (ANNs) exhibit robust pattern recognition capabilities and share extensive similarities with the human brain, specifically Biological Neural Networks (BNNs). We are particularly intrigued by these models' ability to acquire new knowledge through fine-tuning. In this regard, Parameter-efficient Fine-tuning (PEFT) has gained widespread adoption as a substitute for full fine-tuning due to its cost reduction in training and mitigation of over-fitting risks by limiting the number of trainable parameters during adaptation. Since both ANNs and BNNs propagate information layer-by-layer, a common analogy can be drawn: weights in ANNs represent synapses in BNNs, while features (also known as latent variables or logits) in ANNs represent neurotransmitters released by neurons in BNNs. Mainstream PEFT methods aim to adjust feature or parameter values using only a limited number of trainable parameters (usually less than 1% of the total parameters), yet achieve surprisingly good results. Building upon this clue, we delve deeper into exploring the connections between feature adjustment and parameter adjustment, resulting in our proposed method Synapses & Neurons (SAN) that learns scaling matrices for features and propagates their effects towards posterior weight matrices. Our approach draws strong inspiration from well-known neuroscience phenomena - Long-term Potentiation (LTP) and Long-term Depression (LTD), which also reveal the relationship between synapse development and neurotransmitter release levels. We conducted extensive comparisons of PEFT on 26 datasets using attention-based networks as well as convolution-based networks, leading to significant improvements compared to other tuning methods (+8.5% over fully-finetune, +7% over Visual Prompt Tuning, and +3.2% over LoRA). The codes would be released.
Related papers
- TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting [27.91825785119938]
Spiking Neural Networks (SNNs) offer a promising, biologically inspired approach for processing data for time series forecasting.
We introduce the Temporal Leaky Segment Integrate-and-Fire model, featuring a dual-compartment architecture.
Experimental results show that TS-LIF outperforms traditional SNNs in time series forecasting.
arXiv Detail & Related papers (2025-03-07T03:06:21Z) - Learning Parameter Sharing with Tensor Decompositions and Sparsity [5.73573685846194]
We introduce Fine-grained Singular Sharing (FiPS) to compress large-scale Vision Transformers (ViTs) and Large Language Models (LLMs)
FiPS employs a shared base and sparse factors to represent neurons across multi-layer perceptron (MLP) modules.
Experimental results show that FiPS reduces the parameter budget by 50-75% for DeiT-B and Swin-L and by 40-50% for various Gemma-2 and Llama-3 models.
arXiv Detail & Related papers (2024-11-14T21:29:58Z) - Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning [0.0]
We show that a new ANN architecture incorporates the structured connectivity and restricted sampling properties of biological dendrites.
We find that dendritic ANNs are more robust to overfitting and outperform traditional ANNs on several image classification tasks.
arXiv Detail & Related papers (2024-04-04T11:22:58Z) - Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model [43.107778640669544]
Large Language Models (LLMs) are composed of neurons that exhibit various behaviors and roles.
Recent studies have revealed that not all neurons are active across different datasets.
We introduce Neuron-Level Fine-Tuning (NeFT), a novel approach that refines the granularity of parameter training down to the individual neuron.
arXiv Detail & Related papers (2024-03-18T09:55:01Z) - Fine-Tuning Surrogate Gradient Learning for Optimal Hardware Performance
in Spiking Neural Networks [1.52292571922932]
Spiking Neural Networks (SNNs) can provide tremendous energy efficiency benefits when carefully exploited in hardware.
This work reveals novel insights into the impacts of training on hardware performance.
arXiv Detail & Related papers (2024-02-09T06:38:12Z) - Co-learning synaptic delays, weights and adaptation in spiking neural
networks [0.0]
Spiking neural networks (SNN) distinguish themselves from artificial neural networks (ANN) because of their inherent temporal processing and spike-based computations.
We show that data processing with spiking neurons can be enhanced by co-learning the connection weights with two other biologically inspired neuronal features.
arXiv Detail & Related papers (2023-09-12T09:13:26Z) - The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
arXiv Detail & Related papers (2023-06-14T13:34:13Z) - MSAT: Biologically Inspired Multi-Stage Adaptive Threshold for
Conversion of Spiking Neural Networks [11.392893261073594]
Spiking Neural Networks (SNNs) can do inference with low power consumption due to their spike sparsity.
ANN-SNN conversion is an efficient way to achieve deep SNNs by converting well-trained Artificial Neural Networks (ANNs)
Existing methods commonly use constant threshold for conversion, which prevents neurons from rapidly delivering spikes to deeper layers.
arXiv Detail & Related papers (2023-03-23T07:18:08Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - Knowledge Enhanced Neural Networks for relational domains [83.9217787335878]
We focus on a specific method, KENN, a Neural-Symbolic architecture that injects prior logical knowledge into a neural network.
In this paper, we propose an extension of KENN for relational data.
arXiv Detail & Related papers (2022-05-31T13:00:34Z) - Training High-Performance Low-Latency Spiking Neural Networks by
Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z) - MoEfication: Conditional Computation of Transformer Models for Efficient
Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on
the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - BackEISNN: A Deep Spiking Neural Network with Adaptive Self-Feedback and
Balanced Excitatory-Inhibitory Neurons [8.956708722109415]
Spiking neural networks (SNNs) transmit information through discrete spikes, which performs well in processing spatial-temporal information.
We propose a deep spiking neural network with adaptive self-feedback and balanced excitatory and inhibitory neurons (BackEISNN)
For the MNIST, FashionMNIST, and N-MNIST datasets, our model has achieved state-of-the-art performance.
arXiv Detail & Related papers (2021-05-27T08:38:31Z) - Skip-Connected Self-Recurrent Spiking Neural Networks with Joint
Intrinsic Parameter and Synaptic Weight Training [14.992756670960008]
We propose a new type of RSNN called Skip-Connected Self-Recurrent SNNs (ScSr-SNNs)
ScSr-SNNs can boost performance by up to 2.55% compared with other types of RSNNs trained by state-of-the-art BP methods.
arXiv Detail & Related papers (2020-10-23T22:27:13Z) - Progressive Tandem Learning for Pattern Recognition with Deep Spiking
Neural Networks [80.15411508088522]
Spiking neural networks (SNNs) have shown advantages over traditional artificial neural networks (ANNs) for low latency and high computational efficiency.
We propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition.
arXiv Detail & Related papers (2020-07-02T15:38:44Z) - Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner
Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model based on RNN-Transducer, together with improved beam search, reaches quality by only 3.8% WER abs. worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z) - Flexible Transmitter Network [84.90891046882213]
Current neural networks are mostly built upon the MP model, which usually formulates the neuron as executing an activation function on the real-valued weighted aggregation of signals received from other neurons.
We propose the Flexible Transmitter (FT) model, a novel bio-plausible neuron model with flexible synaptic plasticity.
We present the Flexible Transmitter Network (FTNet), which is built on the most common fully-connected feed-forward architecture.
arXiv Detail & Related papers (2020-04-08T06:55:12Z) - Rectified Linear Postsynaptic Potential Function for Backpropagation in
Deep Spiking Neural Networks [55.0627904986664]
Spiking Neural Networks (SNNs) usetemporal spike patterns to represent and transmit information, which is not only biologically realistic but also suitable for ultra-low-power event-driven neuromorphic implementation.
This paper investigates the contribution of spike timing dynamics to information encoding, synaptic plasticity and decision making, providing a new perspective to design of future DeepSNNs and neuromorphic hardware systems.
arXiv Detail & Related papers (2020-03-26T11:13:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.