Finding Skill Neurons in Pre-trained Transformer-based Language Models
- URL: http://arxiv.org/abs/2211.07349v1
- Date: Mon, 14 Nov 2022 13:43:46 GMT
- Title: Finding Skill Neurons in Pre-trained Transformer-based Language Models
- Authors: Xiaozhi Wang, Kaiyue Wen, Zhengyan Zhang, Lei Hou, Zhiyuan Liu, Juanzi Li
- Abstract summary: Transformer-based pre-trained language models have demonstrated superior performance on various natural language processing tasks.
We find that after prompt tuning for specific tasks, the activations of some neurons within pre-trained Transformers are highly predictive of the task labels.
We also explore the applications of skill neurons, including accelerating Transformers with network pruning and building better transferability indicators.
- Score: 46.484656229427834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer-based pre-trained language models have demonstrated superior
performance on various natural language processing tasks. However, it remains
unclear how the skills required to handle these tasks distribute among model
parameters. In this paper, we find that after prompt tuning for specific tasks,
the activations of some neurons within pre-trained Transformers are highly
predictive of the task labels. We dub these neurons skill neurons and confirm
they encode task-specific skills by finding that: (1) Skill neurons are crucial
for handling tasks. The performance of pre-trained Transformers on a task drops
significantly when the corresponding skill neurons are perturbed. (2) Skill
neurons are task-specific. Similar tasks tend to have similar distributions of
skill neurons. Furthermore, we demonstrate that skill neurons are most likely
generated during pre-training rather than fine-tuning, by showing that the skill
neurons found with prompt tuning are also crucial for other fine-tuning methods
that freeze neuron weights, such as adapter-based tuning and BitFit. We also
explore the applications of skill neurons, including accelerating Transformers
with network pruning and building better transferability indicators. These
findings may promote further research on understanding Transformers. The source
code can be obtained from https://github.com/THU-KEG/Skill-Neuron.
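As a rough illustration of the predictivity test described in the abstract, the sketch below scores each neuron by how well thresholding its activation predicts binary task labels. The function name neuron_predictivity, the mean-training-activation threshold, and the max(acc, 1 - acc) symmetrization are illustrative assumptions rather than the authors' exact recipe; the released code at the repository above is authoritative.

```python
# Minimal sketch of a skill-neuron predictivity score: a neuron is a
# candidate skill neuron if comparing its activation against a baseline
# threshold predicts the task label well. Threshold choice and the
# max(acc, 1 - acc) symmetrization are assumptions, not the paper's
# verbatim method; see https://github.com/THU-KEG/Skill-Neuron.
import numpy as np

def neuron_predictivity(train_acts, dev_acts, dev_labels):
    """Score each neuron by how well its thresholded activation predicts labels.

    train_acts: (n_train, n_neurons) activations collected after prompt tuning
    dev_acts:   (n_dev, n_neurons) held-out activations
    dev_labels: (n_dev,) binary labels in {0, 1}
    Returns:    (n_neurons,) predictivity scores in [0.5, 1.0]
    """
    thresholds = train_acts.mean(axis=0)               # per-neuron baseline
    preds = (dev_acts > thresholds).astype(int)        # (n_dev, n_neurons)
    acc = (preds == dev_labels[:, None]).mean(axis=0)  # per-neuron accuracy
    return np.maximum(acc, 1.0 - acc)                  # either polarity counts

# Toy usage: 3 synthetic neurons, of which neuron 0 tracks the label.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
acts = rng.normal(size=(200, 3))
acts[:, 0] += 2.0 * labels                             # skill-like neuron
scores = neuron_predictivity(acts[:100], acts[100:], labels[100:])
print(scores)  # neuron 0 scores near 1.0; the others stay near 0.5
```

The paper then validates the top-scoring neurons by perturbation (task performance drops when they are disturbed) and exploits the ranking for the applications mentioned above, such as network pruning.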
Related papers
- Verified Neural Compressed Sensing [58.98637799432153]
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task.
We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements.
We show that the complexity of the network can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
arXiv Detail & Related papers (2024-05-07T12:20:12Z) - No One-Size-Fits-All Neurons: Task-based Neurons for Artificial Neural Networks [25.30801109401654]
Since the human brain relies on task-based neurons, can artificial network design move from task-based architecture design to task-based neuron design?
We propose a two-step framework for prototyping task-based neurons.
Experiments show that the proposed task-based neuron design is not only feasible but also delivers competitive performance over other state-of-the-art models.
arXiv Detail & Related papers (2024-05-03T09:12:46Z) - Hebbian Learning based Orthogonal Projection for Continual Learning of
Spiking Neural Networks [74.3099028063756]
We develop a new method with neuronal operations based on lateral connections and Hebbian learning.
We show that Hebbian and anti-Hebbian learning on recurrent lateral connections can effectively extract the principal subspace of neural activities.
Our method consistently enables continual learning in spiking neural networks with nearly zero forgetting.
arXiv Detail & Related papers (2024-02-19T09:29:37Z) - Neuron to Graph: Interpreting Language Model Neurons at Scale [8.32093320910416]
This paper introduces a novel automated approach designed to scale interpretability techniques across a vast array of neurons within Large Language Models.
We propose Neuron to Graph (N2G), an innovative tool that automatically extracts a neuron's behaviour from the dataset it was trained on and translates it into an interpretable graph.
arXiv Detail & Related papers (2023-05-31T14:44:33Z) - Learning to Act through Evolution of Neural Diversity in Random Neural
Networks [9.387749254963595]
In most artificial neural networks (ANNs), neural computation is abstracted to an activation function that is usually shared between all neurons.
We propose the optimization of neuro-centric parameters to attain a set of diverse neurons that can perform complex computations.
arXiv Detail & Related papers (2023-05-25T11:33:04Z) - Redundancy and Concept Analysis for Code-trained Language Models [5.726842555987591]
Code-trained language models have proven to be highly effective for various code intelligence tasks.
They can be challenging to train and deploy for many software engineering applications due to computational bottlenecks and memory constraints.
We perform the first neuron-level analysis for source code models to identify important neurons within latent representations.
arXiv Detail & Related papers (2023-05-01T15:22:41Z) - Complex Dynamic Neurons Improved Spiking Transformer Network for
Efficient Automatic Speech Recognition [8.998797644039064]
The spiking neural network (SNN) using leaky integrate-and-fire (LIF) neurons has been commonly used in automatic speech recognition (ASR) tasks.
Here we introduce four types of neuronal dynamics to post-process the sequential patterns generated from the spiking transformer.
We found that the DyTr-SNN could handle the non-toy automatic speech recognition task well, achieving a lower phoneme error rate, lower computational cost, and higher robustness.
arXiv Detail & Related papers (2023-02-02T16:20:27Z) - Multi-Task Neural Processes [105.22406384964144]
We develop multi-task neural processes, a new variant of neural processes for multi-task learning.
In particular, we propose to explore transferable knowledge from related tasks in the function space to provide inductive bias for improving each individual task.
Results demonstrate the effectiveness of multi-task neural processes in transferring useful knowledge among tasks for multi-task learning.
arXiv Detail & Related papers (2021-11-10T17:27:46Z) - Towards Efficient Processing and Learning with Spikes: New Approaches
for Multi-Spike Learning [59.249322621035056]
We propose two new multi-spike learning rules which demonstrate better performance over other baselines on various tasks.
In the feature detection task, we re-examine the ability of unsupervised STDP and present its limitations.
Our proposed learning rules can reliably solve the task over a wide range of conditions without specific constraints being applied.
arXiv Detail & Related papers (2020-05-02T06:41:20Z) - Non-linear Neurons with Human-like Apical Dendrite Activations [81.18416067005538]
We show that a standard neuron followed by our novel apical dendrite activation (ADA) can learn the XOR logical function with 100% accuracy.
We conduct experiments on six benchmark data sets from computer vision, signal processing and natural language processing.
arXiv Detail & Related papers (2020-02-02T21:09:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.