Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
- URL: http://arxiv.org/abs/2407.06488v1
- Date: Tue, 9 Jul 2024 01:27:35 GMT
- Title: Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
- Authors: Yongqi Leng, Deyi Xiong
- Abstract summary: We detect task-sensitive neurons in large language models (LLMs) via gradient attribution on task-specific data.
We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks.
We propose a neuron-level continuous fine-tuning method that only fine-tunes the current task-specific neurons during continuous learning.
- Score: 45.04661608619081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) have demonstrated superior multi-task capabilities, understanding the learning mechanisms behind this is still a challenging problem. In this paper, we attempt to understand such mechanisms from the perspective of neurons. Specifically, we detect task-sensitive neurons in LLMs via gradient attribution on task-specific data. Through extensive deactivation and fine-tuning experiments, we demonstrate that the detected neurons are highly correlated with the given task, which we term task-specific neurons. With these identified task-specific neurons, we delve into two common problems in multi-task learning and continuous learning: Generalization and Catastrophic Forgetting. We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks. Interestingly, at certain layers of LLMs, there is a high similarity in the parameters of different task-specific neurons, and such similarity is highly correlated with the generalization performance. Inspired by these findings, we propose a neuron-level continuous fine-tuning method that only fine-tunes the current task-specific neurons during continuous learning, and extensive experiments demonstrate the effectiveness of the proposed method. Our study provides insights into the interpretability of LLMs in multi-task learning.
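The three ideas in the abstract (detecting neurons by gradient attribution, measuring how much two tasks' neuron sets overlap, and fine-tuning only the current task's neurons) can be sketched on a toy one-layer network. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the score |activation × gradient| summed over a task's examples is one common gradient-attribution choice and may differ from the paper's exact formula; the top-k cutoff, Jaccard overlap, synthetic regression "tasks", and every name below (`task_neurons`, `masked_finetune`, `TOP_K`, etc.) are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: one feed-forward block x -> relu(x @ W1) @ W2, where
# each of the N_NEURONS hidden units stands in for an LLM FFN "neuron".
D_IN, N_NEURONS, D_OUT, TOP_K = 8, 32, 4, 8
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.3, size=(D_IN, N_NEURONS))
W2 = rng.normal(scale=0.3, size=(N_NEURONS, D_OUT))


def make_task(seed, n=64):
    """Synthetic regression 'task': targets are a task-specific linear map of x."""
    r = np.random.default_rng(seed)
    x = r.normal(size=(n, D_IN))
    return x, x @ r.normal(scale=0.5, size=(D_IN, D_OUT))


def forward_backward(x, y, W1, W2):
    """Return loss, neuron activations, dL/dactivation, and weight gradients."""
    pre = x @ W1
    h = np.maximum(pre, 0.0)                 # neuron activations
    err = h @ W2 - y
    loss = np.sum(err ** 2) / len(x)         # squared error, averaged over examples
    d_out = 2.0 * err / len(x)
    d_h = d_out @ W2.T                       # gradient w.r.t. each neuron's activation
    d_W1 = x.T @ (d_h * (pre > 0))           # ReLU passes gradient only where pre > 0
    d_W2 = h.T @ d_out
    return loss, h, d_h, d_W1, d_W2


def task_neurons(x, y, W1, W2, k=TOP_K):
    """Score neurons by summed |activation * gradient| and keep the top k."""
    _, h, d_h, _, _ = forward_backward(x, y, W1, W2)
    scores = np.abs(h * d_h).sum(axis=0)
    return set(np.argsort(scores)[-k:].tolist())


def overlap(a, b):
    """Jaccard overlap between two neuron index sets."""
    return len(a & b) / len(a | b)


def masked_finetune(x, y, W1, W2, neurons, lr=0.05, steps=100):
    """Gradient descent that updates only the weights owned by `neurons`."""
    W1, W2 = W1.copy(), W2.copy()
    mask = np.zeros(N_NEURONS)
    mask[list(neurons)] = 1.0
    for _ in range(steps):
        _, _, _, d_W1, d_W2 = forward_backward(x, y, W1, W2)
        W1 -= lr * d_W1 * mask[None, :]      # column j of W1 feeds neuron j
        W2 -= lr * d_W2 * mask[:, None]      # row j of W2 reads from neuron j
    return W1, W2


task_a, task_b = make_task(1), make_task(2)
neurons_a = task_neurons(*task_a, W1, W2)
neurons_b = task_neurons(*task_b, W1, W2)
print("overlap(A, B) =", round(overlap(neurons_a, neurons_b), 3))

loss_before = forward_backward(*task_a, W1, W2)[0]
W1_ft, W2_ft = masked_finetune(*task_a, W1, W2, neurons_a)
loss_after = forward_backward(*task_a, W1_ft, W2_ft)[0]
print("task A loss:", round(loss_before, 3), "->", round(loss_after, 3))
```

Because each hidden unit owns exactly one column of `W1` and one row of `W2`, restricting fine-tuning to a neuron set reduces to zeroing the other gradient slices. Applying the same masking to the projection matrices of a transformer's feed-forward layers is an assumption about where the paper's neurons live, not something stated above.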
Related papers
- Synergistic pathways of modulation enable robust task packing within neural dynamics [0.0]
We use recurrent network models to probe the distinctions between two forms of contextual modulation of neural dynamics.
We demonstrate distinction between these mechanisms at the level of the neuronal dynamics they induce.
These characterizations indicate complementarity and synergy in how these mechanisms act, potentially over multiple time-scales.
arXiv (2024-08-02T15:12:01Z)
- Enhancing learning in spiking neural networks through neuronal heterogeneity and neuromodulatory signaling [52.06722364186432]
We propose a biologically informed framework for enhancing artificial neural networks (ANNs).
Our proposed dual-framework approach highlights the potential of spiking neural networks (SNNs) for emulating diverse spiking behaviors.
We outline how the proposed approach integrates brain-inspired compartmental models and task-driven SNNs, bioinspiration and complexity.
arXiv (2024-07-05T14:11:28Z)
- Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs [70.3132264719438]
We aim to fill the research gap by examining how neuron activation is shared across tasks and languages.
We classify neurons into four distinct categories based on their responses to a specific input across different languages.
Our analysis reveals the following insights: (i) the patterns of neuron sharing are significantly affected by the characteristics of tasks and examples; (ii) neuron sharing does not fully correspond with language similarity; (iii) shared neurons play a vital role in generating responses, especially those shared across all languages.
arXiv (2024-06-13T16:04:11Z)
- Sparse Multitask Learning for Efficient Neural Representation of Motor Imagery and Execution [30.186917337606477]
We introduce a sparse multitask learning framework for motor imagery (MI) and motor execution (ME) tasks.
Given a dual-task CNN model for MI-ME classification, we apply a saliency-based sparsification approach to prune superfluous connections.
Our results indicate that this tailored sparsity can mitigate the overfitting problem and improve test performance with a small amount of data.
arXiv (2023-12-10T09:06:16Z)
- Randomly Weighted Neuromodulation in Neural Networks Facilitates Learning of Manifolds Common Across Tasks [1.9580473532948401]
Geometric Sensitive Hashing functions are neural network models that learn class-specific manifold geometry in supervised learning.
We show that a randomly weighted neural network with a neuromodulation system can realize this function.
arXiv (2023-11-17T15:22:59Z)
- Synergistic information supports modality integration and flexible learning in neural networks solving multiple tasks [107.8565143456161]
We investigate the information processing strategies adopted by simple artificial neural networks performing a variety of cognitive tasks.
Results show that synergy increases as neural networks learn multiple diverse tasks.
Randomly turning off neurons during training through dropout increases network redundancy, corresponding to an increase in robustness.
arXiv (2022-10-06T15:36:27Z)
- Multi-Task Neural Processes [105.22406384964144]
We develop multi-task neural processes, a new variant of neural processes for multi-task learning.
In particular, we propose to explore transferable knowledge from related tasks in the function space to provide inductive bias for improving each individual task.
Results demonstrate the effectiveness of multi-task neural processes in transferring useful knowledge among tasks for multi-task learning.
arXiv (2021-11-10T17:27:46Z)
- On the relationship between disentanglement and multi-task learning [62.997667081978825]
We take a closer look at the relationship between disentanglement and multi-task learning based on hard parameter sharing.
We show that disentanglement appears naturally during the process of multi-task neural network training.
arXiv (2021-10-07T14:35:34Z)
- Efficient and robust multi-task learning in the brain with modular task primitives [2.6166087473624318]
We show that a modular network endowed with task primitives allows for learning multiple tasks well while keeping parameter counts, and updates, low.
We also show that the skills acquired with our approach are more robust to a broad range of perturbations compared to those acquired with other multi-task learning strategies.
arXiv (2021-05-28T21:07:54Z)