Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
- URL: http://arxiv.org/abs/2407.06488v1
- Date: Tue, 9 Jul 2024 01:27:35 GMT
- Title: Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
- Authors: Yongqi Leng, Deyi Xiong,
- Abstract summary: We detect task-sensitive neurons in large language models (LLMs) via gradient attribution on task-specific data.
We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks.
We propose a neuron-level continuous fine-tuning method that only fine-tunes the current task-specific neurons during continuous learning.
- Score: 45.04661608619081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While large language models (LLMs) have demonstrated superior multi-task capabilities, understanding the learning mechanisms behind this is still a challenging problem. In this paper, we attempt to understand such mechanisms from the perspective of neurons. Specifically, we detect task-sensitive neurons in LLMs via gradient attribution on task-specific data. Through extensive deactivation and fine-tuning experiments, we demonstrate that the detected neurons are highly correlated with the given task, which we term as task-specific neurons. With these identified task-specific neurons, we delve into two common problems in multi-task learning and continuous learning: Generalization and Catastrophic Forgetting. We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks. Interestingly, at certain layers of LLMs, there is a high similarity in the parameters of different task-specific neurons, and such similarity is highly correlated with the generalization performance. Inspired by these findings, we propose a neuron-level continuous fine-tuning method that only fine-tunes the current task-specific neurons during continuous learning, and extensive experiments demonstrate the effectiveness of the proposed method. Our study provides insights into the interpretability of LLMs in multi-task learning.
Related papers
- Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs [73.0661307151716]
We investigate how neuron activation is shared across languages by categorizing neurons into four distinct groups according to their responses across different languages for a particular input: all-shared, partial-shared, specific, and non-activated.
Our analysis reveals the following insights: (i) the linguistic sharing patterns are strongly affected by the type of task, but neuron behaviour changes across different inputs even for the same task; (ii) all-shared neurons play a key role in generating correct responses; (iii) boosting multilingual alignment by increasing all-shared neurons can enhance accuracy on multilingual tasks.
arXiv Detail & Related papers (2024-06-13T16:04:11Z) - Effectiveness Assessment of Recent Large Vision-Language Models [78.69439393646554]
This paper endeavors to evaluate the competency of popular large vision-language models (LVLMs) in specialized and general tasks.
We employ six challenging tasks in three different application scenarios: natural, healthcare, and industrial.
We examine the performance of three recent open-source LVLMs, including MiniGPT-v2, LLaVA-1.5, and Shikra, on both visual recognition and localization in these tasks.
arXiv Detail & Related papers (2024-03-07T08:25:27Z) - Sparse Multitask Learning for Efficient Neural Representation of Motor
Imagery and Execution [30.186917337606477]
We introduce a sparse multitask learning framework for motor imagery (MI) and motor execution (ME) tasks.
Given a dual-task CNN model for MI-ME classification, we apply a saliency-based sparsification approach to prune superfluous connections.
Our results indicate that this tailored sparsity can mitigate the overfitting problem and improve the test performance with small amount of data.
arXiv Detail & Related papers (2023-12-10T09:06:16Z) - Randomly Weighted Neuromodulation in Neural Networks Facilitates
Learning of Manifolds Common Across Tasks [1.9580473532948401]
Geometric Sensitive Hashing functions are neural network models that learn class-specific manifold geometry in supervised learning.
We show that a randomly weighted neural network with a neuromodulation system can realize this function.
arXiv Detail & Related papers (2023-11-17T15:22:59Z) - Learning low-dimensional dynamics from whole-brain data improves task
capture [2.82277518679026]
We introduce a novel approach to learning low-dimensional approximations of neural dynamics by using a sequential variational autoencoder (SVAE)
Our method finds smooth dynamics that can predict cognitive processes with accuracy higher than classical methods.
We evaluate our approach on various task-fMRI datasets, including motor, working memory, and relational processing tasks.
arXiv Detail & Related papers (2023-05-18T18:43:13Z) - Synergistic information supports modality integration and flexible
learning in neural networks solving multiple tasks [107.8565143456161]
We investigate the information processing strategies adopted by simple artificial neural networks performing a variety of cognitive tasks.
Results show that synergy increases as neural networks learn multiple diverse tasks.
randomly turning off neurons during training through dropout increases network redundancy, corresponding to an increase in robustness.
arXiv Detail & Related papers (2022-10-06T15:36:27Z) - Multi-Task Neural Processes [105.22406384964144]
We develop multi-task neural processes, a new variant of neural processes for multi-task learning.
In particular, we propose to explore transferable knowledge from related tasks in the function space to provide inductive bias for improving each individual task.
Results demonstrate the effectiveness of multi-task neural processes in transferring useful knowledge among tasks for multi-task learning.
arXiv Detail & Related papers (2021-11-10T17:27:46Z) - On the relationship between disentanglement and multi-task learning [62.997667081978825]
We take a closer look at the relationship between disentanglement and multi-task learning based on hard parameter sharing.
We show that disentanglement appears naturally during the process of multi-task neural network training.
arXiv Detail & Related papers (2021-10-07T14:35:34Z) - Efficient and robust multi-task learning in the brain with modular task
primitives [2.6166087473624318]
We show that a modular network endowed with task primitives allows for learning multiple tasks well while keeping parameter counts, and updates, low.
We also show that the skills acquired with our approach are more robust to a broad range of perturbations compared to those acquired with other multi-task learning strategies.
arXiv Detail & Related papers (2021-05-28T21:07:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.