Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs
- URL: http://arxiv.org/abs/2406.09265v2
- Date: Thu, 10 Oct 2024 14:17:20 GMT
- Title: Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs
- Authors: Weixuan Wang, Barry Haddow, Minghao Wu, Wei Peng, Alexandra Birch
- Abstract summary: We aim to fill the research gap by examining how neuron activation is shared across tasks and languages.
We classify neurons into four distinct categories based on their responses to a specific input across different languages.
Our analysis reveals the following insights: (i) the patterns of neuron sharing are significantly affected by the characteristics of tasks and examples; (ii) neuron sharing does not fully correspond with language similarity; (iii) shared neurons play a vital role in generating responses, especially those shared across all languages.
- Score: 70.3132264719438
- Abstract: Large language models (LLMs) have revolutionized the field of natural language processing (NLP), and recent studies have aimed to understand their underlying mechanisms. However, most of this research is conducted within a monolingual setting, primarily focusing on English. Few studies attempt to explore the internal workings of LLMs in multilingual settings. In this study, we aim to fill the research gap by examining how neuron activation is shared across tasks and languages. We classify neurons into four distinct categories based on their responses to a specific input across different languages: all-shared, partial-shared, specific, and non-activated. This categorization is combined with a study of neuron attribution, i.e. the importance of a neuron with respect to an output. Our analysis reveals the following insights: (i) the patterns of neuron sharing are significantly affected by the characteristics of tasks and examples; (ii) neuron sharing does not fully correspond with language similarity; (iii) shared neurons play a vital role in generating responses, especially those shared across all languages. These findings shed light on the internal workings of multilingual LLMs and pave the way for future research. We will release the code to foster research in this area.
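The four-way categorization named in the abstract can be illustrated with a minimal sketch. This is not the authors' released code. It assumes that a neuron counts as "activated" for a language when its activation exceeds a threshold (0 here), that "specific" means active for exactly one language, and that "partial-shared" means active for more than one language but not all. The function name `classify_neurons` and the toy activation values are hypothetical.

```python
import numpy as np


def classify_neurons(activations, threshold=0.0):
    """Label each neuron by how many languages activate it for one input.

    activations: array of shape (n_languages, n_neurons), one row per
    language version of the same input (an assumed layout).
    Returns an object array of category names, one per neuron.
    """
    acts = np.asarray(activations, dtype=float)
    if acts.ndim != 2 or acts.shape[0] < 2:
        raise ValueError("expected shape (n_languages >= 2, n_neurons)")

    n_languages = acts.shape[0]
    # Number of languages in which each neuron is active (assumed criterion).
    active_count = (acts > threshold).sum(axis=0)

    # Default to partial-shared, then overwrite the three boundary cases.
    labels = np.full(acts.shape[1], "partial-shared", dtype=object)
    labels[active_count == 0] = "non-activated"
    labels[active_count == 1] = "specific"
    labels[active_count == n_languages] = "all-shared"
    return labels


# Toy example: 3 languages (rows) x 4 neurons (columns), made-up values.
toy_activations = np.array([
    [0.5, 0.0, 0.2, 0.0],
    [0.3, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.4, 0.1],
])
toy_labels = classify_neurons(toy_activations)
print(list(toy_labels))
```

In the toy input, the first neuron is active for all three languages, the second for none, the third for two, and the fourth for one, so each of the four categories appears once.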
Related papers
- The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units [16.317199232071232]
Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature.
In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing.
We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience.
arXiv Detail & Related papers (2024-11-04T17:09:10Z)
- Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI).
Recent neuroimaging studies provide compelling evidence of alignment between the computational representation of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli.
In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs).
This framework links the AN sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
arXiv Detail & Related papers (2024-10-25T13:15:17Z)
- Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons [45.04661608619081]
We detect task-sensitive neurons in large language models (LLMs) via gradient attribution on task-specific data.
We find that the overlap of task-specific neurons is strongly associated with generalization and specialization across tasks.
We propose a neuron-level continuous fine-tuning method that only fine-tunes the current task-specific neurons during continuous learning.
arXiv Detail & Related papers (2024-07-09T01:27:35Z)
- Revealing the Parallel Multilingual Learning within Large Language Models [50.098518799536144]
In this study, we reveal an in-context learning capability of multilingual large language models (LLMs).
By translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities.
arXiv Detail & Related papers (2024-03-14T03:33:46Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
- Discovering Salient Neurons in Deep NLP Models [31.18937787704794]
We present a technique called Linguistic Correlation Analysis to extract salient neurons in the model.
Our data-driven, quantitative analysis illuminates interesting findings.
Our code is publicly available as part of the NeuroX toolkit.
arXiv Detail & Related papers (2022-06-27T13:31:49Z)
- Same Neurons, Different Languages: Probing Morphosyntax in Multilingual Pre-trained Models [84.86942006830772]
We conjecture that multilingual pre-trained models can derive language-universal abstractions about grammar.
We conduct the first large-scale empirical study over 43 languages and 14 morphosyntactic categories with a state-of-the-art neuron-level probe.
arXiv Detail & Related papers (2022-05-04T12:22:31Z)
- Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects [82.81964713263483]
A popular approach to decompose the neural bases of language consists in correlating, across individuals, the brain responses to different stimuli.
Here, we show that a model-based approach can reach equivalent results within subjects exposed to natural stimuli.
arXiv Detail & Related papers (2021-10-12T15:30:21Z)
- Analyzing Individual Neurons in Pre-trained Language Models [41.07850306314594]
We find that small subsets of neurons suffice to predict linguistic tasks, with lower-level tasks localized in fewer neurons than the higher-level task of predicting syntax.
For example, we found neurons in XLNet to be more localized and disjoint when predicting properties compared to BERT and others, where they are more distributed and coupled.
arXiv Detail & Related papers (2020-10-06T13:17:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.