Related papers: Unveiling A Core Linguistic Region in Large Language Models

URL: http://arxiv.org/abs/2310.14928v1
Date: Mon, 23 Oct 2023 13:31:32 GMT
Title: Unveiling A Core Linguistic Region in Large Language Models
Authors: Jun Zhao, Zhihao Zhang, Yide Ma, Qi Zhang, Tao Gui, Luhui Gao and Xuanjing Huang
Abstract summary: This paper conducts analogical research using brain localization as a prototype. The authors report a core region in large language models that corresponds to linguistic competence. They observe that an improvement in linguistic competence does not necessarily accompany an elevation in the model's knowledge level.
Score: 49.860260050718516
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Brain localization, which describes the association between specific regions of the brain and their corresponding functions, is widely accepted in the field of cognitive science as an objective fact. Today's large language models (LLMs) possess human-level linguistic competence and can execute complex tasks requiring abstract knowledge and reasoning. To deeply understand the inherent mechanisms of intelligence emergence in LLMs, this paper conducts analogical research using brain localization as a prototype. We have discovered a core region in LLMs that corresponds to linguistic competence, accounting for approximately 1% of the total model parameters. This core region exhibits significant dimension dependency, and perturbations to even a single parameter on specific dimensions can lead to a loss of linguistic competence. Furthermore, we observe that an improvement in linguistic competence does not necessarily accompany an elevation in the model's knowledge level, which might imply the existence of regions of domain knowledge that are dissociated from the linguistic region. Overall, exploring the LLMs' functional regions provides insights into the foundation of their intelligence. In the future, we will continue to investigate knowledge regions within LLMs and the interactions between them.

Related papers

Sparse Auto-Encoder Interprets Linguistic Features in Large Language Models [40.12943080113246]
We present a systematic and comprehensive causal investigation using sparse auto-encoders (SAEs). We extract a wide range of linguistic features from six dimensions. We introduce two indices, Feature Representation Confidence (FRC) and Feature Intervention Confidence (FIC), to measure the ability of linguistic features to capture and control linguistic phenomena.
arXiv Detail & Related papers (2025-02-27T18:16:47Z)
IOLBENCH: Benchmarking LLMs on Linguistic Reasoning [8.20398036986024]
We introduce IOLBENCH, a novel benchmark derived from International Linguistics Olympiad (IOL) problems. This dataset encompasses diverse problems testing syntax, morphology, phonology, and semantics. We find that even the most advanced models struggle to handle the intricacies of linguistic complexity.
arXiv Detail & Related papers (2025-01-08T03:15:10Z)
One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models [19.58983929459173]
Large language models (LLMs) have learned vast amounts of factual knowledge through self-supervised pre-training on large-scale corpora. LLMs have also demonstrated excellent multilingual capabilities, allowing them to express the learned knowledge in multiple languages.
arXiv Detail & Related papers (2024-11-26T13:03:49Z)
The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units [16.317199232071232]
Large language models (LLMs) exhibit remarkable capabilities on not just language tasks, but also various tasks that are not linguistic in nature. In the human brain, neuroscience has identified a core language system that selectively and causally supports language processing. We identify language-selective units within 18 popular LLMs, using the same localization approach that is used in neuroscience.
arXiv Detail & Related papers (2024-11-04T17:09:10Z)
Converging to a Lingua Franca: Evolution of Linguistic Regions and Semantics Alignment in Multilingual Large Language Models [11.423589362950812]
Large language models (LLMs) have demonstrated remarkable performance, particularly in multilingual contexts. Recent studies suggest that LLMs can transfer skills learned in one language to others, but the internal mechanisms behind this ability remain unclear. This paper provides insights into the internal workings of LLMs, offering a foundation for future improvements in their cross-lingual capabilities.
arXiv Detail & Related papers (2024-10-15T15:49:15Z)
Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhance multilingual capabilities of large language models (LLMs). It operates by manipulating the hidden representations within the language-agnostic and language-specific subspaces from top layers of LLMs. It achieves superior results with far fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora. We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs. Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
Unveiling Linguistic Regions in Large Language Models [49.298360366468934]
Large Language Models (LLMs) have demonstrated considerable cross-lingual alignment and generalization ability. This paper conducts several investigations on the linguistic competence of LLMs.
arXiv Detail & Related papers (2024-02-22T16:56:13Z)
Quantifying the Dialect Gap and its Correlates Across Languages [69.18461982439031]
This work will lay the foundation for furthering the field of dialectal NLP by laying out evident disparities and identifying possible pathways for addressing them through mindful data collection.
arXiv Detail & Related papers (2023-10-23T17:42:01Z)
Dissociating language and thought in large language models [52.39241645471213]
Large Language Models (LLMs) have come closest among all models to date to mastering human language. We ground the distinction between formal and functional competence in human neuroscience, which has shown that the two rely on different neural mechanisms. Although LLMs are surprisingly good at formal competence, their performance on functional competence tasks remains spotty.
arXiv Detail & Related papers (2023-01-16T22:41:19Z)

This list is automatically generated from the titles and abstracts of the papers on this site.