How Do Multilingual Language Models Remember Facts?
- URL: http://arxiv.org/abs/2410.14387v2
- Date: Sat, 15 Feb 2025 18:22:49 GMT
- Title: How Do Multilingual Language Models Remember Facts?
- Authors: Constanza Fierro, Negar Foroutan, Desmond Elliott, Anders Søgaard
- Abstract summary: We show that previously identified recall mechanisms in English largely apply to multilingual contexts.
We localize the role of language during recall, finding that subject enrichment is language-independent while object extraction is language-dependent.
In decoder-only LLMs, the last-token representation acts as a Function Vector (FV) that composes the query language and the content to be extracted in two separate stages.
- Score: 50.13632788453612
- License:
- Abstract: Large Language Models (LLMs) store and retrieve vast amounts of factual knowledge acquired during pre-training. Prior research has localized and identified mechanisms behind knowledge recall; however, it has only focused on English monolingual models. The question of how these mechanisms generalize to non-English languages and multilingual LLMs remains unexplored. In this paper, we address this gap by conducting a comprehensive analysis of three multilingual LLMs. First, we show that previously identified recall mechanisms in English largely apply to multilingual contexts, with nuances based on language and architecture. Next, through patching intermediate representations, we localize the role of language during recall, finding that subject enrichment is language-independent, while object extraction is language-dependent. Additionally, we discover that the last token representation acts as a Function Vector (FV), encoding both the language of the query and the content to be extracted from the subject. Furthermore, in decoder-only LLMs, FVs compose these two pieces of information in two separate stages. These insights reveal unique mechanisms in multilingual LLMs for recalling information, highlighting the need for new methodologies--such as knowledge evaluation, fact editing, and knowledge acquisition--that are specifically tailored for multilingual LLMs.
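The patching analysis described in the abstract can be illustrated with a short, hedged sketch. The code below is a minimal example of activation patching on a Hugging Face decoder model; the model name (gpt2 as a stand-in for a multilingual LLM), the layer index, the token position, and the prompts are illustrative assumptions, not the paper's actual setup.
```python
# Minimal sketch of activation patching ("patching intermediate representations").
# All names below (model, layer, prompts) are illustrative placeholders, not the
# paper's actual experimental setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in for a multilingual LLM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

layer_idx, token_pos = 6, -1                           # layer and token position to patch
cache = {}

def save_hook(module, inputs, output):
    # Decoder blocks return a tuple; the hidden states are the first element.
    cache["h"] = output[0][:, token_pos, :].detach().clone()

def patch_hook(module, inputs, output):
    hidden = output[0].clone()
    hidden[:, token_pos, :] = cache["h"]               # overwrite with the cached activation
    return (hidden,) + output[1:]

block = model.transformer.h[layer_idx]

# 1) Source run: cache the activation produced by the source-language prompt.
handle = block.register_forward_hook(save_hook)
with torch.no_grad():
    model(**tok("The capital of France is", return_tensors="pt"))
handle.remove()

# 2) Target run: patch the cached activation into a prompt in another language
#    and inspect how the next-token prediction changes.
handle = block.register_forward_hook(patch_hook)
with torch.no_grad():
    out = model(**tok("La capitale de la France est", return_tensors="pt"))
handle.remove()

print(tok.decode(out.logits[0, -1].argmax().item()))
```
Comparing the patched run's next-token prediction with an unpatched run, over many prompts and patch locations, is roughly how one can localize which layers and positions carry language-dependent versus language-independent information.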
Related papers
- How does a Multilingual LM Handle Multiple Languages? [0.0]
This study critically examines multilingual LMs' capabilities in multilingual understanding, semantic representation, and cross-lingual knowledge transfer.
It assesses semantic similarity by analyzing multilingual word embeddings for consistency using cosine similarity.
It examines BLOOM-1.7B and Qwen2 through Named Entity Recognition and sentence similarity tasks to understand their linguistic structures.
arXiv Detail & Related papers (2025-02-06T18:08:14Z)
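The entry above checks whether translation-equivalent inputs receive consistent representations by comparing multilingual embeddings with cosine similarity. The following sketch shows one common way to run such a check; the encoder (bert-base-multilingual-cased), the mean-pooling choice, and the sentence pairs are assumptions for illustration, not the study's exact setup (which examines BLOOM-1.7B and Qwen2).
```python
# Minimal sketch of a cross-lingual consistency check: embed translation-equivalent
# sentences and compare them with cosine similarity. Model name and sentence pairs
# are illustrative placeholders.
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-multilingual-cased"                  # stand-in multilingual encoder
tok = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the last hidden layer into a single sentence vector."""
    enc = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state        # (1, seq_len, dim)
    mask = enc["attention_mask"].unsqueeze(-1)         # ignore padding positions
    return (hidden * mask).sum(1) / mask.sum(1)

pairs = [("The cat sleeps.", "Le chat dort."),
         ("Water boils at 100 degrees.", "L'eau bout à 100 degrés.")]

for en, fr in pairs:
    sim = torch.cosine_similarity(embed(en), embed(fr)).item()
    print(f"{en!r} vs {fr!r}: cosine similarity = {sim:.3f}")
```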
- Lens: Rethinking Multilingual Enhancement for Large Language Models [70.85065197789639]
Lens is a novel approach to enhancing the multilingual capabilities of large language models (LLMs).
It operates by manipulating hidden representations within the language-agnostic and language-specific subspaces in the top layers of LLMs.
It achieves superior results with much fewer computational resources compared to existing post-training approaches.
arXiv Detail & Related papers (2024-10-06T08:51:30Z)
- Multilingual Needle in a Haystack: Investigating Long-Context Behavior of Multilingual Large Language Models [22.859955360764275]
We introduce the MultiLingual Needle-in-a-Haystack (MLNeedle) test to assess a model's ability to retrieve relevant information.
We evaluate four state-of-the-art large language models on MLNeedle.
arXiv Detail & Related papers (2024-08-19T17:02:06Z)
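The MLNeedle entry above tests whether a model can retrieve one relevant piece of information hidden among multilingual distractor text. The snippet below is a toy version of that evaluation loop; the distractor sentences, the needle, and the model (gpt2 as a stand-in for a long-context LLM) are illustrative assumptions, not the MLNeedle benchmark itself.
```python
# Toy needle-in-a-haystack check: hide one relevant sentence (the needle) among
# distractor sentences in other languages, then ask the model to retrieve it.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # stand-in for a long-context LLM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

needle = "The secret code is 7431."
distractors = [
    "Der Himmel ist heute sehr bewölkt.",              # German
    "Le marché ouvre à huit heures du matin.",         # French
    "La biblioteca cierra los domingos.",              # Spanish
    "Il treno parte dal binario tre.",                 # Italian
]

random.seed(0)
haystack = distractors.copy()
haystack.insert(random.randrange(len(haystack) + 1), needle)

prompt = " ".join(haystack) + " Question: What is the secret code? Answer:"
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False,
                         pad_token_id=tok.eos_token_id)
completion = tok.decode(out[0, inputs["input_ids"].shape[1]:])
print("retrieved" if "7431" in completion else "missed", "->", completion)
```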
- Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models [7.615938028813914]
We studied linguistic preference in a cross-language RAG-based information search setting.
We found that LLMs displayed systemic bias towards information in the same language as the query language.
arXiv Detail & Related papers (2024-07-07T21:26:36Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora.
But can these models relate corresponding concepts across languages, effectively being crosslingual?
This study evaluates six state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- 1+1>2: Can Large Language Models Serve as Cross-Lingual Knowledge Aggregators? [46.43162333819418]
Large Language Models (LLMs) have garnered significant attention due to their remarkable ability to process information across various languages.
Despite their capabilities, they exhibit inconsistencies in handling identical queries in different languages, presenting challenges for further advancement.
This paper introduces a method to enhance the multilingual performance of LLMs by aggregating knowledge from diverse languages.
arXiv Detail & Related papers (2024-06-20T20:32:53Z)
- A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers [51.8203871494146]
Rapidly developing Large Language Models (LLMs) demonstrate remarkable multilingual capabilities in natural language processing.
Despite these breakthroughs, the investigation of multilingual scenarios remains insufficient.
This survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
arXiv Detail & Related papers (2024-05-17T17:47:39Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
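LAPE, as summarized above, scores each neuron by the entropy of its activation probabilities across languages, so that low-entropy neurons are flagged as language-specific. The sketch below computes such a score over synthetic placeholder activations; in practice the activation statistics would be collected from the model's feed-forward layers on corpora in each language, and the "fires when greater than zero" criterion is an assumption here, not the paper's exact definition.
```python
# LAPE-style score: for each neuron, estimate how often it fires on text from each
# language, normalise those rates into a distribution over languages, and take the
# entropy. Low entropy = the neuron fires almost only for one language.
import torch

torch.manual_seed(0)
languages = ["en", "fr", "zh"]
num_neurons, tokens_per_lang = 1000, 500

# Placeholder activations: (tokens, neurons) per language. Real data would come
# from forward passes over corpora in each language.
acts = {lang: torch.randn(tokens_per_lang, num_neurons) for lang in languages}

# Activation probability per language: fraction of tokens where the neuron > 0.
probs = torch.stack([(acts[lang] > 0).float().mean(dim=0) for lang in languages])

# Normalise across languages and compute the entropy of that distribution.
dist = probs / probs.sum(dim=0, keepdim=True).clamp_min(1e-9)
lape = -(dist * dist.clamp_min(1e-9).log()).sum(dim=0)   # (num_neurons,)

# Neurons with the lowest entropy are candidates for language-specific neurons.
language_specific = lape.argsort()[:20]
print("lowest-entropy neurons:", language_specific.tolist())
```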
- Adapters for Enhanced Modeling of Multilingual Knowledge and Text [54.02078328453149]
Language models have been extended to multilingual language models (MLLMs).
Knowledge graphs contain facts in an explicit triple format, which require careful curation and are only available in a few high-resource languages.
We propose to enhance MLLMs with knowledge from multilingual knowledge graphs (MLKGs) so as to tackle language and knowledge graph tasks across many languages.
arXiv Detail & Related papers (2022-10-24T21:33:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.