Physics of Language Models: Part 3.2, Knowledge Manipulation
- URL: http://arxiv.org/abs/2309.14402v2
- Date: Tue, 16 Jul 2024 10:33:12 GMT
- Title: Physics of Language Models: Part 3.2, Knowledge Manipulation
- Authors: Zeyuan Allen-Zhu, Yuanzhi Li
- Abstract summary: This paper investigates four fundamental knowledge manipulation tasks.
We show that language models excel in knowledge retrieval but struggle even in the simplest classification or comparison tasks.
Our findings also apply to modern pretrained language models such as GPT-4.
- Score: 51.68385617116854
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models can store vast factual knowledge, yet their ability to flexibly use this knowledge for downstream tasks (e.g., via instruction finetuning) remains questionable. This paper investigates four fundamental knowledge manipulation tasks: retrieval (e.g., "What is person A's attribute X?"), classification (e.g., "Is A's attribute X even or odd?"), comparison (e.g., "Is A greater than B in attribute X?"), and inverse search (e.g., "Which person's attribute X equals T?"). We show that language models excel in knowledge retrieval but struggle even in the simplest classification or comparison tasks unless Chain of Thoughts (CoTs) are employed during both training and inference. Moreover, their performance in inverse knowledge search is virtually 0%, regardless of the prompts. Our primary contribution is a controlled, synthetic experiment that confirms these weaknesses are inherent to language models: they cannot efficiently manipulate knowledge from pre-training data, even when such knowledge is perfectly stored in the models, despite adequate training and sufficient model size. Our findings also apply to modern pretrained language models such as GPT-4, thus giving rise to many Turing tests to distinguish Humans from contemporary AIs.
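To make the four task families concrete, below is a minimal Python sketch of what each query type asks of a synthetic knowledge store. The dictionary, names, and templates are illustrative assumptions, not the paper's actual data or prompt formats.
```python
# A minimal sketch (not the paper's pipeline) of the four knowledge
# manipulation tasks over a synthetic attribute store. All names and
# values here are hypothetical.

# Synthetic "pre-training knowledge": person -> attribute value.
birth_month = {"Anya": 10, "Ben": 3}

def retrieval(person):
    # "What is person A's attribute X?" -- a direct lookup.
    return birth_month[person]

def classification(person):
    # "Is A's attribute X even or odd?" -- retrieval plus one step.
    return "even" if birth_month[person] % 2 == 0 else "odd"

def comparison(a, b):
    # "Is A greater than B in attribute X?" -- two retrievals plus a compare.
    return birth_month[a] > birth_month[b]

def inverse_search(value):
    # "Which person's attribute X equals T?" -- inverting the
    # person -> value direction in which the knowledge was stored.
    return [p for p, v in birth_month.items() if v == value]

print(retrieval("Anya"))          # 10
print(classification("Anya"))     # "even"
print(comparison("Anya", "Ben"))  # True (10 > 3)
print(inverse_search(3))          # ["Ben"]
```
A CoT-style answer makes the intermediate retrieval explicit ("Anya was born in month 10; 10 is even; so: even"); the paper's finding is that models need this decomposition at both training and inference time to succeed at classification and comparison. Note also that `inverse_search` is trivial in code because the store can be enumerated in either direction, whereas a model trained only on forward-direction text answers it at virtually 0%.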
Related papers
- Large Language Models are Limited in Out-of-Context Knowledge Reasoning [65.72847298578071]
Large Language Models (LLMs) possess extensive knowledge and strong capabilities in performing in-context reasoning.
This paper focuses on a significant aspect of out-of-context reasoning: Out-of-Context Knowledge Reasoning (OCKR), which combines multiple pieces of knowledge to infer new knowledge.
arXiv Detail & Related papers (2024-06-11T15:58:59Z)
- Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws [51.68385617116854]
Scaling laws describe the relationship between the size of language models and their capabilities.
We focus on factual knowledge represented as tuples, such as (USA, capital, Washington D.C.) from a Wikipedia page.
A 7B model can store 14B bits of knowledge, surpassing the English Wikipedia and textbooks combined (roughly 2 bits per parameter; see the sketch after this list).
arXiv Detail & Related papers (2024-04-08T11:11:31Z)
- Are Emergent Abilities in Large Language Models just In-Context Learning? [46.561464069450444]
We present a novel theory that explains emergent abilities, taking into account their potential confounding factors.
Our findings suggest that purported emergent abilities are not truly emergent, but result from a combination of in-context learning, model memory, and linguistic knowledge.
arXiv Detail & Related papers (2023-09-04T20:54:11Z)
- The KITMUS Test: Evaluating Knowledge Integration from Multiple Sources in Natural Language Understanding Systems [87.3207729953778]
We evaluate state-of-the-art coreference resolution models on our dataset.
Several models struggle to reason on-the-fly over knowledge observed both at pretrain time and at inference time.
Still, even the best performing models seem to have difficulties with reliably integrating knowledge presented only at inference time.
arXiv Detail & Related papers (2022-12-15T23:26:54Z)
- Discovering Latent Knowledge in Language Models Without Supervision [72.95136739040676]
Existing techniques for training language models can be misaligned with the truth.
We propose directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way.
We show that despite using no supervision and no model outputs, our method can recover diverse knowledge represented in large language models.
arXiv Detail & Related papers (2022-12-07T18:17:56Z)
- Zero-shot Commonsense Question Answering with Cloze Translation and Consistency Optimization [20.14487209460865]
We investigate four translation methods that can translate natural questions into cloze-style sentences.
We show that our methods are complementary to a knowledge-base-improved model, and combining them can lead to state-of-the-art zero-shot performance.
arXiv Detail & Related papers (2022-01-01T07:12:49Z)
- Generated Knowledge Prompting for Commonsense Reasoning [53.88983683513114]
We propose generating knowledge statements directly from a language model with a generic prompt format.
This approach improves performance of both off-the-shelf and finetuned language models on four commonsense reasoning tasks.
Notably, we find that a model's predictions can improve when using its own generated knowledge.
arXiv Detail & Related papers (2021-10-15T21:58:03Z)
- BERT is to NLP what AlexNet is to CV: Can Pre-Trained Language Models Identify Analogies? [35.381345454627]
We analyze the capabilities of transformer-based language models on an unsupervised task of identifying analogies.
Off-the-shelf language models can identify analogies to a certain extent, but struggle with abstract and complex relations.
Our results raise important questions for future work about how, and to what extent, pre-trained language models capture knowledge about abstract semantic relations.
arXiv Detail & Related papers (2021-05-11T11:38:49Z)
- Knowledge-Aware Language Model Pretraining [29.56904859722379]
We incorporate knowledge-awareness in language model pretraining without changing the transformer architecture.
We observe improved language modeling accuracy, factual correctness in LAMA knowledge probing tasks, and semantics in the hidden representations through edge probing.
Our knowledge-aware language model (KALM) can serve as a drop-in replacement for GPT-2 models.
arXiv Detail & Related papers (2020-06-29T06:09:59Z)
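As a back-of-the-envelope check of the Part 3.3 capacity figure cited above: the 14B-bit number comes from that paper's summary, and the per-parameter rate below is simple division, not a result reproduced here.
```python
# Capacity arithmetic implied by the Part 3.3 summary above:
# 14e9 stored bits / 7e9 parameters = 2 bits of knowledge per parameter.
params = 7e9        # model size in parameters (7B)
stored_bits = 14e9  # knowledge capacity quoted in the summary (14B bits)
print(f"{stored_bits / params:.1f} bits per parameter")  # -> 2.0
```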