From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
- URL: http://arxiv.org/abs/2505.21800v1
- Date: Tue, 27 May 2025 22:14:54 GMT
- Title: From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs
- Authors: Stanley Yu, Vaidehi Bulusu, Oscar Yasunaga, Clayton Lau, Cole Blondin, Sean O'Brien, Kevin Zhu, Vasu Sharma
- Abstract summary: Large Language Models (LLMs) exhibit strong conversational abilities but often generate falsehoods. We extend the concept cone framework, recently introduced for modeling refusal, to the domain of truth. We identify multi-dimensional cones that causally mediate truth-related behavior across multiple LLM families.
- Score: 3.6485741522018724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) exhibit strong conversational abilities but often generate falsehoods. Prior work suggests that the truthfulness of simple propositions can be represented as a single linear direction in a model's internal activations, but this may not fully capture its underlying geometry. In this work, we extend the concept cone framework, recently introduced for modeling refusal, to the domain of truth. We identify multi-dimensional cones that causally mediate truth-related behavior across multiple LLM families. Our results are supported by three lines of evidence: (i) causal interventions reliably flip model responses to factual statements, (ii) learned cones generalize across model architectures, and (iii) cone-based interventions preserve unrelated model behavior. These findings reveal the richer, multidirectional structure governing simple true/false propositions in LLMs and highlight concept cones as a promising tool for probing abstract behaviors.
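The abstract describes concept cones and causal interventions only at a high level. As a rough illustration, a k-dimensional concept cone can be taken as the set of nonnegative combinations of k basis directions in activation space, and an intervention adds a vector drawn from that cone to the residual stream. The PyTorch sketch below is a minimal illustration under those assumptions; the basis matrix, hook point, and intervention scale are hypothetical and do not reflect the paper's actual procedure.

```python
import torch


def sample_cone_vector(basis: torch.Tensor, scale: float = 4.0) -> torch.Tensor:
    """Draw one direction from a k-dimensional concept cone.

    The cone is modeled here as all nonnegative combinations of the rows of
    `basis`, assumed to have shape (k, d_model).
    """
    coeffs = torch.rand(basis.shape[0])      # nonnegative mixing coefficients
    v = coeffs @ basis                       # a vector inside the cone
    return scale * v / v.norm()              # fix the intervention norm


def make_cone_intervention_hook(basis: torch.Tensor, scale: float = 4.0):
    """Build a forward hook that adds a cone vector to a layer's hidden states.

    Hypothetical usage on a HuggingFace-style decoder layer:
        handle = model.model.layers[15].register_forward_hook(
            make_cone_intervention_hook(truth_cone_basis)
        )
    """
    v = sample_cone_vector(basis, scale)

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + v.to(hidden.dtype).to(hidden.device)
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

    return hook
```

With a hook like this attached, "flipping" a response amounts to comparing the model's answer to the same factual statement with and without the intervention, while checking that unrelated prompts remain unaffected.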
Related papers
- Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces [31.401762286885656]
Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. We investigate to what extent LLMs internally organize representations related to semantic understanding.
arXiv Detail & Related papers (2025-07-13T17:03:25Z) - Linear Representation Transferability Hypothesis: Leveraging Small Models to Steer Large Models [6.390475802910619]
We show that representations learned across models trained on the same data can be expressed as linear combinations of a universal set of basis features. These basis features underlie the learning task itself and remain consistent across models, regardless of scale.
arXiv Detail & Related papers (2025-05-31T17:45:18Z) - Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them [5.4908640334628345]
Large language models (LLMs) and vision language models (VLMs) can perform various forms of reasoning tasks. We propose a novel experimental approach, Misleading Fine-Tuning (MisFT), to examine whether LLMs/VLMs perform abstract reasoning.
arXiv Detail & Related papers (2025-03-20T17:54:42Z) - I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? [76.15163242945813]
Large language models (LLMs) have led many to conclude that they exhibit a form of intelligence. We introduce a novel generative model that generates tokens on the basis of human-interpretable concepts represented as latent discrete variables.
arXiv Detail & Related papers (2025-03-12T01:21:17Z) - Large Language Models as Markov Chains [7.078696932669912]
We draw an equivalence between autoregressive transformer-based language models and Markov chains defined on a finite state space. We relate the obtained results to the pathological behavior observed with LLMs. Experiments with the most recent Llama and Gemma herds of models show that our theory correctly captures their behavior in practice.
arXiv Detail & Related papers (2024-10-03T17:45:31Z) - On the Origins of Linear Representations in Large Language Models [51.88404605700344]
We introduce a simple latent variable model to formalize the concept dynamics of the next token prediction.
Experiments show that linear representations emerge when learning from data matching the latent variable model.
We additionally confirm some predictions of the theory using the LLaMA-2 large language model.
arXiv Detail & Related papers (2024-03-06T17:17:36Z) - Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying an LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
arXiv Detail & Related papers (2024-02-28T04:56:21Z) - Beyond DAGs: A Latent Partial Causal Model for Multimodal Learning [80.44084021062105]
We propose a novel latent partial causal model for multimodal data, featuring two latent coupled variables, connected by an undirected edge, to represent the transfer of knowledge across modalities. Under specific statistical assumptions, we establish an identifiability result, demonstrating that representations learned by multimodal contrastive learning correspond to the latent coupled variables up to a trivial transformation. Experiments show that a pre-trained CLIP model embodies disentangled representations, enabling few-shot learning and improving domain generalization across diverse real-world datasets.
arXiv Detail & Related papers (2024-02-09T07:18:06Z) - Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning [74.90592233107712]
We propose a Direct-Indirect Reasoning (DIR) method, which considers Direct Reasoning (DR) and Indirect Reasoning (IR) as multiple parallel reasoning paths that are merged to derive the final answer. Our DIR method is simple yet effective and can be straightforwardly integrated with existing variants of CoT methods.
arXiv Detail & Related papers (2024-02-06T03:41:12Z) - The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets [6.732432949368421]
Large Language Models (LLMs) have impressive capabilities, but are prone to outputting falsehoods.
Recent work has developed techniques for inferring whether an LLM is telling the truth by training probes on the LLM's internal activations (a minimal probe sketch in this spirit appears after this list).
We present evidence that at sufficient scale, LLMs linearly represent the truth or falsehood of factual statements.
arXiv Detail & Related papers (2023-10-10T17:54:39Z) - Improving Open Information Extraction with Large Language Models: A Study on Demonstration Uncertainty [52.72790059506241]
The Open Information Extraction (OIE) task aims to extract structured facts from unstructured text.
Despite the potential of large language models (LLMs) like ChatGPT as a general task solver, they lag behind state-of-the-art (supervised) methods in OIE tasks.
arXiv Detail & Related papers (2023-09-07T01:35:24Z) - ConE: Cone Embeddings for Multi-Hop Reasoning over Knowledge Graphs [73.86041481470261]
Cone Embeddings (ConE) is the first geometry-based query embedding model that can handle conjunction, disjunction, and negation.
ConE significantly outperforms existing state-of-the-art methods on benchmark datasets.
arXiv Detail & Related papers (2021-10-26T14:04:02Z)
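Several entries above, including "The Geometry of Truth", rest on the same basic recipe that the main paper builds on: collect hidden-state activations for true and false statements and fit a linear probe on them. The sketch below (referenced from that entry) is a minimal, hypothetical version of that recipe; the file names, layer choice, and probe class are assumptions for illustration, not any paper's exact setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Assumed inputs: one hidden-state vector per statement, taken from a fixed
# layer of the LLM, plus a binary truth label for each statement. The file
# names below are placeholders.
activations = np.load("layer12_activations.npy")   # shape (n_statements, d_model)
labels = np.load("truth_labels.npy")               # shape (n_statements,), 1 = true

X_train, X_test, y_train, y_test = train_test_split(
    activations, labels, test_size=0.2, random_state=0
)

# A linear probe: if truth is (approximately) linearly represented, a plain
# logistic-regression classifier on raw activations should separate the classes.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print(f"held-out probe accuracy: {probe.score(X_test, y_test):.3f}")

# The learned weight vector is one candidate "truth direction".
truth_direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
```

The probe's weight vector gives a single truth direction; the cone view investigated in the main paper asks whether several such directions, combined nonnegatively, mediate truth-related behavior more completely than any one direction alone.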