Vector Ontologies as an LLM world view extraction method
- URL: http://arxiv.org/abs/2506.13252v1
- Date: Mon, 16 Jun 2025 08:49:21 GMT
- Title: Vector Ontologies as an LLM world view extraction method
- Authors: Kaspar Rothenfusser, Bekk Blando
- Abstract summary: Large Language Models (LLMs) possess intricate internal representations of the world, yet these structures are notoriously difficult to interpret or repurpose beyond the original prediction task. A vector ontology defines a domain-specific vector space spanned by ontologically meaningful dimensions, allowing geometric analysis of concepts and relationships within a domain. Using GPT-4o-mini, we extract genre representations through multiple natural language prompts and analyze the consistency of these projections across linguistic variations and their alignment with ground-truth data.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) possess intricate internal representations of the world, yet these latent structures are notoriously difficult to interpret or repurpose beyond the original prediction task. Building on our earlier work (Rothenfusser, 2025), which introduced the concept of vector ontologies as a framework for translating high-dimensional neural representations into interpretable geometric structures, this paper provides the first empirical validation of that approach. A vector ontology defines a domain-specific vector space spanned by ontologically meaningful dimensions, allowing geometric analysis of concepts and relationships within a domain. We construct an 8-dimensional vector ontology of musical genres based on Spotify audio features and test whether an LLM's internal world model of music can be consistently and accurately projected into this space. Using GPT-4o-mini, we extract genre representations through multiple natural language prompts and analyze the consistency of these projections across linguistic variations and their alignment with ground-truth data. Our results show (1) high spatial consistency of genre projections across 47 query formulations, (2) strong alignment between LLM-inferred genre locations and real-world audio feature distributions, and (3) evidence of a direct relationship between prompt phrasing and spatial shifts in the LLM's inferred vector ontology. These findings demonstrate that LLMs internalize structured, repurposable knowledge and that vector ontologies offer a promising method for extracting and analyzing this knowledge in a transparent and verifiable way.
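The extraction pipeline described in the abstract can be sketched in a few lines of Python. The snippet below is an illustrative sketch, not the authors' released code: it assumes the OpenAI Python client, a hypothetical choice of eight Spotify-style audio-feature dimensions (the abstract does not enumerate them), and made-up prompt templates standing in for the paper's 47 query formulations. It queries GPT-4o-mini for a genre's coordinates in the vector ontology and measures how consistent the projections are across rephrasings.

```python
# Sketch: project an LLM's notion of a musical genre into an 8-D vector ontology
# spanned by Spotify-style audio features, then check spatial consistency across
# prompt rephrasings. Dimension names and prompt templates are assumptions made
# for illustration; they are not taken from the paper.
import json
import statistics
from openai import OpenAI

DIMENSIONS = [
    "danceability", "energy", "speechiness", "acousticness",
    "instrumentalness", "liveness", "valence", "tempo",
]

# Hypothetical prompt variants; the paper uses 47 query formulations.
PROMPT_TEMPLATES = [
    "Rate the genre '{genre}' on each attribute from 0 to 1: {dims}. Reply as JSON.",
    "For a typical '{genre}' track, estimate {dims} as values in [0, 1]. Reply as JSON.",
    "How would '{genre}' score on {dims}? Give numbers between 0 and 1 as JSON.",
]

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def project_genre(genre: str, template: str) -> list[float]:
    """Ask the LLM to place one genre in the vector ontology and parse the reply."""
    prompt = template.format(genre=genre, dims=", ".join(DIMENSIONS))
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
        temperature=0,
    )
    scores = json.loads(reply.choices[0].message.content)
    return [float(scores[d]) for d in DIMENSIONS]


def consistency(vectors: list[list[float]]) -> float:
    """Mean per-dimension standard deviation; lower means more consistent projections."""
    per_dim = zip(*vectors)
    return statistics.mean(statistics.pstdev(vals) for vals in per_dim)


if __name__ == "__main__":
    vectors = [project_genre("jazz", t) for t in PROMPT_TEMPLATES]
    print("jazz projections:", vectors)
    print("mean per-dimension spread:", consistency(vectors))
```

Comparing such LLM-inferred coordinates against per-genre averages of real Spotify audio features would correspond to the ground-truth alignment check reported in the abstract.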
Related papers
- Meaning-infused grammar: Gradient Acceptability Shapes the Geometric Representations of Constructions in LLMs [0.0]
This study investigates whether the internal representations in Large Language Models (LLMs) reflect the proposed function-infused gradience. We analyze the neural representations of the English dative constructions (Double Object and Prepositional Object) in Pythia-1.4B, using a dataset of 5,000 sentence pairs systematically varied for human-rated preference strength.
arXiv Detail & Related papers (2025-07-29T23:39:21Z) - Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces [31.401762286885656]
Understanding the latent space geometry of large language models (LLMs) is key to interpreting their behavior and improving alignment. We investigate to what extent LLMs internally organize representations related to semantic understanding.
arXiv Detail & Related papers (2025-07-13T17:03:25Z) - Geospatial Mechanistic Interpretability of Large Language Models [6.0272491755196045]
Large Language Models (LLMs) have demonstrated unprecedented capabilities across various natural language processing tasks. Our aim is to advance our understanding of the internal representations that these complex models generate while processing geographical information.
arXiv Detail & Related papers (2025-05-06T09:40:06Z) - Semantic Wave Functions: Exploring Meaning in Large Language Models Through Quantum Formalism [0.0]
Large Language Models (LLMs) encode semantic relationships in high-dimensional vector embeddings. This paper explores the analogy between LLM embedding spaces and quantum mechanics. We introduce a "semantic wave function" to formalize this quantum-derived representation.
arXiv Detail & Related papers (2025-03-09T08:23:31Z) - Analysis and Visualization of Linguistic Structures in Large Language Models: Neural Representations of Verb-Particle Constructions in BERT [0.0]
This study investigates the internal representations of verb-particle combinations within large language models (LLMs). We analyse the representational efficacy of BERT's layers for various verb-particle constructions such as 'agree on', 'come back', and 'give up'. Results show that BERT's middle layers most effectively capture syntactic structures, with significant variability in representational accuracy across different verb categories.
arXiv Detail & Related papers (2024-12-19T09:21:39Z) - Decoding Diffusion: A Scalable Framework for Unsupervised Analysis of Latent Space Biases and Representations Using Natural Language Prompts [68.48103545146127]
This paper proposes a novel framework for unsupervised exploration of diffusion latent spaces.
We directly leverage natural language prompts and image captions to map latent directions.
Our method provides a more scalable and interpretable understanding of the semantic knowledge encoded within diffusion models.
arXiv Detail & Related papers (2024-10-25T21:44:51Z) - SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models [70.01883340129204]
Spatial reasoning is a crucial component of both biological and artificial intelligence.
We present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning.
arXiv Detail & Related papers (2024-06-07T01:06:34Z) - "You Are An Expert Linguistic Annotator": Limits of LLMs as Analyzers of Abstract Meaning Representation [60.863629647985526]
We examine the successes and limitations of the GPT-3, ChatGPT, and GPT-4 models in the analysis of sentence meaning structure.
We find that models can reliably reproduce the basic format of AMR, and can often capture core event, argument, and modifier structure.
Overall, our findings indicate that these models out-of-the-box can capture aspects of semantic structure, but there remain key limitations in their ability to support fully accurate semantic analyses or parses.
arXiv Detail & Related papers (2023-10-26T21:47:59Z) - A Geometric Notion of Causal Probing [85.49839090913515]
The linear subspace hypothesis states that, in a language model's representation space, all information about a concept such as verbal number is encoded in a linear subspace. We give a set of intrinsic criteria which characterize an ideal linear concept subspace. We find that, for at least one concept across two language models, the concept subspace can be used to manipulate the concept value of the generated word with precision.
arXiv Detail & Related papers (2023-07-27T17:57:57Z) - Linear Spaces of Meanings: Compositional Structures in Vision-Language Models [110.00434385712786]
We investigate compositional structures in data embeddings from pre-trained vision-language models (VLMs).
We first present a framework for understanding compositional structures from a geometric perspective.
We then explain what these structures entail probabilistically in the case of VLM embeddings, providing intuitions for why they arise in practice.
arXiv Detail & Related papers (2023-02-28T08:11:56Z) - A Latent-Variable Model for Intrinsic Probing [93.62808331764072]
We propose a novel latent-variable formulation for constructing intrinsic probes. We find empirical evidence that pre-trained representations develop a cross-lingually entangled notion of morphosyntax.
arXiv Detail & Related papers (2022-01-20T15:01:12Z) - BURT: BERT-inspired Universal Representation from Learning Meaningful Segment [46.51685959045527]
This work introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space.
We present a universal representation model, BURT, to encode different levels of linguistic unit into the same vector space.
Specifically, we extract and mask meaningful segments based on point-wise mutual information (PMI) to incorporate different granular objectives into the pre-training stage.
arXiv Detail & Related papers (2020-12-28T16:02:28Z) - Learning Universal Representations from Word to Sentence [89.82415322763475]
This work introduces and explores universal representation learning, i.e., embedding different levels of linguistic units in a uniform vector space.
We present our approach of constructing analogy datasets in terms of words, phrases and sentences.
We empirically verify that well pre-trained Transformer models, combined with appropriate training settings, can effectively yield universal representations.
arXiv Detail & Related papers (2020-09-10T03:53:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.