More than Correlation: Do Large Language Models Learn Causal Representations of Space?
- URL: http://arxiv.org/abs/2312.16257v1
- Date: Tue, 26 Dec 2023 01:27:29 GMT
- Title: More than Correlation: Do Large Language Models Learn Causal Representations of Space?
- Authors: Yida Chen, Yixian Gan, Sijia Li, Li Yao, Xiaohan Zhao
- Abstract summary: This study focused on uncovering the causality of the spatial representations in large language models.
Experiments showed that the spatial representations influenced the model's performance on next-word prediction and a downstream task that relies on geospatial information.
- Score: 6.293100288400849
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work found high mutual information between the learned
representations of large language models (LLMs) and the geospatial properties
of their input, hinting at an emergent internal model of space. However, that
work did not establish whether this internal space model has any causal effect
on the LLMs' behavior, leading to criticism of the findings as mere statistical
correlation. Our study focused on uncovering the causality of the spatial
representations in LLMs. In particular, we discovered potential spatial
representations in DeBERTa and GPT-Neo using representational similarity
analysis and linear and non-linear probing. Our causal intervention experiments
showed that the spatial representations influenced the models' performance on
next-word prediction and on a downstream task that relies on geospatial
information. Our experiments suggest that LLMs learn and use an internal model
of space when solving geospatial tasks.
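To make the probing step concrete, here is a minimal sketch of a linear probe
that decodes coordinates from pre-extracted hidden states. The file names, the
ridge-regression choice, and the train/test split are illustrative assumptions,
not the authors' exact pipeline.

```python
# Minimal linear-probe sketch: predict geographic coordinates from
# pre-extracted LLM hidden states (e.g., from GPT-Neo). File names and
# the ridge-regression choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

hidden = np.load("hidden_states.npy")  # (n_places, d), hypothetical file
coords = np.load("coordinates.npy")    # (n_places, 2) latitude/longitude

X_tr, X_te, y_tr, y_te = train_test_split(hidden, coords, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)

# High held-out R^2 shows the information is linearly decodable; the
# paper's causal claim additionally requires intervening on these
# directions and measuring the effect on model behavior.
print("held-out R^2:", r2_score(y_te, probe.predict(X_te)))
```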
Related papers
- Hyperbolic Fine-tuning for Large Language Models [56.54715487997674]
This study investigates the non-Euclidean characteristics of large language models (LLMs).
We show that token embeddings exhibit a high degree of hyperbolicity, indicating a latent tree-like structure in the embedding space.
We introduce a new method called hyperbolic low-rank efficient fine-tuning (HypLoRA), which performs low-rank adaptation directly on the hyperbolic manifold.
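A hedged sketch of the key operation such a method relies on: mapping a
low-rank-adapted vector onto the Poincaré ball via the exponential map at the
origin. The map itself is standard hyperbolic geometry; wiring it to LoRA this
way is an assumption, not HypLoRA's released implementation.

```python
# Sketch: low-rank adaptation followed by the Poincare-ball exponential
# map at the origin. Dimensions and scales are illustrative.
import torch

def expmap0(v: torch.Tensor, c: float = 1.0, eps: float = 1e-6) -> torch.Tensor:
    """Exponential map at the origin of a Poincare ball with curvature -c."""
    norm = v.norm(dim=-1, keepdim=True).clamp_min(eps)
    return torch.tanh(c**0.5 * norm) * v / (c**0.5 * norm)

d, r = 768, 8                    # hidden size and LoRA rank (assumed)
A = torch.randn(d, r) * 0.01     # low-rank factors, as in standard LoRA
B = torch.randn(r, d) * 0.01

x = torch.randn(4, d)            # token embeddings, treated as tangent vectors
x_hyp = expmap0(x + x @ A @ B)   # adapted points, now inside the unit ball
```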
arXiv Detail & Related papers (2024-10-05T02:58:25Z)
- SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models [70.01883340129204]
Spatial reasoning is a crucial component of both biological and artificial intelligence.
We present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning.
arXiv Detail & Related papers (2024-06-07T01:06:34Z)
- Probing the Information Theoretical Roots of Spatial Dependence Measures [3.661228054439679]
There is a relation between measures of spatial dependence and information-theoretic measures of entropy.
We will explore the information-theoretic roots of spatial autocorrelation through the lens of self-information.
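For concreteness, a sketch of Moran's I, the classical spatial-dependence
measure such analyses typically start from; the grid data and rook-adjacency
weights below are placeholders for real georeferenced observations.

```python
# Moran's I on a toy 5x5 grid with rook-adjacency weights.
import numpy as np

def morans_i(values: np.ndarray, W: np.ndarray) -> float:
    """Moran's I for observations `values` under spatial weights `W`."""
    z = values - values.mean()
    return (len(values) / W.sum()) * (z @ W @ z) / (z @ z)

rng = np.random.default_rng(0)
values = rng.normal(size=25)

W = np.zeros((25, 25))           # binary rook-adjacency weights
for i in range(25):
    r, c = divmod(i, 5)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        if 0 <= r + dr < 5 and 0 <= c + dc < 5:
            W[i, (r + dr) * 5 + (c + dc)] = 1.0

print("Moran's I:", morans_i(values, W))
```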
arXiv Detail & Related papers (2024-05-28T17:44:35Z)
- Characterizing Truthfulness in Large Language Model Generations with Local Intrinsic Dimension [63.330262740414646]
We study how to characterize and predict the truthfulness of texts generated from large language models (LLMs).
We suggest investigating internal activations and quantifying LLM's truthfulness using the local intrinsic dimension (LID) of model activations.
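A minimal sketch of one common LID estimator, the Levina-Bickel
maximum-likelihood estimate from k-nearest-neighbor distances; the choice of
estimator and of k are assumptions here, and the random activations are
placeholders.

```python
# Per-point local intrinsic dimension via the Levina-Bickel MLE.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lid_mle(acts: np.ndarray, k: int = 20) -> np.ndarray:
    dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(acts).kneighbors(acts)
    dists = dists[:, 1:]                          # drop the self-distance
    log_ratio = np.log(dists[:, -1:] / dists[:, :-1])
    return (k - 1) / log_ratio.sum(axis=1)        # inverse mean log-ratio

acts = np.random.default_rng(0).normal(size=(1000, 64))  # placeholder activations
print("mean LID:", lid_mle(acts).mean())
```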
arXiv Detail & Related papers (2024-02-28T04:56:21Z)
- Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization [66.4659448305396]
This study analyzes various LMs with three probing-based experiments to shed light on the reasons behind the In- vs. Cross-Topic generalization gap.
We demonstrate, for the first time, that generalization gaps and the robustness of the embedding space vary significantly across LMs.
arXiv Detail & Related papers (2024-02-02T12:59:27Z)
- Evaluating Spatial Understanding of Large Language Models [26.436450329727645]
Large language models show remarkable capabilities across a variety of tasks.
Recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts.
We design natural-language navigation tasks and evaluate the ability of LLMs to represent and reason about spatial structures.
arXiv Detail & Related papers (2023-10-23T03:44:40Z)
- Comparing the latent space of generative models [0.0]
Different encodings of datapoints in the latent space of latent-vector generative models may result in more or less effective and disentangled characterizations of the different explanatory factors of variation behind the data.
A simple linear mapping is enough to pass from one latent space to another while preserving most of the information.
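A sketch of that claim in code: fit a least-squares linear map between paired
latent codes and check the residual; the synthetic latents stand in for real
encoder outputs from two models.

```python
# Fit a linear map M between paired latent codes and measure the residual.
import numpy as np

rng = np.random.default_rng(0)
Z1 = rng.normal(size=(5000, 64))                     # latents from model A
true_map = rng.normal(size=(64, 64))
Z2 = Z1 @ true_map + 0.05 * rng.normal(size=(5000, 64))  # latents from model B

M, *_ = np.linalg.lstsq(Z1, Z2, rcond=None)          # minimize ||Z1 M - Z2||^2
rel_err = np.linalg.norm(Z1 @ M - Z2) / np.linalg.norm(Z2)
print("relative reconstruction error:", rel_err)
```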
arXiv Detail & Related papers (2022-07-14T10:39:02Z)
- Analyzing the Latent Space of GAN through Local Dimension Estimation [4.688163910878411]
The success of style-based GANs (StyleGANs) in high-fidelity image synthesis has motivated research into the semantic properties of their latent spaces.
We propose a local dimension estimation algorithm for arbitrary intermediate layers in a pre-trained GAN model.
Our proposed metric, called Distortion, measures the inconsistency of the intrinsic space on the learned latent space.
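One hedged way to picture local dimension estimation at an intermediate layer:
count the significant singular values of the layer's Jacobian at a latent
point. The stand-in MLP and the 1% threshold are illustrative choices, not the
paper's Distortion metric itself.

```python
# Local dimension at a latent point: rank of the layer's local Jacobian.
import torch

mapping = torch.nn.Sequential(        # stand-in for a pre-trained GAN layer
    torch.nn.Linear(128, 512), torch.nn.ReLU(), torch.nn.Linear(512, 512)
)

z = torch.randn(128)
J = torch.autograd.functional.jacobian(mapping, z)   # (512, 128)
s = torch.linalg.svdvals(J)                          # sorted descending
local_dim = int((s > 0.01 * s[0]).sum())             # significant directions
print("estimated local dimension:", local_dim)
```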
arXiv Detail & Related papers (2022-05-26T06:36:06Z)
- Contrastive Neighborhood Alignment [81.65103777329874]
We present Contrastive Neighborhood Alignment (CNA), a manifold learning approach to maintain the topology of learned features.
The target model aims to mimic the local structure of the source representation space using a contrastive loss.
CNA is illustrated in three scenarios: manifold learning, where the model maintains the local topology of the original data in a dimension-reduced space; model distillation, where a small student model is trained to mimic a larger teacher; and legacy model update, where an older model is replaced by a more powerful one.
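A hedged sketch of the idea: each anchor's nearest neighbor in the frozen
source space serves as the positive for an InfoNCE-style loss on the target
features. The temperature and single-neighbor choice are assumptions, not the
paper's exact recipe.

```python
# Contrastive alignment of a target space to a source space's neighborhoods.
import torch
import torch.nn.functional as F

def cna_loss(src: torch.Tensor, tgt: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    src, tgt = F.normalize(src, dim=1), F.normalize(tgt, dim=1)
    eye = torch.eye(src.size(0), dtype=torch.bool)
    sim_src = (src @ src.T).masked_fill(eye, float("-inf"))
    pos = sim_src.argmax(dim=1)                      # source-space neighbor
    logits = (tgt @ tgt.T / tau).masked_fill(eye, float("-inf"))
    return F.cross_entropy(logits, pos)              # pull neighbors together

src = torch.randn(256, 512)                          # frozen source features
tgt = torch.randn(256, 128, requires_grad=True)      # learnable target features
cna_loss(src, tgt).backward()
```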
arXiv Detail & Related papers (2022-01-06T04:58:31Z)
- Spatial machine-learning model diagnostics: a model-agnostic distance-based approach [91.62936410696409]
This contribution proposes spatial prediction error profiles (SPEPs) and spatial variable importance profiles (SVIPs) as novel model-agnostic assessment and interpretation tools.
The SPEPs and SVIPs of geostatistical methods, linear models, random forest, and hybrid algorithms show striking differences and also relevant similarities.
The novel diagnostic tools enrich the toolkit of spatial data science, and may improve ML model interpretation, selection, and design.
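A rough sketch of a spatial prediction error profile: hold out each location,
train only on points at least a given distance away, and track the error as
that distance grows. The random-forest learner and synthetic data are stand-ins
for a real geostatistical workflow.

```python
# Error as a function of spatial prediction distance (toy SPEP).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(100, 2))              # sample locations
y = np.sin(xy[:, 0] / 10) + 0.1 * rng.normal(size=100)
D = cdist(xy, xy)                                    # pairwise distances

for radius in (5, 20, 50):
    errors = []
    for i in range(len(y)):
        train = D[i] >= radius                       # exclude nearby points
        model = RandomForestRegressor(n_estimators=30, random_state=0)
        model.fit(xy[train], y[train])
        errors.append(abs(model.predict(xy[i:i + 1])[0] - y[i]))
    print(f"radius {radius}: mean abs error {np.mean(errors):.3f}")
```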
arXiv Detail & Related papers (2021-11-13T01:50:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.