Language Models Represent Space and Time
- URL: http://arxiv.org/abs/2310.02207v3
- Date: Mon, 4 Mar 2024 18:25:29 GMT
- Title: Language Models Represent Space and Time
- Authors: Wes Gurnee, Max Tegmark
- Abstract summary: We analyze the learned representations of three spatial datasets (world, US, NYC places) and three temporal datasets (historical figures, artworks, news headlines) in the Llama-2 family of models.
We discover that LLMs learn linear representations of space and time across multiple scales.
In addition, we identify individual "space neurons" and "time neurons" that reliably encode spatial and temporal coordinates.
- Score: 7.754489121381947
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The capabilities of large language models (LLMs) have sparked debate over
whether such systems just learn an enormous collection of superficial
statistics or a set of more coherent and grounded representations that reflect
the real world. We find evidence for the latter by analyzing the learned
representations of three spatial datasets (world, US, NYC places) and three
temporal datasets (historical figures, artworks, news headlines) in the Llama-2
family of models. We discover that LLMs learn linear representations of space
and time across multiple scales. These representations are robust to prompting
variations and unified across different entity types (e.g. cities and
landmarks). In addition, we identify individual "space neurons" and "time
neurons" that reliably encode spatial and temporal coordinates. While further
investigation is needed, our results suggest modern LLMs learn rich
spatiotemporal representations of the real world and possess basic ingredients
of a world model.
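As a rough illustration of the probing approach described in the abstract, the sketch below fits a ridge-regression probe from per-entity activations to real-world coordinates and screens individual activation dimensions as candidate "space neurons". The activations, dataset size, and layer choice are placeholder assumptions for illustration, not the paper's exact pipeline.

```python
# Minimal sketch of a linear probe for spatial coordinates, in the spirit of the
# paper's method: fit a linear map from an LLM's hidden state for a place name
# to its (latitude, longitude). Activations and labels below are placeholders.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical inputs: one residual-stream activation per entity (e.g. the last
# token of the place name at some middle layer of Llama-2), plus true coordinates.
n_entities, d_model = 5000, 4096
activations = np.random.randn(n_entities, d_model)         # stand-in for real activations
coords = np.random.uniform(-90, 90, size=(n_entities, 2))  # stand-in for (lat, lon)

X_train, X_test, y_train, y_test = train_test_split(
    activations, coords, test_size=0.2, random_state=0
)

# A ridge-regularized linear probe; if space is linearly represented, held-out R^2
# should be substantially above what a probe fit on shuffled labels achieves.
probe = Ridge(alpha=1.0).fit(X_train, y_train)
print("held-out R^2:", r2_score(y_test, probe.predict(X_test)))

# Candidate "space neurons" can be screened analogously: rank individual
# activation dimensions by how well each one alone correlates with a coordinate.
lat = coords[:, 0]
per_neuron_corr = np.array([
    np.corrcoef(activations[:, i], lat)[0, 1] for i in range(d_model)
])
print("top candidate space neurons:", np.argsort(-np.abs(per_neuron_corr))[:5])
```

The same recipe applies to the temporal datasets by swapping the coordinate targets for dates (e.g. year of death or publication year).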
Related papers
- Linear Spatial World Models Emerge in Large Language Models [4.9185678564997355]
We investigate whether large language models implicitly encode linear spatial world models. We introduce a formal framework for spatial world models and assess whether such structure emerges in contextual embeddings. Our results provide empirical evidence that LLMs encode linear spatial world models.
arXiv Detail & Related papers (2025-06-03T15:31:00Z) - Can LLMs Learn to Map the World from Local Descriptions? [50.490593949836146]
This study investigates whether Large Language Models (LLMs) can construct coherent global spatial cognition. Experiments conducted in a simulated urban environment demonstrate that LLMs exhibit latent representations aligned with real-world spatial distributions.
arXiv Detail & Related papers (2025-05-27T08:22:58Z) - Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models [14.442394137843923]
We present a detailed analysis that first delineates the core elements of spatial reasoning.
We then assess the performance of vision-language models on both synthetic and real-world images.
arXiv Detail & Related papers (2025-03-25T14:34:06Z) - OmniGeo: Towards a Multimodal Large Language Models for Geospatial Artificial Intelligence [51.0456395687016]
Multimodal large language models (MLLMs) have opened new frontiers in artificial intelligence.
We propose an MLLM (OmniGeo) tailored to geospatial applications.
By combining the strengths of natural language understanding and spatial reasoning, our model improves instruction following and the accuracy of GeoAI systems.
arXiv Detail & Related papers (2025-03-20T16:45:48Z) - SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability [58.46310813774538]
Multimodal large language models (MLLMs) have made remarkable progress in either temporal or spatial localization.
However, they struggle to perform spatio-temporal video grounding.
This limitation stems from two major challenges.
We introduce SpaceVLLM, an MLLM endowed with spatio-temporal video grounding capability.
arXiv Detail & Related papers (2025-03-18T07:40:36Z) - Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces [34.809309396448654]
We present a novel video-based visual-spatial intelligence benchmark (VSI-Bench) of over 5,000 question-answer pairs.
We find that Multimodal Large Language Models (MLLMs) exhibit competitive - though subhuman - visual-spatial intelligence.
arXiv Detail & Related papers (2024-12-18T18:59:54Z) - SpaRC and SpaRP: Spatial Reasoning Characterization and Path Generation for Understanding Spatial Reasoning Capability of Large Language Models [70.01883340129204]
Spatial reasoning is a crucial component of both biological and artificial intelligence.
We present a comprehensive study of the capability of current state-of-the-art large language models (LLMs) on spatial reasoning.
arXiv Detail & Related papers (2024-06-07T01:06:34Z) - When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models [113.18524940863841]
This survey provides a comprehensive overview of the methodologies enabling large language models to process, understand, and generate 3D data.
Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs).
It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue.
arXiv Detail & Related papers (2024-05-16T16:59:58Z) - Elements of World Knowledge (EWoK): A Cognition-Inspired Framework for Evaluating Basic World Knowledge in Language Models [51.891804790725686]
Elements of World Knowledge (EWoK) is a framework for evaluating language models' understanding of conceptual knowledge underlying world modeling. EWoK-core-1.0 is a dataset of 4,374 items covering 11 world knowledge domains. All tested models perform worse than humans, with results varying drastically across domains.
arXiv Detail & Related papers (2024-05-15T17:19:42Z) - LAMP: A Language Model on the Map [13.75316123602933]
Large Language Models (LLMs) are poised to play an increasingly important role in our lives, providing assistance across a wide array of tasks.
This study introduces a novel framework for fine-tuning a pre-trained model on city-specific data, to enable it to provide accurate recommendations.
arXiv Detail & Related papers (2024-03-14T02:56:38Z) - Probing Multimodal Large Language Models for Global and Local Semantic Representations [57.25949445963422]
We study which layers of Multimodal Large Language Models contribute most to encoding global image information.
In this study, we find that the intermediate layers of models can encode more global semantic information.
We find that the topmost layers may excessively focus on local information, leading to a diminished ability to encode global information.
arXiv Detail & Related papers (2024-02-27T08:27:15Z) - More than Correlation: Do Large Language Models Learn Causal Representations of Space? [6.293100288400849]
This study focuses on uncovering the causal role of the spatial representations in large language models.
Experiments showed that the spatial representations influenced the model's performance on next word prediction and a downstream task that relies on geospatial information.
arXiv Detail & Related papers (2023-12-26T01:27:29Z) - Evaluating Spatial Understanding of Large Language Models [26.436450329727645]
Large language models show remarkable capabilities across a variety of tasks.
Recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts.
We design natural-language navigation tasks and evaluate the ability of LLMs to represent and reason about spatial structures.
arXiv Detail & Related papers (2023-10-23T03:44:40Z) - Things not Written in Text: Exploring Spatial Commonsense from Visual Signals [77.46233234061758]
We investigate whether models with visual signals learn more spatial commonsense than text-based models.
We propose a benchmark that focuses on the relative scales of objects, and the positional relationship between people and objects under different actions.
We find that image synthesis models are more capable of learning accurate and consistent spatial knowledge than other models.
arXiv Detail & Related papers (2022-03-15T17:02:30Z) - Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds [96.9027094562957]
We introduce a spatio-temporal representation learning (STRL) framework, capable of learning from unlabeled 3D point clouds.
Inspired by how infants learn from visual data in the wild, we explore rich cues derived from the 3D data.
STRL takes two temporally-related frames from a 3D point cloud sequence as the input, transforms them with spatial data augmentation, and learns the invariant representation in a self-supervised fashion.
arXiv Detail & Related papers (2021-09-01T04:17:11Z) - LEAP: Learning Articulated Occupancy of People [56.35797895609303]
We introduce LEAP (LEarning Articulated occupancy of People), a novel neural occupancy representation of the human body.
Given a set of bone transformations and a query point in space, LEAP first maps the query point to a canonical space via learned linear blend skinning (LBS) functions.
LEAP efficiently queries the occupancy value via an occupancy network that models accurate identity- and pose-dependent deformations in the canonical space.
arXiv Detail & Related papers (2021-04-14T13:41:56Z)
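To make the mechanism in the LEAP entry above concrete, here is a minimal sketch of that pipeline: blend given per-bone transforms with skinning weights to map a posed-space query point into canonical space, then evaluate an occupancy MLP there. Module sizes, tensor shapes, and the way weights are supplied are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a LEAP-style query: map a query point into canonical space
# with linear blend skinning, then evaluate an occupancy network there.
# Shapes and module sizes are illustrative assumptions.
import torch
import torch.nn as nn

class OccupancyNet(nn.Module):
    """Small MLP predicting occupancy probability for a canonical-space point."""
    def __init__(self, dim_in=3, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_in, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x_canonical):
        return torch.sigmoid(self.net(x_canonical))

def query_occupancy(query_pts, bone_transforms, skinning_weights, occ_net):
    """
    query_pts:        (N, 3)    points in posed space
    bone_transforms:  (B, 4, 4) per-bone posed-to-canonical transforms (assumed given)
    skinning_weights: (N, B)    LBS weights for each query point (learned in LEAP)
    """
    # Homogeneous coordinates for the query points.
    ones = torch.ones(query_pts.shape[0], 1)
    pts_h = torch.cat([query_pts, ones], dim=-1)  # (N, 4)

    # Blend the per-bone transforms with the skinning weights,
    # then map each point into canonical space.
    blended = torch.einsum("nb,bij->nij", skinning_weights, bone_transforms)  # (N, 4, 4)
    canonical = torch.einsum("nij,nj->ni", blended, pts_h)[:, :3]             # (N, 3)

    # Occupancy is evaluated in canonical space, where identity- and
    # pose-dependent deformations are easier to model.
    return occ_net(canonical)

occ_net = OccupancyNet()
occ = query_occupancy(torch.randn(8, 3), torch.eye(4).repeat(24, 1, 1),
                      torch.softmax(torch.randn(8, 24), dim=-1), occ_net)
print(occ.shape)  # torch.Size([8, 1])
```

In the paper's description the LBS functions are themselves learned; here the skinning weights are simply passed in to keep the sketch short.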
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.