Related papers: Comprehending Spatio-temporal Data via Cinematic Storytelling using Large Language Models

URL: http://arxiv.org/abs/2510.17301v1
Date: Mon, 20 Oct 2025 08:44:25 GMT
Title: Comprehending Spatio-temporal Data via Cinematic Storytelling using Large Language Models
Authors: Panos Kalnis, Shuo Shang, Christian S. Jensen
Abstract summary: MapMuse is a storytelling-based framework for interpreting spatio-temporal data. We argue that data storytelling drives insight, engagement, and action from spatio-temporal information. The aim is to bridge the gap between data complexity and human understanding.
Score: 14.567510932057404
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Spatio-temporal data captures complex dynamics across both space and time, yet traditional visualizations are complex, require domain expertise and often fail to resonate with broader audiences. Here, we propose MapMuse, a storytelling-based framework for interpreting spatio-temporal datasets, transforming them into compelling, narrative-driven experiences. We utilize large language models and employ retrieval augmented generation (RAG) and agent-based techniques to generate comprehensive stories. Drawing on principles common in cinematic storytelling, we emphasize clarity, emotional connection, and audience-centric design. As a case study, we analyze a dataset of taxi trajectories. Two perspectives are presented: a captivating story based on a heat map that visualizes millions of taxi trip endpoints to uncover urban mobility patterns; and a detailed narrative following a single long taxi journey, enriched with city landmarks and temporal shifts. By portraying locations as characters and movement as plot, we argue that data storytelling drives insight, engagement, and action from spatio-temporal information. The case study illustrates how MapMuse can bridge the gap between data complexity and human understanding. The aim of this short paper is to provide a glimpse to the potential of the cinematic storytelling technique as an effective communication tool for spatio-temporal data, as well as to describe open problems and opportunities for future research.

Related papers

Spatio-Temporal Data Enhanced Vision-Language Model for Traffic Scene Understanding [49.748517517482014]
Traffic Scene Understanding (TSU) aims to provide a comprehensive description of the traffic scene. Recent research often treats it as a common image understanding task, ignoring the spatio-temporal challenges. This is the first attempt to integrate spatio-temporal information into vision models.
arXiv Detail & Related papers (2025-11-12T04:55:38Z)
Strefer: Empowering Video LLMs with Space-Time Referring and Reasoning via Synthetic Instruction Data [100.5266292850922]
Strefer is a synthetic data generation framework designed to equip Video LLMs with referring and reasoning capabilities. Strefer produces diverse instruction-generation data using a data engine that pseudo-annotates temporally dense, fine-grained video metadata. Our approach enhances the ability of Video LLMs to interpret spatial and temporal references, fostering more versatile, space-time-aware reasoning essential for real-world AI companions.
arXiv Detail & Related papers (2025-09-03T17:33:20Z)
Story Ribbons: Reimagining Storyline Visualizations with Large Language Models [39.0439095287205]
Large language models (LLMs) are being used to augment and reimagine existing storyline visualization techniques. We introduce an LLM-driven data parsing pipeline that automatically extracts relevant narrative information from novels and scripts. We then apply this pipeline to create Story Ribbons, an interactive visualization system that helps novice and expert literary analysts explore detailed character and theme trajectories.
arXiv Detail & Related papers (2025-08-09T01:49:30Z)
VIST-GPT: Ushering in the Era of Visual Storytelling with LLMs? [42.362388367152256]
This paper presents a novel approach that leverages recent advancements in multimodal models for the visual storytelling task. We utilize RoViST and GROOVIST, novel reference-free metrics designed to assess visual storytelling, focusing on visual grounding, coherence, and non-redundancy.
arXiv Detail & Related papers (2025-04-27T14:55:51Z)
Generating Visual Stories with Grounded and Coreferent Characters [63.07511918366848]
We present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions. Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark. We also propose new evaluation metrics to measure the richness of characters and coreference in stories.
arXiv Detail & Related papers (2024-09-20T14:56:33Z)
DataNarrative: Automated Data-Driven Storytelling with Visualizations and Texts [27.218934418961197]
We introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. To address the challenges of crafting coherent data stories, we propose a multiagent framework employing two LLM agents. While our agentic framework generally outperforms non-agentic counterparts in both model-based and human evaluations, the results also reveal unique challenges in data story generation.
arXiv Detail & Related papers (2024-08-09T21:31:33Z)
Narrative Maps: An Algorithmic Approach to Represent and Extract Information Narratives [6.85316573653194]
This article combines the theory of narrative representations with the data from modern online systems. A narrative map representation illustrates the events and stories in the narrative as a series of landmarks and routes on the map. Our findings have implications for intelligence analysts, computational journalists, and misinformation researchers.
arXiv Detail & Related papers (2020-09-09T18:30:44Z)
Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization. We also propose a prototype encoding structure to model the ability of intra-topic derivation. Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)
PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking [128.76063992147016]
We present PlotMachines, a neural narrative model that learns to transform an outline into a coherent story by tracking the dynamic plot states. In addition, we enrich PlotMachines with high-level discourse structure so that the model can learn different writing styles corresponding to different parts of the narrative.
arXiv Detail & Related papers (2020-04-30T17:16:31Z)
Spatio-Temporal Graph for Video Captioning with Knowledge Distillation [50.034189314258356]
We propose a graph model for video captioning that exploits object interactions in space and time. Our model builds interpretable links and is able to provide explicit visual grounding. To avoid correlations caused by the variable number of objects, we propose an object-aware knowledge distillation mechanism.
arXiv Detail & Related papers (2020-03-31T03:58:11Z)
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences [107.0776836117313]
Given an un-trimmed video and a declarative/interrogative sentence, STVG aims to localize the spatio-temporal tube of the queried object. Existing methods cannot tackle the STVG task due to the ineffective tube pre-generation and the lack of novel object relationship modeling. We present a Spatio-Temporal Graph Reasoning Network (STGRN) for this task.
arXiv Detail & Related papers (2020-01-19T19:53:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.