Related papers: Robust Symbolic Reasoning for Visual Narratives via Hierarchical and Semantically Normalized Knowledge Graphs

URL: http://arxiv.org/abs/2508.14941v1
Date: Wed, 20 Aug 2025 03:43:13 GMT
Title: Robust Symbolic Reasoning for Visual Narratives via Hierarchical and Semantically Normalized Knowledge Graphs
Authors: Yi-Chun Chen
Abstract summary: This paper introduces a semantic normalization framework for hierarchical narrative knowledge graphs. We propose methods that consolidate semantically related actions and events using lexical similarity and embedding-based clustering. We demonstrate the framework on annotated manga stories from the Manga109 dataset.
Score: 1.320904960556043
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Understanding visual narratives such as comics requires structured representations that capture events, characters, and their relations across multiple levels of story organization. However, symbolic narrative graphs often suffer from inconsistency and redundancy, where similar actions or events are labeled differently across annotations or contexts. Such variance limits the effectiveness of reasoning and generalization. This paper introduces a semantic normalization framework for hierarchical narrative knowledge graphs. Building on cognitively grounded models of narrative comprehension, we propose methods that consolidate semantically related actions and events using lexical similarity and embedding-based clustering. The normalization process reduces annotation noise, aligns symbolic categories across narrative levels, and preserves interpretability. We demonstrate the framework on annotated manga stories from the Manga109 dataset, applying normalization to panel-, event-, and story-level graphs. Preliminary evaluations across narrative reasoning tasks, such as action retrieval, character grounding, and event summarization, show that semantic normalization improves coherence and robustness, while maintaining symbolic transparency. These findings suggest that normalization is a key step toward scalable, cognitively inspired graph models for multimodal narrative understanding.
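The abstract names two signals for consolidating labels, lexical similarity and embedding-based clustering, but does not give the measures, thresholds, or clustering algorithm. The following is a minimal sketch under stated assumptions, not the paper's method: lexical similarity is `difflib.SequenceMatcher` ratio, embeddings are vectors supplied by the caller (the toy 2-D vectors below are hypothetical stand-ins for a real encoder), labels are merged single-link via union-find when either signal clears a hypothetical threshold, and each cluster's most frequent label becomes its canonical form.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def normalize_labels(labels, embeddings, lex_threshold=0.8, emb_threshold=0.9):
    """Map each lowercased action label to a canonical label.

    Thresholds are hypothetical. Two labels merge if EITHER their lexical
    ratio or their embedding cosine clears its threshold (single-link).
    """
    counts = Counter(label.strip().lower() for label in labels)
    uniq = sorted(counts)
    parent = {u: u for u in uniq}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(uniq):
        for b in uniq[i + 1:]:
            lexical = SequenceMatcher(None, a, b).ratio()
            semantic = (cosine(embeddings[a], embeddings[b])
                        if a in embeddings and b in embeddings else 0.0)
            if lexical >= lex_threshold or semantic >= emb_threshold:
                parent[find(b)] = find(a)

    clusters = {}
    for u in uniq:
        clusters.setdefault(find(u), []).append(u)

    mapping = {}
    for members in clusters.values():
        # Canonical label: most frequent member, ties broken by shortest.
        canonical = max(members, key=lambda m: (counts[m], -len(m)))
        for m in members:
            mapping[m] = canonical
    return mapping


labels = ["run", "runs", "running", "sprint", "eat", "Eat", "eats"]
toy_embeddings = {  # hypothetical 2-D "embeddings", not from a real model
    "run": [1.0, 0.0], "runs": [1.0, 0.0], "running": [1.0, 0.0],
    "sprint": [0.95, 0.1], "eat": [0.0, 1.0], "eats": [0.0, 1.0],
}
print(normalize_labels(labels, toy_embeddings))
# {'eat': 'eat', 'eats': 'eat', 'run': 'run', 'running': 'run', 'runs': 'run', 'sprint': 'run'}
```

Here "runs" merges with "run" on lexical similarity alone, while "running" and "sprint" merge only through their embeddings. Single-link merging is simple but can chain distinct actions together through intermediate labels; a stricter variant would require every pair in a cluster to clear the threshold.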

Related papers

Structured Graph Representations for Visual Narrative Reasoning: A Hierarchical Framework for Comics [1.320904960556043]
This paper presents a hierarchical knowledge graph framework for the structured understanding of visual narratives, focusing on comics. It represents them through integrated knowledge graphs that capture semantic, spatial, and temporal relationships. At the panel level, we construct multimodal graphs that link visual elements such as characters, objects, and actions with corresponding textual components, including dialogue and captions.
arXiv (2025-04-14T14:42:19Z)
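The summary describes panel-level multimodal graphs that link visual elements to dialogue and captions, nested inside higher narrative levels. A minimal sketch of such a hierarchy, assuming typed node IDs, (source, relation, target) edge triples, and a simple "precedes" relation between consecutive events; every class name, node kind, and relation label here is hypothetical and not the paper's schema.

```python
from dataclasses import dataclass, field


@dataclass
class PanelGraph:
    """Panel level: typed nodes plus (source, relation, target) edges."""
    panel_id: str
    nodes: dict[str, str] = field(default_factory=dict)  # node id -> kind
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def link(self, src, src_kind, rel, dst, dst_kind):
        # Register both endpoints with their kind, then add the edge.
        self.nodes[src] = src_kind
        self.nodes[dst] = dst_kind
        self.edges.append((src, rel, dst))


@dataclass
class EventGraph:
    """Event level: an ordered group of panels."""
    event_id: str
    panels: list[PanelGraph] = field(default_factory=list)


@dataclass
class StoryGraph:
    """Story level: an ordered sequence of events."""
    title: str
    events: list[EventGraph] = field(default_factory=list)

    def temporal_edges(self):
        # Consecutive events are joined by a hypothetical "precedes" relation.
        return [(a.event_id, "precedes", b.event_id)
                for a, b in zip(self.events, self.events[1:])]

    def characters(self):
        # Collect character nodes from every panel of every event.
        return sorted({node for e in self.events for p in e.panels
                       for node, kind in p.nodes.items() if kind == "character"})


p1 = PanelGraph("p1")
p1.link("Taro", "character", "says", "Let's go!", "dialogue")
p2 = PanelGraph("p2")
p2.link("Hana", "character", "holds", "umbrella", "object")
story = StoryGraph("demo", [EventGraph("e1", [p1]), EventGraph("e2", [p2])])
print(story.characters())      # ['Hana', 'Taro']
print(story.temporal_edges())  # [('e1', 'precedes', 'e2')]
```

Keeping each level as its own type makes story-level queries such as character grounding a traversal over the lower levels rather than a separate index.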
Fine-Grained Modeling of Narrative Context: A Coherence Perspective via Retrospective Questions [48.18584733906447]
This work introduces an original and practical paradigm for narrative comprehension, motivated by the observation that individual passages within a narrative tend to be cohesively related rather than isolated. We propose fine-grained modeling of narrative context by formulating a graph, dubbed NarCo, that explicitly depicts task-agnostic coherence dependencies.
arXiv (2024-02-21T06:14:04Z)
SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling [12.560014305032437]
This paper introduces SCO-VIST, a framework that represents the image sequence as a graph of objects and relations. SCO-VIST then takes this graph of plot points and creates bridges between them with semantic and occurrence-based edge weights. The resulting weighted story graph yields the storyline as a sequence of events via the Floyd-Warshall algorithm.
arXiv (2024-02-01T04:09:17Z)
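The summary says SCO-VIST derives the storyline from a weighted story graph with the Floyd-Warshall algorithm, but not how edge weights become path costs. A minimal sketch, assuming each edge already carries a non-negative cost where lower means a stronger bridge (for example, 1 minus a hypothetical bridge weight) and that the storyline is the cheapest path from the first to the last plot point; the graph below is made up for illustration.

```python
import math


def floyd_warshall(n, edges):
    """All-pairs shortest paths on a directed graph with non-negative costs.

    Returns (dist, nxt): dist[i][j] is the cheapest cost from i to j and
    nxt[i][j] is the next node on that path (None if j is unreachable).
    """
    dist = [[math.inf] * n for _ in range(n)]
    nxt = [[None] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
        nxt[i][i] = i
    for u, v, cost in edges:
        if cost < dist[u][v]:
            dist[u][v] = cost
            nxt[u][v] = v
    for k in range(n):  # allow paths through intermediate node k
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt


def path(nxt, u, v):
    """Reconstruct the node sequence from u to v, or [] if unreachable."""
    if nxt[u][v] is None:
        return []
    seq = [u]
    while u != v:
        u = nxt[u][v]
        seq.append(u)
    return seq


# Hypothetical plot points 0..3 with (from, to, cost) edges.
edges = [(0, 1, 0.2), (1, 2, 0.3), (2, 3, 0.1), (0, 3, 0.9), (0, 2, 0.8)]
dist, nxt = floyd_warshall(4, edges)
print(path(nxt, 0, 3))       # [0, 1, 2, 3]
print(round(dist[0][3], 3))  # 0.6
```

Floyd-Warshall runs in O(n³) time but fills the table for every start and end pair at once, which suits small story graphs where several candidate storylines may be compared.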
Visual Storytelling with Question-Answer Plans [70.89011289754863]
We present a novel framework which integrates visual representations with pretrained language models and planning. Our model translates the image sequence into a visual prefix, a sequence of continuous embeddings which language models can interpret. It also leverages a sequence of question-answer pairs as a blueprint plan for selecting salient visual concepts and determining how they should be assembled into a narrative.
arXiv (2023-10-08T21:45:34Z)
Conflicts, Villains, Resolutions: Towards models of Narrative Media Framing [19.589945994234075]
We revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives. We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions. We explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches.
arXiv (2023-06-03T08:50:13Z)
Conversational Semantic Parsing using Dynamic Context Graphs [68.72121830563906]
We consider the task of conversational semantic parsing over general-purpose knowledge graphs (KGs) with millions of entities and thousands of relation types. We focus on models that can interactively map user utterances into executable logical forms.
arXiv (2023-05-04T16:04:41Z)
Variational Cross-Graph Reasoning and Adaptive Structured Semantics Learning for Compositional Temporal Grounding [143.5927158318524]
Temporal grounding is the task of locating a specific segment from an untrimmed video according to a query sentence. We introduce a new Compositional Temporal Grounding task and construct two new dataset splits. We argue that the inherent structured semantics inside the videos and language is the crucial factor to achieve compositional generalization.
arXiv (2023-01-22T08:02:23Z)
Narrative Maps: An Algorithmic Approach to Represent and Extract Information Narratives [6.85316573653194]
This article combines the theory of narrative representations with the data from modern online systems. A narrative map representation illustrates the events and stories in the narrative as a series of landmarks and routes on the map. Our findings have implications for intelligence analysts, computational journalists, and misinformation researchers.
arXiv (2020-09-09T18:30:44Z)
PlotMachines: Outline-Conditioned Generation with Dynamic Plot State Tracking [128.76063992147016]
We present PlotMachines, a neural narrative model that learns to transform an outline into a coherent story by tracking the dynamic plot states. In addition, we enrich PlotMachines with high-level discourse structure so that the model can learn different writing styles corresponding to different parts of the narrative.
arXiv (2020-04-30T17:16:31Z)
Temporal Embeddings and Transformer Models for Narrative Text Understanding [72.88083067388155]
We present two approaches to narrative text understanding for character relationship modelling. The temporal evolution of these relations is described by dynamic word embeddings, that are designed to learn semantic changes over time. A supervised learning approach based on the state-of-the-art transformer model BERT is used instead to detect static relations between characters.
arXiv (2020-03-19T14:23:12Z)
