Panel Transitions for Genre Analysis in Visual Narratives
- URL: http://arxiv.org/abs/2312.08720v1
- Date: Thu, 14 Dec 2023 08:05:09 GMT
- Title: Panel Transitions for Genre Analysis in Visual Narratives
- Authors: Yi-Chun Chen, Arnav Jhala
- Abstract summary: We present a novel approach to multi-modal genre analysis of comics and manga-style visual narratives.
We highlight some of the limitations and challenges of our existing computational approaches in modeling subjective labels.
- Score: 1.320904960556043
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding how humans communicate and perceive narratives is important for
media technology research and development. This is particularly important in
current times when there are tools and algorithms that are easily available for
amateur users to create high-quality content. Over time, narrative media
develop recognizable patterns of features across similar artifacts. Genre
is one such grouping of artifacts for narrative media with similar patterns,
tropes, and story structures. While much work has been done on genre-based
classification in text and video, we present a novel approach to multi-modal
genre analysis of comics and manga-style visual narratives. We present a
systematic feature analysis of an annotated dataset
that includes a variety of western and eastern visual books with annotations
for high-level narrative patterns. We then present a detailed analysis of the
contributions of high-level features to genre classification for this medium.
We highlight some of the limitations and challenges of our existing
computational approaches in modeling subjective labels. Our contributions to
the community are: a dataset of annotated manga books, a multi-modal analysis
of visual panels and text in a constrained and popular medium through
high-level features, and a systematic process for incorporating subjective
narrative patterns in computational models.
Related papers
- CPST: Comprehension-Preserving Style Transfer for Multi-Modal Narratives [1.320904960556043]
Among static visual narratives such as comics and manga, there are distinct visual styles in terms of presentation.
The layout of both text and media elements is also significant in terms of narrative communication.
We introduce the notion of comprehension-preserving style transfer (CPST) in such multi-modal domains.
arXiv Detail & Related papers (2023-12-14T07:26:18Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems that see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- Conflicts, Villains, Resolutions: Towards Models of Narrative Media Framing [19.589945994234075]
We revisit a widely used conceptualization of framing from the communication sciences which explicitly captures elements of narratives.
We adapt an effective annotation paradigm that breaks a complex annotation task into a series of simpler binary questions.
We explore automatic multi-label prediction of our frames with supervised and semi-supervised approaches.
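The supervised setup above can be pictured as standard multi-label text classification: one binary decision per frame, mirroring the binary-question annotation paradigm. The sketch below is purely illustrative; the frame names and example texts are hypothetical and not drawn from the paper's dataset.

```python
# Hypothetical multi-label frame prediction sketch (not the paper's model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

texts = [
    "the company blamed regulators for the crisis",
    "activists reached a settlement with the city",
    "officials clashed with protesters over the new law",
    "the dispute ended after mediators stepped in",
]
labels = [["villain"], ["resolution"],
          ["conflict", "villain"], ["conflict", "resolution"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(labels)          # one binary column per frame
X = TfidfVectorizer().fit_transform(texts)

# One binary classifier per frame: the prediction-side analogue of the
# "series of simpler binary questions" annotation paradigm.
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
pred = clf.predict(X)                  # (n_docs, n_frames) binary matrix
```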
arXiv Detail & Related papers (2023-06-03T08:50:13Z)
- M-SENSE: Modeling Narrative Structure in Short Personal Narratives Using Protagonist's Mental Representations [14.64546899992196]
We propose the task of automatically detecting prominent elements of the narrative structure by analyzing the role of characters' inferred mental state.
We introduce a STORIES dataset of short personal narratives containing manual annotations of key elements of narrative structure, specifically climax and resolution.
Our model is able to achieve significant improvements in the task of identifying climax and resolution.
arXiv Detail & Related papers (2023-02-18T20:48:02Z)
- Integrating Visuospatial, Linguistic and Commonsense Structure into Story Visualization [81.26077816854449]
We first explore the use of constituency parse trees for encoding structured input.
Second, we augment the structured input with commonsense information and study the impact of this external knowledge on visual story generation.
Third, we incorporate visual structure via bounding boxes and dense captioning to provide feedback about the characters/objects in generated images.
arXiv Detail & Related papers (2021-10-21T00:16:02Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
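For orientation, the paper's model extends Latent Dirichlet Allocation; the sketch below shows only the vanilla LDA baseline it builds on, using scikit-learn and a few hypothetical example documents, not the paper's extension or data.

```python
# Minimal vanilla-LDA baseline (the paper proposes an extension of this).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "stock market prices rise",
    "rain forecast storm weather",
    "market traders stock earnings",
    "sunny weather forecast today",
]
X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
topic_mix = lda.transform(X)  # per-document topic proportions, rows sum to 1
```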
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- Sentiment analysis in tweets: an assessment study from classical to modern text representation models [59.107260266206445]
Short texts published on Twitter have earned significant attention as a rich source of information.
Their inherent characteristics, such as an informal and noisy linguistic style, remain challenging for many natural language processing (NLP) tasks.
This study assesses existing language models in distinguishing the sentiment expressed in tweets, using a rich collection of 22 datasets.
arXiv Detail & Related papers (2021-05-29T21:05:28Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency [72.00815192668193]
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Unsupervised Graph-based Topic Modeling from Video Transcriptions [5.210353244951637]
We develop a topic extractor on video transcriptions using neural word embeddings and a graph-based clustering method.
Experimental results on the real-life multimodal data set MuSe-CaR demonstrate that our approach extracts coherent and meaningful topics.
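The general idea of graph-based topic clustering can be sketched as follows: build a word co-occurrence graph from transcript sentences, then partition it into communities that act as topics. This is a generic sketch with made-up sentences; it does not reproduce the paper's embedding-based pipeline or the MuSe-CaR data.

```python
# Generic graph-based topic extraction sketch (hypothetical sentences).
import itertools
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

sentences = [
    ["engine", "power", "torque"],
    ["seats", "leather", "comfort"],
    ["engine", "torque", "fuel"],
    ["leather", "comfort", "interior"],
]

G = nx.Graph()
for sent in sentences:
    for u, v in itertools.combinations(sent, 2):
        # Edge weight = number of sentences in which the two words co-occur.
        w = G[u][v]["weight"] + 1 if G.has_edge(u, v) else 1
        G.add_edge(u, v, weight=w)

# Each detected community is read as one topic.
topics = [set(c) for c in greedy_modularity_communities(G, weight="weight")]
```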
arXiv Detail & Related papers (2021-05-04T12:48:17Z)
- Topic Adaptation and Prototype Encoding for Few-Shot Visual Storytelling [81.33107307509718]
We propose a topic adaptive storyteller to model the ability of inter-topic generalization.
We also propose a prototype encoding structure to model the ability of intra-topic derivation.
Experimental results show that topic adaptation and prototype encoding structure mutually bring benefit to the few-shot model.
arXiv Detail & Related papers (2020-08-11T03:55:11Z)
- Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers [2.5899040911480187]
We introduce a multimodal approach for the semantic segmentation of historical newspapers.
Based on experiments on diachronic Swiss and Luxembourgish newspapers, we investigate the predictive power of visual and textual features.
Results show consistent improvement of multimodal models in comparison to a strong visual baseline.
arXiv Detail & Related papers (2020-02-14T17:56:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.