ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer
- URL: http://arxiv.org/abs/2202.07305v1
- Date: Tue, 15 Feb 2022 10:53:08 GMT
- Title: ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer
- Authors: Kohei Uehara, Yusuke Mori, Yusuke Mukuta, Tatsuya Harada
- Abstract summary: We propose a model called ViNTER that generates image narratives guided by "emotion arcs," time series of varying emotions.
We present experimental results of both manual and automatic evaluations.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image narrative generation is the creation of stories about the
content of image data from a subjective viewpoint. Given the importance of the
subjective feelings of writers, characters, and readers in storytelling, image
narrative generation methods must account for human emotion; this is what
chiefly distinguishes them from descriptive caption generation tasks.
Automated methods for generating story-like text from images are of
considerable social significance, because stories serve essential functions
both as entertainment and for practical purposes such as education and
advertising. In this study, we propose ViNTER (Visual Narrative Transformer
with Emotion arc Representation), a model that generates image narratives
guided by "emotion arcs," time series of varying emotions, taking advantage of
recent advances in multimodal Transformer-based pre-trained models. We present
results of both manual and automatic evaluations, which demonstrate the
effectiveness of the proposed emotion-aware approach to image narrative
generation.
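
The abstract describes the conditioning idea but gives no implementation, so here is a minimal, hypothetical sketch of emotion-arc conditioning. It is not the authors' ViNTER implementation: it flattens an emotion arc into a textual prompt for an off-the-shelf seq2seq model (t5-small), and an image caption stands in for the multimodal image features that ViNTER itself consumes. The prompt format, the build_prompt helper, and the example inputs are all assumptions made for illustration.

    # Hypothetical sketch: condition a seq2seq model on an "emotion arc"
    # by serializing the arc into the text prompt. This is NOT the ViNTER
    # architecture, which builds on multimodal Transformer-based
    # pre-trained models rather than a caption-only input.
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small")

    def build_prompt(caption: str, emotion_arc: list[str]) -> str:
        # One emotion label per narrative segment, e.g. joy -> sadness -> acceptance.
        # The caption stands in for image content (an assumption of this sketch).
        arc = " -> ".join(emotion_arc)
        return f"emotion arc: {arc} | image: {caption} | narrative:"

    prompt = build_prompt(
        "a child releasing a balloon at a fair",
        ["joy", "sadness", "acceptance"],
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=80, num_beams=4)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

Without fine-tuning on arc-annotated stories, an off-the-shelf t5-small will not produce emotionally faithful narratives; the sketch only shows where the arc enters the pipeline.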
Related papers
- Imagining from Images with an AI Storytelling Tool (2024-08-21)
  The proposed method explores the multimodal capabilities of GPT-4o to interpret visual content and create engaging stories. It is supported by a fully implemented tool, called ImageTeller, which accepts images from diverse sources as input.
- Envisioning Narrative Intelligence: A Creative Visual Storytelling Anthology (2023-10-06)
  We present five themes that characterize the variations found in the creative visual storytelling process. We envision narrative intelligence criteria for computational visual storytelling as: creative, reliable, expressive, grounded, and responsible.
- StyleEDL: Style-Guided High-order Attention Network for Image Emotion Distribution Learning (2023-08-06)
  We propose a style-guided high-order attention network for image emotion distribution learning, termed StyleEDL. StyleEDL interactively learns stylistic-aware representations of images by exploring the hierarchical stylistic information of visual contents. In addition, we introduce a stylistic graph convolutional network to dynamically generate content-dependent emotion representations.
- Visual Story Generation Based on Emotion and Keywords (2023-01-07)
  This work proposes a story generation pipeline to co-create visual stories with users. The pipeline includes two parts: narrative and image generation.
- Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models (2022-10-19)
  Text-to-image diffusion models can generate high-quality pictures from textual input prompts. These models have been trained using text data collected from content-based labelling protocols. We characterise the sentimentality, objectiveness, and degree of abstraction of publicly available text data used to train current text-to-image diffusion models.
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency (2021-05-20)
  Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions. Prior work has introduced recurrent generative models that outperform text-to-image synthesis models on this task. We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
- ArtEmis: Affective Language for Visual Art (2021-01-19)
  We focus on the affective experience triggered by visual artworks. We ask annotators to indicate the dominant emotion they feel for a given image. This leads to a rich set of signals for both the objective content and the affective impact of an image.
- Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling (2020-02-03)
  We propose to explicitly learn to imagine a storyline that bridges the visual gap. We train the network to produce a full, plausible story even with missing photo(s). In experiments, we show that our hide-and-tell scheme and network design are indeed effective at storytelling.
This list was automatically generated from the titles and abstracts of papers on this site.