Visual Story Generation Based on Emotion and Keywords
- URL: http://arxiv.org/abs/2301.02777v1
- Date: Sat, 7 Jan 2023 03:56:49 GMT
- Title: Visual Story Generation Based on Emotion and Keywords
- Authors: Yuetian Chen, Ruohua Li, Bowen Shi, Peiru Liu, Mei Si
- Abstract summary: This work proposes a story generation pipeline to co-create visual stories with the user.
The pipeline includes two parts: narrative and image generation.
- Score: 5.3860505447668015
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated visual story generation aims to produce stories with corresponding
illustrations that exhibit coherence, progression, and adherence to characters'
emotional development. This work proposes a story generation pipeline to
co-create visual stories with the user, allowing control over the events and
emotions in the generated content. The pipeline includes two
parts: narrative and image generation. For narrative generation, the system
generates the next sentence using user-specified keywords and emotion labels.
For image generation, diffusion models are used to create a visually appealing
image corresponding to each generated sentence. Further, object recognition is
applied to the generated images to allow objects in these images to be
mentioned in future story development.
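The abstract outlines a three-stage loop: keyword- and emotion-conditioned sentence generation, diffusion-based illustration, and object recognition that feeds detected objects back into the story. Below is a minimal Python sketch of such a loop built from off-the-shelf Hugging Face components; the model choices (GPT-2, Stable Diffusion v1.5, DETR) and the bracketed prompt format are illustrative assumptions, not the authors' reported setup.

```python
# Sketch of the pipeline described in the abstract. Model names and the
# prompt format are assumptions for illustration, not the authors' setup.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline

# Stage 1: next-sentence generation conditioned on keywords and an emotion label.
story_lm = pipeline("text-generation", model="gpt2")
# Stage 2: one illustration per generated sentence via a diffusion model.
painter = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
# Stage 3: object recognition on the generated image.
detector = pipeline("object-detection", model="facebook/detr-resnet-50")

def next_sentence(story_so_far: str, keywords: list[str], emotion: str) -> str:
    # Hypothetical control format: the paper conditions on user-specified
    # keywords and an emotion label; the bracket syntax here is ours.
    prompt = f"{story_so_far} [{emotion}] [{', '.join(keywords)}]"
    out = story_lm(prompt, max_new_tokens=40, do_sample=True)[0]["generated_text"]
    return out[len(prompt):].strip()

def illustrate(sentence: str):
    image = painter(sentence).images[0]
    # High-confidence detections become candidate keywords for future sentences.
    objects = {d["label"] for d in detector(image) if d["score"] > 0.9}
    return image, objects

story = "Once upon a time, a girl walked into the woods."
sentence = next_sentence(story, keywords=["wolf", "path"], emotion="fear")
image, new_keywords = illustrate(sentence)  # new_keywords can seed the next turn
```

In the interactive setting the abstract describes, the detected object labels would be offered back to the user as candidate keywords for the next sentence.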
Related papers
- Generating Visual Stories with Grounded and Coreferent Characters [63.07511918366848]
We present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions.
Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark.
We also propose new evaluation metrics to measure the richness of characters and coreference in stories.
arXiv Detail & Related papers (2024-09-20T14:56:33Z)
- Imagining from Images with an AI Storytelling Tool [0.27309692684728604]
The proposed method explores the multimodal capabilities of GPT-4o to interpret visual content and create engaging stories.
The method is supported by a fully implemented tool, called ImageTeller, which accepts images from diverse sources as input.
arXiv Detail & Related papers (2024-08-21T10:49:15Z)
- MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising [42.20750912837316]
MagicScroll is a progressive diffusion-based image generation framework with a novel semantic-aware denoising process.
It enables fine-grained control over the generated image on object, scene, and background levels with text, image, and layout conditions.
It showcases promising results in aligning with the narrative text, improving visual coherence, and engaging the audience.
arXiv Detail & Related papers (2023-12-18T03:09:05Z)
- Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models [70.86603627188519]
We focus on a novel, yet challenging task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling.
We propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module.
We show StoryGen can generalize to unseen characters without any optimization and generate image sequences with coherent content and consistent characters.
arXiv Detail & Related papers (2023-06-01T17:58:50Z)
- Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences [67.61940880927708]
Current work on image-based story generation is limited because existing image sequence collections lack coherent plots.
We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP).
VWP contains almost 2K selected sequences of movie shots, each including 5-10 images.
The image sequences are aligned with a total of 12K stories collected via crowdsourcing, where writers were given the image sequences and a set of grounded characters from the corresponding sequence.
arXiv Detail & Related papers (2023-01-20T13:38:24Z)
- Visualize Before You Write: Imagination-Guided Open-Ended Text Generation [68.96699389728964]
We propose iNLG, which uses machine-generated images to guide language models in open-ended text generation.
Experiments and analyses demonstrate the effectiveness of iNLG on open-ended text generation tasks.
arXiv Detail & Related papers (2022-10-07T18:01:09Z)
- ViNTER: Image Narrative Generation with Emotion-Arc-Aware Transformer [59.05857591535986]
We propose a model called ViNTER to generate image narratives that focus on time series representing varying emotions as "emotion arcs".
We present experimental results of both manual and automatic evaluations.
arXiv Detail & Related papers (2022-02-15T10:53:08Z)
- FairyTailor: A Multimodal Generative Framework for Storytelling [33.39639788612019]
We introduce a system and a demo, FairyTailor, for human-in-the-loop visual story co-creation.
Users can create a cohesive children's fairytale by weaving generated texts and retrieved images with their input.
To our knowledge, this is the first dynamic tool for multimodal story generation that allows interactive co-formation of both texts and images.
arXiv Detail & Related papers (2021-07-13T02:45:08Z)
- Cue Me In: Content-Inducing Approaches to Interactive Story Generation [74.09575609958743]
We focus on the task of interactive story generation, where the user provides the model mid-level sentence abstractions.
We present two content-inducing approaches to effectively incorporate this additional information.
Experimental results from both automatic and human evaluations show that these methods produce more topically coherent and personalized stories.
arXiv Detail & Related papers (2020-10-20T00:36:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.