Every picture tells a story: Image-grounded controllable stylistic story generation
- URL: http://arxiv.org/abs/2209.01638v1
- Date: Sun, 4 Sep 2022 15:07:53 GMT
- Title: Every picture tells a story: Image-grounded controllable stylistic story generation
- Authors: Holy Lovenia, Bryan Wilie, Romain Barraud, Samuel Cahyawijaya, Willy Chung, Pascale Fung
- Abstract summary: We introduce Plug-and-Play Story Teller (PPST) to improve image-to-story generation.
We conduct image-to-story generation experiments with non-styled, romance-styled, and action-styled PPST approaches.
The results show that PPST improves story coherence and image-story relevance, but does not yet adequately capture the target style.
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Generating a short story out of an image is arduous. Unlike image captioning,
story generation from an image poses multiple challenges: preserving the story
coherence, appropriately assessing the quality of the story, steering the
generated story into a certain style, and addressing the scarcity of
image-story pair reference datasets limiting supervision during training. In
this work, we introduce Plug-and-Play Story Teller (PPST) and improve
image-to-story generation by: 1) alleviating the data scarcity problem by
incorporating large pre-trained models, namely CLIP and GPT-2, to facilitate a
fluent image-to-text generation with minimal supervision, and 2) enabling a
more style-relevant generation by incorporating stylistic adapters to control
the story generation. We conduct image-to-story generation experiments with
non-styled, romance-styled, and action-styled PPST approaches and compare our
generated stories with those of previous work over three aspects, i.e., story
coherence, image-story relevance, and style fitness, using both automatic and
human evaluation. The results show that PPST improves story coherence and has
better image-story relevance, but has yet to be adequately stylistic.
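Although the abstract does not spell out the decoding procedure, the plug-and-play idea can be illustrated with a short sketch: GPT-2 proposes candidate continuations and CLIP re-ranks them against the input image, so fluent image-grounded text can be produced without image-story training pairs. The model checkpoints, the sentence-level re-ranking loop, and the character-truncation heuristic below are illustrative assumptions, not the authors' exact method.

```python
# Minimal sketch of plug-and-play image-to-story generation in the spirit
# of PPST: GPT-2 samples candidate continuations, CLIP picks the one most
# similar to the image. Model names and heuristics are assumptions.
import torch
from PIL import Image
from transformers import (CLIPModel, CLIPProcessor,
                          GPT2LMHeadModel, GPT2Tokenizer)

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
gpt2_tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2_tok.pad_token = gpt2_tok.eos_token

def generate_story(image: Image.Image, prompt: str,
                   sentences: int = 3, candidates: int = 8) -> str:
    story = prompt
    for _ in range(sentences):
        # 1) Sample several candidate continuations from GPT-2.
        inputs = gpt2_tok(story, return_tensors="pt")
        outs = gpt2.generate(**inputs, do_sample=True, top_p=0.9,
                             max_new_tokens=30,
                             num_return_sequences=candidates,
                             pad_token_id=gpt2_tok.eos_token_id)
        texts = [gpt2_tok.decode(o, skip_special_tokens=True) for o in outs]
        # 2) Re-rank candidates by CLIP image-text similarity; only the
        #    recent tail of each candidate is scored (CLIP's text encoder
        #    is limited to 77 tokens), a heuristic rather than PPST's rule.
        clip_in = clip_proc(text=[t[-300:] for t in texts], images=image,
                            return_tensors="pt", padding=True,
                            truncation=True)
        with torch.no_grad():
            sims = clip(**clip_in).logits_per_image.squeeze(0)
        story = texts[sims.argmax().item()]
    return story
```

The stylistic control in point 2 is commonly realized with small bottleneck adapters trained on style-specific text while the backbone stays frozen; the module below is a hedged sketch of that idea, with dimensions and placement chosen for illustration rather than taken from the paper.

```python
# Hedged sketch of a stylistic adapter: a residual bottleneck module that
# could be inserted after a transformer block and trained on, e.g., romance
# or action corpora while GPT-2's own weights stay frozen.
import torch.nn as nn

class StyleAdapter(nn.Module):
    def __init__(self, hidden: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, h):
        # Residual bottleneck: only the adapter parameters are updated
        # during style fine-tuning.
        return h + self.up(self.act(self.down(h)))
```

For example, generate_story(Image.open("photo.jpg"), "Once upon a time,") would return a short image-grounded story, and a romance- or action-styled variant would route hidden states through a StyleAdapter fine-tuned on the corresponding corpus.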
Related papers
- StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion
Story visualization aims to generate realistic and coherent images based on a storyline.
Current models adopt a frame-by-frame architecture, transforming a pre-trained text-to-image model into an auto-regressive generator.
We propose a bidirectional, unified, and efficient framework, namely StoryImager.
arXiv Detail & Related papers (2024-04-09T03:22:36Z)
- Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models
We focus on a novel yet challenging task of generating a coherent image sequence based on a given storyline, denoted as open-ended visual storytelling.
We propose a learning-based auto-regressive image generation model, termed StoryGen, with a novel vision-language context module.
We show StoryGen can generalize to unseen characters without any optimization and generate image sequences with coherent content and consistent characters.
arXiv Detail & Related papers (2023-06-01T17:58:50Z)
- Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences
Current work on image-based story generation suffers from the fact that existing image sequence collections lack coherent plots.
We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP).
VWP contains almost 2K selected sequences of movie shots, each including 5-10 images.
The image sequences are aligned with a total of 12K stories, collected via crowdsourcing given the image sequences and a set of grounded characters from the corresponding image sequence.
arXiv Detail & Related papers (2023-01-20T13:38:24Z)
- StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation
We develop StoryDALL-E, a model for story continuation in which the generated visual story is conditioned on a source image.
We show that our retrofitting approach outperforms GAN-based models for story continuation and facilitates copying of visual elements from the source image.
Overall, our work demonstrates that pretrained text-to-image synthesis models can be adapted for complex and low-resource tasks like story continuation.
arXiv Detail & Related papers (2022-09-13T17:47:39Z)
- Improving Generation and Evaluation of Visual Stories via Semantic Consistency
Given a series of natural language captions, an agent must generate a sequence of images that correspond to the captions.
Prior work has introduced recurrent generative models which outperform text-to-image synthesis models on this task.
We present a number of improvements to prior modeling approaches, including the addition of a dual learning framework.
arXiv Detail & Related papers (2021-05-20T20:42:42Z)
- Stylized Story Generation with Style-Guided Planning
We propose a new task, stylized story generation, namely generating stories with a specified style given a leading context.
Our model can controllably generate emotion-driven or event-driven stories based on the ROCStories dataset.
arXiv Detail & Related papers (2021-05-18T15:55:38Z)
- Consistency and Coherency Enhanced Story Generation
We propose a two-stage generation framework to enhance the consistency and coherency of generated stories.
The first stage organizes the story outline, which depicts the story plots and events; the second stage expands the outline into a complete story.
In addition, coreference supervision signals are incorporated to reduce coreference errors and improve coreference consistency.
arXiv Detail & Related papers (2020-10-17T16:40:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.