Make-A-Storyboard: A General Framework for Storyboard with Disentangled
and Merged Control
- URL: http://arxiv.org/abs/2312.07549v1
- Date: Wed, 6 Dec 2023 12:16:23 GMT
- Title: Make-A-Storyboard: A General Framework for Storyboard with Disentangled
and Merged Control
- Authors: Sitong Su, Litao Guo, Lianli Gao, Heng Tao Shen, Jingkuan Song
- Abstract summary: We propose a new presentation form for Story Visualization called Storyboard, inspired by film-making.
Within each scene in Storyboard, characters engage in activities at the same location, necessitating both visually consistent scenes and characters.
Our method could be seamlessly integrated into mainstream Image Customization methods, empowering them with the capability of story visualization.
- Score: 131.1446077627191
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Story Visualization aims to generate images aligned with story prompts,
reflecting the coherence of storybooks through visual consistency among
characters and scenes. However, current approaches concentrate exclusively on
characters and neglect the visual consistency among contextually correlated
scenes, resulting in independent character images without inter-image
coherence. To tackle this issue, we propose a new presentation form for Story
Visualization called Storyboard, inspired by film-making, as illustrated in
Fig. 1. Specifically, a Storyboard unfolds a story into visual representations
scene by scene. Within each scene in a Storyboard, characters engage in
activities at the same location, necessitating both visually consistent scenes
and characters. For Storyboard, we design a general framework, coined
Make-A-Storyboard, that applies disentangled control over the consistency of
contextually correlated characters and scenes and then merges them to form
harmonized images. Extensive experiments demonstrate 1) Effectiveness: the
method is effective in story alignment, character consistency, and
scene correlation; 2) Generalization: our method can be seamlessly integrated
into mainstream Image Customization methods, empowering them with the
capability of story visualization.
Related papers
- ContextualStory: Consistent Visual Storytelling with Spatially-Enhanced and Storyline Context [50.572907418430155]
Existing autoregressive methods struggle with high memory usage, slow generation speeds, and limited context integration.
We propose ContextualStory, a novel framework designed to generate coherent story frames and extend frames for story continuation.
In experiments on PororoSV and FlintstonesSV benchmarks, ContextualStory significantly outperforms existing methods in both story visualization and story continuation.
arXiv Detail & Related papers (2024-07-13T05:02:42Z)
- StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion [78.1014542102578]
Story visualization aims to generate realistic and coherent images based on a storyline.
Current models adopt a frame-by-frame architecture, adapting a pre-trained text-to-image model in an auto-regressive manner.
We propose a bidirectional, unified, and efficient framework, namely StoryImager.
arXiv Detail & Related papers (2024-04-09T03:22:36Z)
- TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling [14.15543866199545]
As a cross-modal task, visual storytelling aims to generate a story for an ordered image sequence automatically.
We propose a novel method, Topic Aware Reinforcement Network for VIsual StoryTelling (TARN-VIST).
In particular, we pre-extract the topic information of stories from both visual and linguistic perspectives.
arXiv Detail & Related papers (2024-03-18T08:01:23Z)
- SCO-VIST: Social Interaction Commonsense Knowledge-based Visual Storytelling [12.560014305032437]
This paper introduces SCO-VIST, a framework representing the image sequence as a graph with objects and relations.
SCO-VIST then takes this graph representing plot points and creates bridges between plot points with semantic and occurrence-based edge weights.
This weighted story graph produces the storyline in a sequence of events using Floyd-Warshall's algorithm.
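The SCO-VIST abstract only names the algorithm, so the following is an illustrative sketch rather than the authors' implementation: given a weighted plot-point graph (node indices and edge weights below are hypothetical), Floyd-Warshall computes all-pairs shortest paths, and the lowest-cost route between a start and end plot point yields an event sequence.

```python
# Sketch of extracting a storyline from a weighted plot-point graph via
# Floyd-Warshall. Nodes and weights are invented for illustration; in
# SCO-VIST the weights would combine semantic and occurrence-based scores.
INF = float("inf")

def floyd_warshall(n, edges):
    """All-pairs shortest paths. edges: {(u, v): weight}.
    Returns (dist, nxt), where nxt[i][j] is the successor of i on the
    shortest i->j path (for path reconstruction)."""
    dist = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    nxt = [[j if (i, j) in edges else None for j in range(n)] for i in range(n)]
    for (u, v), w in edges.items():
        dist[u][v] = w
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    nxt[i][j] = nxt[i][k]
    return dist, nxt

def storyline(nxt, start, end):
    """Reconstruct the event sequence from start to end, or None if unreachable."""
    if start != end and nxt[start][end] is None:
        return None
    path, node = [start], start
    while node != end:
        node = nxt[node][end]
        path.append(node)
    return path

# Hypothetical plot points 0..3 with bridge costs (lower = stronger bridge).
edges = {(0, 1): 1.0, (1, 2): 2.0, (0, 2): 5.0, (2, 3): 1.0, (1, 3): 4.0}
dist, nxt = floyd_warshall(4, edges)
print(storyline(nxt, 0, 3))  # [0, 1, 2, 3]
print(dist[0][3])            # 4.0
```

Note that Floyd-Warshall computes shortest paths between all node pairs in O(n^3), which is affordable here because a story graph has only a handful of plot points per image sequence.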
arXiv Detail & Related papers (2024-02-01T04:09:17Z)
- Text-Only Training for Visual Storytelling [107.19873669536523]
We formulate visual storytelling as a visual-conditioned story generation problem.
We propose a text-only training method that separates the learning of cross-modality alignment and story generation.
arXiv Detail & Related papers (2023-08-17T09:32:17Z)
- Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences [67.61940880927708]
Current work on image-based story generation suffers from the fact that the existing image sequence collections do not have coherent plots behind them.
We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP).
VWP contains almost 2K selected sequences of movie shots, each including 5-10 images.
The image sequences are aligned with a total of 12K stories which were collected via crowdsourcing given the image sequences and a set of grounded characters from the corresponding image sequence.
arXiv Detail & Related papers (2023-01-20T13:38:24Z)
- Make-A-Story: Visual Memory Conditioned Consistent Story Generation [57.691064030235985]
We propose a novel autoregressive diffusion-based framework with a visual memory module that implicitly captures the actor and background context.
Our experiments for story generation on the MUGEN, the PororoSV and the FlintstonesSV dataset show that our method not only outperforms prior state-of-the-art in generating frames with high visual quality, but also models appropriate correspondences between the characters and the background.
arXiv Detail & Related papers (2022-11-23T21:38:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.