Dynamic Storyboard Generation in an Engine-based Virtual Environment for
Video Production
- URL: http://arxiv.org/abs/2301.12688v3
- Date: Fri, 21 Jul 2023 18:13:10 GMT
- Title: Dynamic Storyboard Generation in an Engine-based Virtual Environment for
Video Production
- Authors: Anyi Rao, Xuekun Jiang, Yuwei Guo, Linning Xu, Lei Yang, Libiao Jin,
Dahua Lin, Bo Dai
- Abstract summary: We present Virtual Dynamic Storyboard (VDS) to allow users to storyboard shots in virtual environments.
VDS runs in a "propose-simulate-discriminate" mode: given a formatted story script and a camera script as input, it generates several character-animation and camera-movement proposals.
To pick the top-quality dynamic storyboard from the candidates, we equip it with a shot ranking discriminator based on shot quality criteria learned from professionally created data.
- Score: 92.14891282042764
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Amateurs working on mini-films and short-form videos usually spend
considerable time and effort on the complicated, multi-round process of setting
and adjusting scenes, plots, and cameras to deliver satisfying shots. We
present Virtual Dynamic Storyboard (VDS), which allows users to storyboard
shots in virtual environments, where the filming staff can easily test shot
settings before the actual filming.
before the actual filming. VDS runs on a "propose-simulate-discriminate" mode:
Given a formatted story script and a camera script as input, it generates
several character animation and camera movement proposals following predefined
story and cinematic rules to allow an off-the-shelf simulation engine to render
videos. To pick up the top-quality dynamic storyboard from the candidates, we
equip it with a shot ranking discriminator based on shot quality criteria
learned from professional manual-created data. VDS is comprehensively validated
via extensive experiments and user studies, demonstrating its efficiency,
effectiveness, and great potential in assisting amateur video production.
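The "propose-simulate-discriminate" loop can be pictured with the minimal sketch below. Every name in it (ShotProposal, propose_shots, render_with_engine, ShotRankingDiscriminator) is a hypothetical illustration, not the authors' actual API: the real system drives a game-engine renderer with rule-based proposals and scores candidates with a discriminator learned from professional footage, whereas this toy version uses placeholder stubs.

```python
"""Toy sketch of the propose-simulate-discriminate loop (all names assumed)."""
from dataclasses import dataclass
import random


@dataclass
class ShotProposal:
    character_animation: str  # an action tag derived from the story script
    camera_movement: str      # a movement pattern from cinematic rules


def propose_shots(story_script: str, camera_script: str, n: int = 4):
    """Propose: sample candidate (animation, camera) pairs; the real rules
    are reduced to two toy lists here."""
    actions = ["walk_to_mark", "sit_down", "turn_to_speaker"]
    moves = ["static", "pan_left", "dolly_in", "tracking"]
    return [ShotProposal(random.choice(actions), random.choice(moves))
            for _ in range(n)]


def render_with_engine(proposal: ShotProposal) -> dict:
    """Simulate: stand-in for the off-the-shelf engine; returns a fake
    'rendered video' record instead of real frames."""
    return {"proposal": proposal, "frames": 120}


class ShotRankingDiscriminator:
    """Discriminate: stand-in for the learned ranker; the real model scores
    shot quality with criteria learned from professionally created data."""

    def score(self, video: dict) -> float:
        return random.random()  # placeholder for a learned quality score


def dynamic_storyboard(story_script: str, camera_script: str) -> dict:
    ranker = ShotRankingDiscriminator()
    candidates = [render_with_engine(p)
                  for p in propose_shots(story_script, camera_script)]
    # Keep the top-quality dynamic storyboard among the candidates.
    return max(candidates, key=ranker.score)


if __name__ == "__main__":
    best = dynamic_storyboard("ALICE walks to the door.", "medium shot, eye level")
    print(best["proposal"])
```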
Related papers
- Towards Understanding Camera Motions in Any Video [80.223048294482]
We introduce CameraBench, a large-scale dataset and benchmark designed to assess and improve camera motion understanding.
CameraBench consists of 3,000 diverse internet videos annotated by experts through a rigorous quality control process.
One of our contributions is a taxonomy of camera motion primitives, designed in collaboration with cinematographers.
arXiv Detail & Related papers (2025-04-21T18:34:57Z)
- VC-LLM: Automated Advertisement Video Creation from Raw Footage using Multi-modal LLMs [43.50425781768217]
We present VC-LLM, a framework powered by Large Language Models for the automatic creation of high-quality short-form advertisement videos.
Our approach leverages high-resolution spatial input and low-resolution temporal input to represent video clips more effectively.
Experiments show that VC-LLM based on GPT-4o can produce videos comparable to those created by humans.
arXiv Detail & Related papers (2025-04-08T04:35:23Z)
- ReCamMaster: Camera-Controlled Generative Rendering from A Single Video [72.42376733537925]
ReCamMaster is a camera-controlled generative video re-rendering framework.
It reproduces the dynamic scene of an input video at novel camera trajectories.
Our method also finds promising applications in video stabilization, super-resolution, and outpainting.
arXiv Detail & Related papers (2025-03-14T17:59:31Z)
- Kubrick: Multimodal Agent Collaborations for Synthetic Video Generation [4.147294190096431]
We introduce an automatic synthetic video generation pipeline based on Vision Large Language Model (VLM) agent collaborations.
Given a natural language description of a video, multiple VLM agents auto-direct various processes of the generation pipeline.
Our generated videos outperform those from commercial video generation models on five metrics covering video quality and instruction-following performance.
arXiv Detail & Related papers (2024-08-19T23:31:02Z)
- WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models [132.77237314239025]
Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos.
Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions.
We reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion.
Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach.
arXiv Detail & Related papers (2024-07-15T11:21:03Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- Training-free Camera Control for Video Generation [15.79168688275606]
We propose a training-free and robust solution to offer camera movement control for off-the-shelf video diffusion models.
Our method does not require any supervised finetuning on camera-annotated datasets or self-supervised training via data augmentation.
It can be plug-and-play with most pretrained video diffusion models and generate camera-controllable videos with a single image or text prompt as input.
arXiv Detail & Related papers (2024-06-14T15:33:00Z)
- Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis [43.02778060969546]
We propose a controllable monocular dynamic view synthesis pipeline.
Our model does not require depth as input, and does not explicitly model 3D scene geometry.
We believe our framework can potentially unlock powerful applications in rich dynamic scene understanding, perception for robotics, and interactive 3D video viewing experiences for virtual reality.
arXiv Detail & Related papers (2024-05-23T17:59:52Z)
- Cinematic Behavior Transfer via NeRF-based Differentiable Filming [63.1622492808519]
Existing SLAM methods face limitations in dynamic scenes, and human pose estimation often focuses on 2D projections.
We first introduce a reverse filming behavior estimation technique.
We then introduce a cinematic transfer pipeline that is able to transfer various shot types to a new 2D video or a 3D virtual environment.
arXiv Detail & Related papers (2023-11-29T15:56:58Z)
- MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images [92.13079696503803]
We present MovieFactory, a framework to generate cinematic-picture (3072×1280), film-style (multi-scene), and multi-modality (with sound) movies.
Our approach empowers users to create captivating movies with smooth transitions using simple text inputs.
arXiv Detail & Related papers (2023-06-12T17:31:23Z)
- Sampling Based Scene-Space Video Processing [89.49726406622842]
We present a novel, sampling-based framework for processing video.
It enables high-quality scene-space video effects in the presence of inevitable errors in depth and camera pose estimation.
We present results for various casually captured, hand-held, moving, compressed, monocular videos.
arXiv Detail & Related papers (2021-02-05T05:55:04Z)
- Batteries, camera, action! Learning a semantic control space for expressive robot cinematography [15.895161373307378]
We develop a data-driven framework that enables editing of complex camera positioning parameters in a semantic space.
First, we generate a database of video clips with a diverse range of shots in a photo-realistic simulator.
We use hundreds of participants in a crowd-sourcing framework to obtain scores for a set of semantic descriptors for each clip.
arXiv Detail & Related papers (2020-11-19T21:56:53Z)
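The semantic control space idea in the last entry above can be sketched in a few lines: fit a regressor from crowd-sourced semantic descriptor scores to camera positioning parameters, so a user can edit a shot in semantic terms. The descriptors, feature layout, and Ridge regressor below are illustrative assumptions, not the authors' implementation.

```python
"""Hedged sketch: map semantic descriptor scores to camera parameters."""
import numpy as np
from sklearn.linear_model import Ridge

# Toy stand-ins (values invented for illustration): each clip has camera
# parameters [distance, height, tilt] and crowd-sourced scores for two
# semantic descriptors, [exciting, calm].
camera_params = np.array([[2.0, 1.5, 0.0],
                          [6.0, 3.0, 0.4],
                          [1.0, 1.2, -0.2],
                          [4.0, 2.5, 0.1]])
semantic_scores = np.array([[0.9, 0.2],
                            [0.3, 0.8],
                            [0.95, 0.1],
                            [0.5, 0.6]])

# Learn semantic space -> camera parameters, so a request like
# "more calm" yields concrete camera settings.
model = Ridge().fit(semantic_scores, camera_params)
print(model.predict([[0.2, 0.9]]))  # camera params for a calm, low-energy shot
```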
This list is automatically generated from the titles and abstracts of the papers in this site.