Multi-object Video Generation from Single Frame Layouts
- URL: http://arxiv.org/abs/2305.03983v2
- Date: Tue, 23 May 2023 15:52:48 GMT
- Title: Multi-object Video Generation from Single Frame Layouts
- Authors: Yang Wu, Zhibin Liu, Hefeng Wu, Liang Lin
- Abstract summary: We propose a video generative framework capable of synthesizing global scenes with local objects.
Our framework is a non-trivial adaptation of image generation methods and is new to this field.
Our model has been evaluated on two widely-used video recognition benchmarks.
- Score: 84.55806837855846
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: In this paper, we study video synthesis with an emphasis on simplifying the
generation conditions. Most existing video synthesis models or datasets are
designed to address the complex motions of a single object, lacking the ability
to comprehensively understand the spatio-temporal relationships among multiple
objects. Besides, current methods are usually conditioned on intricate
annotations (e.g. video segmentations) to generate new videos, making them
fundamentally less practical. These limitations motivate us to generate
multi-object videos conditioned exclusively on object layouts from a single
frame. To solve the above challenges, and inspired by recent research on image
generation from layouts, we propose a novel video generative framework capable
of synthesizing global scenes with local objects via implicit neural
representations and layout motion self-inference. Our framework is a
non-trivial adaptation of image generation methods and is new to this field. In
addition, our model has been evaluated on two widely used video recognition
benchmarks, demonstrating its effectiveness compared to the baseline model.
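The conditioning signal the abstract describes is unusually lightweight: only the object layout of the first frame, with no per-frame annotations such as video segmentations. A minimal sketch of what such an input might look like follows; all names and the data format here are illustrative assumptions, not the authors' actual API.

```python
# Hypothetical sketch of single-frame layout conditioning: each object is
# described only by a class label and a bounding box in the first frame.
# A real model would embed the labels, encode the boxes, and then infer the
# per-frame layout motion internally ("layout motion self-inference").
from dataclasses import dataclass


@dataclass
class ObjectLayout:
    label: str   # object class, e.g. "person"
    box: tuple   # (x_min, y_min, x_max, y_max), normalized to [0, 1]


def make_layout_condition(objects):
    """Pack a single-frame layout into a conditioning signal.

    Validates the boxes and returns a list of (label, box) pairs.
    """
    for obj in objects:
        x0, y0, x1, y1 = obj.box
        assert 0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0
    return [(obj.label, obj.box) for obj in objects]


layout = [
    ObjectLayout("person", (0.10, 0.20, 0.40, 0.90)),
    ObjectLayout("dog",    (0.55, 0.60, 0.80, 0.95)),
]
cond = make_layout_condition(layout)
```

Note that this is the entire per-video annotation burden in the proposed setting: two labeled boxes, versus a segmentation mask per object per frame in segmentation-conditioned methods.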
Related papers
- Multi-subject Open-set Personalization in Video Generation [110.02124633005516]
We present Video Alchemist, a video model with built-in multi-subject, open-set personalization capabilities.
Our model is built on a new Diffusion Transformer module that fuses each conditional reference image and its corresponding subject-level text prompt.
Our method significantly outperforms existing personalization methods in both quantitative and qualitative evaluations.
arXiv Detail & Related papers (2025-01-10T18:59:54Z) - Generative Video Propagation [87.15843701018099]
Our framework, GenProp, encodes the original video with a selective content encoder and propagates the changes made to the first frame using an image-to-video generation model.
Experiment results demonstrate the leading performance of our model in various video tasks.
arXiv Detail & Related papers (2024-12-27T17:42:29Z) - TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation [97.96178992465511]
We argue that generated videos should incorporate the emergence of new concepts and their relation transitions like in real-world videos as time progresses.
To assess the Temporal Compositionality of video generation models, we propose TC-Bench, a benchmark of meticulously crafted text prompts, corresponding ground truth videos, and robust evaluation metrics.
arXiv Detail & Related papers (2024-06-12T21:41:32Z) - Lumiere: A Space-Time Diffusion Model for Video Generation [75.54967294846686]
We introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once.
This is in contrast to existing video models, which synthesize distant keyframes followed by temporal super-resolution.
By deploying both spatial and (importantly) temporal down- and up-sampling, our model learns to directly generate a full-frame-rate, low-resolution video.
arXiv Detail & Related papers (2024-01-23T18:05:25Z) - A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled.
arXiv Detail & Related papers (2021-04-30T15:38:41Z)
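The latent-trajectory idea in the entry above can be illustrated with a toy sketch (an assumed, drastically simplified stand-in, not the paper's code): a fixed "image generator" maps a latent code to a frame, and a separate motion model produces a trajectory z_1..z_T in that latent space, so content (the starting code) and motion (the per-step residuals) are handled by different components.

```python
# Toy illustration of video synthesis as a trajectory in the latent space of a
# pre-trained, frozen image generator. The random linear map below stands in
# for the frozen generator; real systems use a trained GAN generator.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 64))  # stand-in for frozen generator weights


def image_generator(z):
    """Frozen mapping: latent code of shape (8,) -> flat 'frame' of shape (64,)."""
    return np.tanh(z @ W)


def motion_generator(z0, num_frames, step=0.1):
    """Produce a latent trajectory: content fixed by z0, motion by small residuals."""
    traj = [z0]
    for _ in range(num_frames - 1):
        traj.append(traj[-1] + step * rng.standard_normal(z0.shape))
    return traj


z0 = rng.standard_normal(8)  # content code shared by all frames
frames = [image_generator(z) for z in motion_generator(z0, num_frames=16)]
```

The design choice this sketch highlights is that only the motion model needs video supervision; the image generator is reused as-is, which is why such methods can inherit the resolution of modern image generators.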
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.