Enabling Visual Composition and Animation in Unsupervised Video Generation
- URL: http://arxiv.org/abs/2403.14368v1
- Date: Thu, 21 Mar 2024 12:50:15 GMT
- Title: Enabling Visual Composition and Animation in Unsupervised Video Generation
- Authors: Aram Davtyan, Sepehr Sameni, Björn Ommer, Paolo Favaro
- Abstract summary: We call our model CAGE for visual Composition and Animation for video GEneration.
We conduct a series of experiments to demonstrate the capabilities of CAGE in various settings.
- Score: 42.475807996071175
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this work we propose a novel method for unsupervised controllable video generation. Once trained on a dataset of unannotated videos, at inference our model is capable of both composing scenes of predefined object parts and animating them in a plausible and controlled way. This is achieved by conditioning video generation on a randomly selected subset of local pre-trained self-supervised features during training. We call our model CAGE for visual Composition and Animation for video GEneration. We conduct a series of experiments to demonstrate the capabilities of CAGE in various settings. Project website: https://araachie.github.io/cage.
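A minimal sketch of the conditioning scheme the abstract describes: local features are extracted from a frame by a frozen, pre-trained self-supervised encoder, and a random subset of them is kept as control tokens for the video generator. The encoder stand-in, grid size, and keep ratio below are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code): condition generation on a
# randomly selected subset of local self-supervised features, as per the abstract.
import torch
import torch.nn as nn


class FrozenPatchEncoder(nn.Module):
    """Stand-in for a frozen self-supervised ViT that maps a frame to a grid
    of local patch features (the specific backbone is an assumption)."""

    def __init__(self, dim=384, grid=16):
        super().__init__()
        self.proj = nn.Conv2d(3, dim, kernel_size=grid, stride=grid)
        for p in self.parameters():
            p.requires_grad_(False)  # features are pre-trained and kept frozen

    def forward(self, frame):                     # frame: (B, 3, H, W)
        feats = self.proj(frame)                  # (B, dim, H/grid, W/grid)
        return feats.flatten(2).transpose(1, 2)   # (B, N_patches, dim)


def sample_control_tokens(patch_feats, keep_ratio=0.1):
    """Randomly keep a sparse subset of local features to condition on."""
    B, N, D = patch_feats.shape
    k = max(1, int(keep_ratio * N))
    idx = torch.rand(B, N).argsort(dim=1)[:, :k]  # random subset per sample
    return patch_feats.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))


# Toy usage: the sampled tokens would serve as the conditioning input to the
# video generator (not shown), which learns to compose and animate from them.
encoder = FrozenPatchEncoder()
frame = torch.randn(2, 3, 256, 256)               # dummy batch of frames
tokens = sample_control_tokens(encoder(frame), keep_ratio=0.1)
print(tokens.shape)                               # e.g. torch.Size([2, 25, 384])
```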
Related papers
- Generative Video Propagation [87.15843701018099]
Our framework, GenProp, encodes the original video with a selective content encoder and propagates the changes made to the first frame using an image-to-video generation model.
Experimental results demonstrate the leading performance of our model on various video tasks.
arXiv Detail & Related papers (2024-12-27T17:42:29Z) - Switch-a-View: Few-Shot View Selection Learned from Edited Videos [71.01549400773197]
We introduce Switch-a-View, a model that learns to automatically select the viewpoint to display at each timepoint when creating a how-to video.
A key insight of our approach is how to train such a model from unlabeled, but human-edited, video samples.
arXiv Detail & Related papers (2024-12-24T12:16:43Z) - Video Creation by Demonstration [59.389591010842636]
We present $\delta$-Diffusion, a self-supervised training approach that learns from unlabeled videos by conditional future frame prediction.
By leveraging a video foundation model with an appearance bottleneck design on top, we extract action latents from demonstration videos for conditioning the generation process.
Empirically, $\delta$-Diffusion outperforms related baselines in terms of both human preference and large-scale machine evaluations.
arXiv Detail & Related papers (2024-12-12T18:41:20Z) - Grounding Video Models to Actions through Goal Conditioned Exploration [29.050431676226115]
We propose a framework that uses trajectory-level action generation in combination with video guidance to enable an agent to solve complex tasks.
We show that our approach is on par with, or even surpasses, multiple behavior cloning baselines trained on expert demonstrations.
arXiv Detail & Related papers (2024-11-11T18:43:44Z) - Dense Video Object Captioning from Disjoint Supervision [77.47084982558101]
We propose a new task and model for dense video object captioning.
This task unifies spatial and temporal localization in video.
We show how our model improves upon a number of strong baselines for this new task.
arXiv Detail & Related papers (2023-06-20T17:57:23Z) - Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation [26.292052071093945]
We propose an unsupervised method to generate videos from a single frame and a sparse motion input.
Our trained model can generate unseen realistic object-to-object interactions.
We show that YODA is on par with or better than state-of-the-art prior work in video generation in terms of both controllability and video quality.
arXiv Detail & Related papers (2023-06-06T19:50:02Z) - First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.