Xp-GAN: Unsupervised Multi-object Controllable Video Generation
- URL: http://arxiv.org/abs/2111.10233v1
- Date: Fri, 19 Nov 2021 14:10:50 GMT
- Title: Xp-GAN: Unsupervised Multi-object Controllable Video Generation
- Authors: Bahman Rouhani, Mohammad Rahmati
- Abstract summary: Video Generation is a relatively new and yet popular subject in machine learning.
Current methods in Video Generation provide the user with little or no control over the exact specification of how the objects in the generated video are to be moved.
We propose a novel method that allows the user to move any number of objects in a single initial frame just by drawing bounding boxes over those objects and then moving those boxes along the desired paths.
- Score: 8.807587076209566
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Video Generation is a relatively new and yet popular subject in machine
learning due to its vast variety of potential applications and its numerous
challenges. Current methods in Video Generation provide the user with little or
no control over the exact specification of how the objects in the generated
video are to be moved and located at each frame, that is, the user can't
explicitly control how each object in the video should move. In this paper we
propose a novel method that allows the user to move any number of objects in a
single initial frame just by drawing bounding boxes over those objects and then
moving those boxes along the desired paths. Our model utilizes two Autoencoders
to fully decompose the motion and content information in a video, and achieves
results comparable to well-known baseline and state-of-the-art methods.
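For concreteness, the following is a minimal sketch of the pipeline the abstract describes: one autoencoder branch encodes appearance (content) from the single initial frame, a second branch encodes motion from the user-drawn bounding-box trajectories, and a decoder recombines them frame by frame. All module names, shapes, and the naive multi-object merge are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the described control scheme (not the authors' code):
# a content branch and a motion branch decompose the video, and a decoder
# recombines one content code with per-frame motion codes.
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(ch, 2 * ch, 4, 2, 1), nn.ReLU(),
        )

    def forward(self, frame):                  # frame: (B, 3, H, W)
        return self.net(frame)                 # appearance code: (B, 2*ch, H/4, W/4)

class MotionEncoder(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        # each bounding box is (x, y, w, h); a GRU summarizes the drawn path
        self.rnn = nn.GRU(input_size=4, hidden_size=hidden, batch_first=True)

    def forward(self, box_track):              # box_track: (B, T, 4)
        codes, _ = self.rnn(box_track)
        return codes                           # per-frame motion codes: (B, T, hidden)

class FrameDecoder(nn.Module):
    def __init__(self, content_ch=128, motion_dim=128):
        super().__init__()
        self.modulate = nn.Linear(motion_dim, content_ch)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(content_ch, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, content, motion_t):      # content: (B, C, h, w); motion_t: (B, D)
        shift = self.modulate(motion_t)[:, :, None, None]
        return self.up(content + shift)        # reconstructed frame t

def generate(content_enc, motion_enc, decoder, frame, box_tracks):
    """box_tracks: one (B, T, 4) trajectory per user-selected object."""
    content = content_enc(frame)
    motion = sum(motion_enc(track) for track in box_tracks)  # naive multi-object merge
    return [decoder(content, motion[:, t]) for t in range(motion.shape[1])]
```

In the actual model the two autoencoders are trained so that motion and content are fully decomposed; the additive merge and modulation above are only placeholders for whatever combination the paper uses.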
Related papers
- DragEntity: Trajectory Guided Video Generation using Entity and Positional Relationships [16.501613834154746]
DragEntity is a video generation model that utilizes entity representation for controlling the motion of multiple objects.
Our experiments validate the effectiveness of DragEntity, demonstrating its excellent performance on fine-grained control in video generation.
arXiv Detail & Related papers (2024-10-14T17:24:35Z)
- Drag-A-Video: Non-rigid Video Editing with Point-based Interaction [63.78538355189017]
We propose a new diffusion-based method for interactive point-based video manipulation, called Drag-A-Video.
Our method allows users to click pairs of handle and target points, and to draw masks, on the first frame of an input video.
To precisely modify the contents of the video, we employ a new video-level motion supervision scheme to update the features of the video (a sketch of the input format follows this entry).
arXiv Detail & Related papers (2023-12-05T18:05:59Z)
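A small, hypothetical sketch of the user input described in the Drag-A-Video entry above: pairs of handle and target points plus optional masks drawn on the first frame. The structure and names are assumptions, not the paper's API.

```python
# Hypothetical container for the first-frame edits described above; names and
# shapes are assumptions, not the Drag-A-Video authors' API.
from dataclasses import dataclass
from typing import Optional
import torch

@dataclass
class DragInput:
    handle_points: torch.Tensor           # (N, 2) pixel coordinates to drag
    target_points: torch.Tensor           # (N, 2) where each handle should end up
    masks: Optional[torch.Tensor] = None  # (N, H, W) regions allowed to move

def drag_edit(model, video, drag: DragInput):
    """video: (T, 3, H, W); returns an edited clip of the same shape."""
    # the method propagates the first-frame edits through the whole clip,
    # using video-level motion supervision to update the video's features
    return model(video, drag.handle_points, drag.target_points, drag.masks)
```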
- Multi-object Video Generation from Single Frame Layouts [84.55806837855846]
We propose a video generative framework capable of synthesizing global scenes with local objects.
Our framework is a non-trivial adaptation of image generation methods, and is new to this field.
Our model has been evaluated on two widely-used video recognition benchmarks.
arXiv Detail & Related papers (2023-05-06T09:07:01Z)
- Playable Environments: Video Manipulation in Space and Time [98.0621309257937]
We present Playable Environments - a new representation for interactive video generation and manipulation in space and time.
With a single image at inference time, our novel framework allows the user to move objects in 3D by providing a sequence of desired actions, while a video is generated.
Our method builds an environment state for each frame, which can be manipulated by our proposed action module and decoded back to the image space with volumetric rendering (see the sketch after this entry).
arXiv Detail & Related papers (2022-03-03T18:51:05Z)
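Read as pseudocode, the loop described in the Playable Environments entry above could look like the following; every function name here is a placeholder assumption rather than the paper's interface.

```python
# Conceptual sketch: build an environment state from a single image, update it
# with user-chosen actions, and decode each state back to an image (the paper
# decodes with volumetric rendering). All functions are placeholders.

def playable_rollout(encode_state, action_module, render, image, actions):
    """image: the single input frame; actions: a user-chosen action sequence."""
    state = encode_state(image)                # per-frame environment state
    frames = []
    for action in actions:
        state = action_module(state, action)   # manipulate objects, e.g. move in 3D
        frames.append(render(state))           # decode state back to image space
    return frames
```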
- Click to Move: Controlling Video Generation with Sparse Motion [30.437648200928603]
Click to Move (C2M) is a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks.
Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user.
It outputs a plausible video sequence that starts from the given frame, with motion consistent with the user input (an interface sketch follows this entry).
arXiv Detail & Related papers (2021-08-19T17:33:13Z)
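A hypothetical calling convention for the inputs and output listed in the Click to Move entry above; names and tensor shapes are assumptions, not the C2M code.

```python
# Hypothetical interface for the Click to Move inputs described above
# (assumed names and shapes, not the authors' code).
from dataclasses import dataclass
import torch

@dataclass
class SparseMotion:
    positions: torch.Tensor    # (N, 2) pixel locations the user clicked
    vectors: torch.Tensor      # (N, 2) desired displacement at each click

def click_to_move(model, frame, segmentation, motion: SparseMotion, num_frames=16):
    """frame: (3, H, W); segmentation: (H, W) instance map; returns (T, 3, H, W)."""
    # the model turns the sparse clicks into dense, object-consistent motion
    return model(frame, segmentation, motion.positions, motion.vectors, num_frames)
```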
- A Good Image Generator Is What You Need for High-Resolution Video Synthesis [73.82857768949651]
We present a framework that leverages contemporary image generators to render high-resolution videos.
We frame the video synthesis problem as discovering a trajectory in the latent space of a pre-trained and fixed image generator.
We introduce a motion generator that discovers the desired trajectory, in which content and motion are disentangled (see the sketch after this entry).
arXiv Detail & Related papers (2021-04-30T15:38:41Z)
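The entry above frames video synthesis as finding a latent-space trajectory for a fixed image generator; here is a sketch of that idea, with all names illustrative and not taken from the paper.

```python
# Sketch of the latent-trajectory idea: a frozen, pre-trained image generator G
# renders frames, while a motion generator proposes a path of latent codes
# starting from an initial content code. Names are illustrative only.
import torch
import torch.nn as nn

class MotionGenerator(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        self.rnn = nn.LSTM(latent_dim, latent_dim, batch_first=True)

    def forward(self, z0, num_frames):         # z0: (B, latent_dim) content code
        z, state, codes = z0, None, []
        for _ in range(num_frames):
            step, state = self.rnn(z.unsqueeze(1), state)
            z = z + step.squeeze(1)            # residual walk through latent space
            codes.append(z)
        return torch.stack(codes, dim=1)       # (B, T, latent_dim)

def synthesize_video(frozen_G, motion_gen, z0, num_frames=16):
    codes = motion_gen(z0, num_frames)         # the motion model is the learned part
    frames = [frozen_G(codes[:, t]) for t in range(num_frames)]  # G stays fixed
    return torch.stack(frames, dim=1)          # (B, T, 3, H, W)
```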
- Few-Shot Learning for Video Object Detection in a Transfer-Learning Scheme [70.45901040613015]
We study the new problem of few-shot learning for video object detection.
We employ a transfer-learning framework to effectively train the video object detector on a large number of base-class objects and a few video clips of novel-class objects.
arXiv Detail & Related papers (2021-03-26T20:37:55Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate (a bare-bones sketch of the task setup follows this entry).
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
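The First Order Motion Model entry above only states the task, so the sketch below is limited to that setup: motion is read off a driving video and re-applied to the object in a source image. Every function name is assumed for illustration.

```python
# Bare-bones sketch of the image-animation setup described above; the helper
# functions are placeholder assumptions, not the paper's components.

def animate(estimate_motion, apply_motion, source_image, driving_video):
    """source_image: one frame; driving_video: list of frames; returns a video."""
    reference = driving_video[0]
    output = []
    for driving_frame in driving_video:
        motion = estimate_motion(reference, driving_frame)  # motion relative to start
        output.append(apply_motion(source_image, motion))   # animate the source object
    return output
```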