Click to Move: Controlling Video Generation with Sparse Motion
- URL: http://arxiv.org/abs/2108.08815v1
- Date: Thu, 19 Aug 2021 17:33:13 GMT
- Title: Click to Move: Controlling Video Generation with Sparse Motion
- Authors: Pierfrancesco Ardino, Marco De Nadai, Bruno Lepri, Elisa Ricci and Stéphane Lathuilière
- Abstract summary: Click to Move (C2M) is a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks.
Our model receives as input an initial frame, its corresponding segmentation map and the sparse motion vectors encoding the input provided by the user.
It outputs a plausible video sequence starting from the given frame and with a motion that is consistent with user input.
- Score: 30.437648200928603
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces Click to Move (C2M), a novel framework for video
generation where the user can control the motion of the synthesized video
through mouse clicks specifying simple object trajectories of the key objects
in the scene. Our model receives as input an initial frame, its corresponding
segmentation map and the sparse motion vectors encoding the input provided by
the user. It outputs a plausible video sequence starting from the given frame
and with a motion that is consistent with user input. Notably, our proposed
deep architecture incorporates a Graph Convolution Network (GCN) modelling the
movements of all the objects in the scene in a holistic manner and effectively
combining the sparse user motion information and image features. Experimental
results show that C2M outperforms existing methods on two publicly available
datasets, thus demonstrating the effectiveness of our GCN framework at
modelling object interactions. The source code is publicly available at
https://github.com/PierfrancescoArdino/C2M.
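The abstract describes a GCN that fuses per-object appearance features with the sparse, user-provided motion vectors and predicts a motion for every object in the scene. The PyTorch sketch below is only an illustration of that idea, not the authors' implementation (which is available at the GitHub link above); the module name ObjectMotionGCN, the feature dimensions, and the fully connected object graph are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code): a graph convolution over scene objects
# that fuses per-object appearance features with sparse, user-provided motion
# vectors and predicts a full motion vector per object. All names and sizes are
# hypothetical illustrations of the idea described in the abstract.
import torch
import torch.nn as nn

class ObjectMotionGCN(nn.Module):
    def __init__(self, feat_dim=128, hidden_dim=128):
        super().__init__()
        # Each node = one object: appearance feature + (dx, dy) click vector
        # + a flag telling whether the user actually clicked that object.
        self.encode = nn.Linear(feat_dim + 2 + 1, hidden_dim)
        self.gcn1 = nn.Linear(hidden_dim, hidden_dim)
        self.gcn2 = nn.Linear(hidden_dim, hidden_dim)
        self.decode = nn.Linear(hidden_dim, 2)  # predicted (dx, dy) per object

    def forward(self, obj_feats, sparse_motion, has_click):
        # obj_feats:     (N, feat_dim)  pooled appearance features per object
        # sparse_motion: (N, 2)         user click vectors, zeros if unclicked
        # has_click:     (N, 1)         1.0 where the user provided a click
        x = torch.relu(self.encode(
            torch.cat([obj_feats, sparse_motion, has_click], dim=-1)))
        n = x.size(0)
        # Fully connected graph over objects so every object can influence every
        # other one (a simple stand-in for the paper's holistic modelling).
        adj = torch.ones(n, n) / n
        x = torch.relu(self.gcn1(adj @ x))
        x = torch.relu(self.gcn2(adj @ x))
        return self.decode(x)  # dense per-object motion, consistent with clicks

# Toy usage: 4 objects, the user clicked object 0 and dragged it to the right.
feats = torch.randn(4, 128)
clicks = torch.zeros(4, 2); clicks[0] = torch.tensor([5.0, 0.0])
mask = torch.zeros(4, 1); mask[0] = 1.0
pred_motion = ObjectMotionGCN()(feats, clicks, mask)  # shape (4, 2)
```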
Related papers
- Framer: Interactive Frame Interpolation [73.06734414930227]
Framer aims to produce smoothly transitioning frames between two images, guided by the user's creative intent.
Our approach supports customizing the transition process by tailoring the trajectory of some selected keypoints.
It is noteworthy that our system also offers an "autopilot" mode, where we introduce a module to estimate the keypoints and the trajectory automatically.
arXiv Detail & Related papers (2024-10-24T17:59:51Z)
- DreamVideo-2: Zero-Shot Subject-Driven Video Customization with Precise Motion Control [42.506988751934685]
We present DreamVideo-2, a zero-shot video customization framework capable of generating videos with a specific subject and motion trajectory.
Specifically, we introduce reference attention, which leverages the model's inherent capabilities for subject learning.
We devise a mask-guided motion module to achieve precise motion control by fully utilizing the robust motion signal of box masks.
arXiv Detail & Related papers (2024-10-17T17:52:57Z)
- Puppet-Master: Scaling Interactive Video Generation as a Motion Prior for Part-Level Dynamics [67.97235923372035]
We present Puppet-Master, an interactive video generative model that can serve as a motion prior for part-level dynamics.
At test time, given a single image and a sparse set of motion trajectories, Puppet-Master can synthesize a video depicting realistic part-level motion faithful to the given drag interactions.
arXiv Detail & Related papers (2024-08-08T17:59:38Z)
- Motion Transformer for Unsupervised Image Animation [37.35527776043379]
Image animation aims to animate a source image by using motion learned from a driving video.
Current state-of-the-art methods typically use convolutional neural networks (CNNs) to predict motion information.
We propose a new method, the motion transformer, which is the first attempt to build a motion estimator based on a vision transformer.
arXiv Detail & Related papers (2022-09-28T12:04:58Z)
- Temporal View Synthesis of Dynamic Scenes through 3D Object Motion Estimation with Multi-Plane Images [8.185918509343816]
We study the problem of temporal view synthesis (TVS), where the goal is to predict the next frames of a video.
In this work, we consider the TVS of dynamic scenes in which both the user and objects are moving.
We predict the motion of objects by isolating and estimating the 3D object motion in the past frames and then extrapolating it.
arXiv Detail & Related papers (2022-08-19T17:40:13Z)
- Segmenting Moving Objects via an Object-Centric Layered Representation [100.26138772664811]
We introduce an object-centric segmentation model with a depth-ordered layer representation.
We introduce a scalable pipeline for generating synthetic training data with multiple objects.
We evaluate the model on standard video segmentation benchmarks.
arXiv Detail & Related papers (2022-07-05T17:59:43Z)
- EAN: Event Adaptive Network for Enhanced Action Recognition [66.81780707955852]
We propose a unified action recognition framework to investigate the dynamic nature of video content.
First, when extracting local cues, we generate dynamic-scale spatio-temporal kernels to adaptively fit diverse events.
Second, to accurately aggregate these cues into a global video representation, we use a Transformer to mine interactions among only a few selected foreground objects.
arXiv Detail & Related papers (2021-07-22T15:57:18Z)
- Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation [93.22300146395536]
We design a computational architecture that discovers camouflaged objects in videos, specifically by exploiting motion information to perform object segmentation.
We collect the first large-scale Moving Camouflaged Animals (MoCA) video dataset, which consists of over 140 clips across a diverse range of animals.
We demonstrate the effectiveness of the proposed model on MoCA, and achieve competitive performance on the unsupervised segmentation protocol on DAVIS2016 by only relying on motion.
arXiv Detail & Related papers (2020-11-23T18:59:08Z)
- Motion-Attentive Transition for Zero-Shot Video Object Segmentation [99.44383412488703]
We present a Motion-Attentive Transition Network (MATNet) for zero-shot object segmentation.
An asymmetric attention block, called Motion-Attentive Transition (MAT), is designed within a two-stream encoder.
In this way, the encoder becomes deeply interleaved, allowing for close hierarchical interactions between object motion and appearance.
arXiv Detail & Related papers (2020-03-09T16:58:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.