Self-Supervised Equivariant Scene Synthesis from Video
- URL: http://arxiv.org/abs/2102.00863v1
- Date: Mon, 1 Feb 2021 14:17:31 GMT
- Title: Self-Supervised Equivariant Scene Synthesis from Video
- Authors: Cinjon Resnick, Or Litany, Cosmas Heiß, Hugo Larochelle, Joan Bruna, Kyunghyun Cho
- Abstract summary: We propose a framework to learn scene representations from video that are automatically delineated into background, characters, and animations.
After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components.
We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
- Score: 84.15595573718925
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We propose a self-supervised framework to learn scene representations from
video that are automatically delineated into background, characters, and their
animations. Our method capitalizes on moving characters being equivariant with
respect to their transformation across frames and the background being constant
with respect to that same transformation. After training, we can manipulate
image encodings in real time to create unseen combinations of the delineated
components. As far as we know, ours is the first method to perform unsupervised
extraction and synthesis of interpretable background, character, and animation.
We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D
video game sprites, and Fashion Modeling.
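The equivariance and invariance constraints described above can be made concrete as a small training objective. The following is a minimal sketch, not the authors' implementation: the split encoder, the `transform` callable acting on character codes, and the unweighted sum of losses are assumptions made purely for illustration.

```python
# Sketch of the equivariance/invariance idea from the abstract (assumed API,
# not the paper's code): encoder(frame) -> (background_code, character_code),
# decoder(background_code, character_code) -> frame, and `transform` applies
# the frame-to-frame character transformation to a character code.
import torch.nn.functional as F

def scene_losses(encoder, decoder, frame_t, frame_t1, transform):
    bg_t, ch_t = encoder(frame_t)
    bg_t1, ch_t1 = encoder(frame_t1)

    # Background is constant with respect to the character's transformation.
    loss_background = F.mse_loss(bg_t1, bg_t)

    # Character codes are equivariant: encoding the next frame should match
    # transforming the current frame's character code.
    loss_equivariance = F.mse_loss(ch_t1, transform(ch_t))

    # Decoding (constant background, transformed character) should reproduce
    # the next frame, tying both constraints to the pixels.
    loss_reconstruction = F.mse_loss(decoder(bg_t, transform(ch_t)), frame_t1)

    return loss_background + loss_equivariance + loss_reconstruction
```

Under the same assumptions, the "unseen combinations" mentioned above would correspond to decoding a background code from one scene with a (possibly transformed) character code from another.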
Related papers
- Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image.
Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details.
We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z)
- Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos [47.97168047776216]
We introduce a new method for learning a generative model of articulated 3D animal motions from raw, unlabeled online videos.
Our model learns purely from a collection of unlabeled web video clips, leveraging semantic correspondences distilled from self-supervised image features.
arXiv Detail & Related papers (2023-12-21T06:44:18Z)
- Blocks2World: Controlling Realistic Scenes with Editable Primitives [5.541644538483947]
We present Blocks2World, a novel method for 3D scene rendering and editing.
Our technique begins by extracting 3D parallelepipeds from various objects in a given scene using convex decomposition.
The next stage involves training a conditioned model that learns to generate images from the 2D-rendered convex primitives.
arXiv Detail & Related papers (2023-07-07T21:38:50Z)
- Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos [107.65147103102662]
In this work, we utilize two datasets (i.e., image-pose pairs and pose-free videos) and a pre-trained text-to-image (T2I) model to obtain pose-controllable character videos.
Specifically, in the first stage, only the keypoint-image pairs are used for controllable text-to-image generation.
In the second stage, we finetune the motion of the above network on a pose-free video dataset by adding learnable temporal self-attention and reformed cross-frame self-attention blocks.
arXiv Detail & Related papers (2023-04-03T17:55:14Z)
- Vid2Actor: Free-viewpoint Animatable Person Synthesis from Video in the Wild [22.881898195409885]
Given an "in-the-wild" video of a person, we reconstruct an animatable model of the person in the video.
The output model can be rendered in any body pose to any camera view, via the learned controls, without explicit 3D mesh reconstruction.
arXiv Detail & Related papers (2020-12-23T18:50:42Z)
- Learned Equivariant Rendering without Transformation Supervision [105.15592625987911]
We propose a framework to learn scene representations from video that are automatically delineated into objects and background.
After training, we can manipulate and render the scenes in real time to create unseen combinations of objects, transformations, and backgrounds.
arXiv Detail & Related papers (2020-11-11T14:05:05Z)
- Unpaired Motion Style Transfer from Video to Animation [74.15550388701833]
Transferring the motion style from one animation clip to another, while preserving the motion content of the latter, has been a long-standing problem in character animation.
We present a novel data-driven framework for motion style transfer, which learns from an unpaired collection of motions with style labels.
Our framework is able to extract motion styles directly from videos, bypassing 3D reconstruction, and apply them to the 3D input motion.
arXiv Detail & Related papers (2020-05-12T13:21:27Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.