Learned Equivariant Rendering without Transformation Supervision
- URL: http://arxiv.org/abs/2011.05787v1
- Date: Wed, 11 Nov 2020 14:05:05 GMT
- Title: Learned Equivariant Rendering without Transformation Supervision
- Authors: Cinjon Resnick, Or Litany, Hugo Larochelle, Joan Bruna, Kyunghyun Cho
- Abstract summary: We propose a framework to learn scene representations from video that are automatically delineated into objects and background.
After training, we can manipulate and render the scenes in real time to create unseen combinations of objects, transformations, and backgrounds.
- Score: 105.15592625987911
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a self-supervised framework to learn scene representations from
video that are automatically delineated into objects and background. Our method
relies on moving objects being equivariant with respect to their transformation
across frames and the background being constant. After training, we can
manipulate and render the scenes in real time to create unseen combinations of
objects, transformations, and backgrounds. We show results on moving MNIST with
backgrounds.
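Concretely, the equivariance constraint in the abstract says that transforming an object's latent code and then rendering should match the object's motion in pixel space, while the background code stays fixed. Below is a minimal, hypothetical PyTorch sketch of that kind of objective; the encoder/renderer modules, the translation-only transformation, and the assumption that the transformation is known are illustrative simplifications, not the paper's actual architecture (which learns the transformation without supervision).

```python
# Hypothetical sketch of an equivariance objective: the scene code should
# transform across frames the same way the object does in pixel space.
# Module names, shapes, and the known translation are assumptions for
# illustration only, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, dim, 4, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)  # spatial feature map (4x downsampled)

class Renderer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(dim, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
        )
    def forward(self, z):
        return self.net(z)

def translate(z, dx, dy):
    # Apply the (here, translation-only) transformation in feature space.
    return torch.roll(z, shifts=(dy, dx), dims=(2, 3))

enc, dec = Encoder(), Renderer()
x_t  = torch.rand(1, 3, 64, 64)           # frame t
x_t1 = torch.roll(x_t, (8, 4), (2, 3))    # frame t+1: content shifted by (dy=8, dx=4)

# Equivariance: transform the code of frame t, render, and compare to
# frame t+1. The feature grid is 4x downsampled, so the shift is /4.
z_t = enc(x_t)
pred_t1 = dec(translate(z_t, dx=1, dy=2))
loss = F.mse_loss(pred_t1, x_t1)
loss.backward()
```

In this toy setup the transformation is supplied by hand; the point of the paper's title is that no such transformation supervision is available, so the transformation itself must be inferred jointly with the scene representation.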
Related papers
- Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos [32.74215702447293]
We propose a generative model that synthesizes a photorealistic output that follows a prescribed layout.
Our method transfers fine details from the original image and preserves the identity of its parts.
We show that by using simple segmentations and coarse 2D manipulations, we can synthesize a photorealistic edit faithful to the user's input.
arXiv Detail & Related papers (2024-03-19T17:59:58Z) - Learning Explicit Object-Centric Representations with Vision
Transformers [81.38804205212425]
We build on the self-supervision task of masked autoencoding and explore its effectiveness for learning object-centric representations with transformers.
We show that the model efficiently learns to decompose simple scenes as measured by segmentation metrics on several multi-object benchmarks.
arXiv Detail & Related papers (2022-10-25T16:39:49Z) - Understanding Object Dynamics for Interactive Image-to-Video Synthesis [8.17925295907622]
We present an approach that learns naturally-looking global articulations caused by a local manipulation at a pixel level.
Our generative model learns to infer natural object dynamics as a response to user interaction.
In contrast to existing work on video prediction, we do not synthesize arbitrary realistic videos.
arXiv Detail & Related papers (2021-06-21T17:57:39Z) - Self-Supervised Equivariant Scene Synthesis from Video [84.15595573718925]
We propose a framework to learn scene representations from video that are automatically delineated into background, characters, and animations.
After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components.
We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
arXiv Detail & Related papers (2021-02-01T14:17:31Z) - Self-Supervised Representation Learning from Flow Equivariance [97.13056332559526]
We present a new self-supervised learning representation framework that can be directly deployed on a video stream of complex scenes.
Our representations, learned from high-resolution raw video, can be readily used for downstream tasks on static images.
arXiv Detail & Related papers (2021-01-16T23:44:09Z) - Future Video Synthesis with Object Motion Prediction [54.31508711871764]
Instead of synthesizing images directly, our approach is designed to understand the complex scene dynamics.
The appearance of the scene components in the future is predicted by non-rigid deformation of the background and affine transformation of moving objects.
Experimental results on the Cityscapes and KITTI datasets show that our model outperforms the state-of-the-art in terms of visual quality and accuracy.
arXiv Detail & Related papers (2020-04-01T16:09:54Z) - First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy or quality of the listed information and is not responsible for any consequences of its use.