Omnimatte: Associating Objects and Their Effects in Video
- URL: http://arxiv.org/abs/2105.06993v1
- Date: Fri, 14 May 2021 17:57:08 GMT
- Title: Omnimatte: Associating Objects and Their Effects in Video
- Authors: Erika Lu, Forrester Cole, Tali Dekel, Andrew Zisserman, William T.
Freeman, Michael Rubinstein
- Abstract summary: Scene effects related to objects in video are typically overlooked by computer vision.
In this work, we take a step towards solving this novel problem of automatically associating objects with their effects in video.
Our model is trained only on the input video in a self-supervised manner, without any manual labels, and is generic---it produces omnimattes automatically for arbitrary objects and a variety of effects.
- Score: 100.66205249649131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer vision is increasingly effective at segmenting objects in images and
videos; however, scene effects related to the objects---shadows, reflections,
generated smoke, etc.---are typically overlooked. Identifying such scene effects
and associating them with the objects producing them is important for improving
our fundamental understanding of visual scenes, and can also assist a variety
of applications such as removing, duplicating, or enhancing objects in video.
In this work, we take a step towards solving this novel problem of
automatically associating objects with their effects in video. Given an
ordinary video and a rough segmentation mask over time of one or more subjects
of interest, we estimate an omnimatte for each subject---an alpha matte and
color image that includes the subject along with all its related time-varying
scene elements. Our model is trained only on the input video in a
self-supervised manner, without any manual labels, and is generic---it produces
omnimattes automatically for arbitrary objects and a variety of effects. We
show results on real-world videos containing interactions between different
types of subjects (cars, animals, people) and complex effects, ranging from
semi-transparent elements such as smoke and reflections, to fully opaque
effects such as objects attached to the subject.
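To make the setup described in the abstract concrete, the sketch below shows one way the layered decomposition could look in PyTorch: each subject gets its own RGBA layer (a color image plus an alpha matte) predicted from the frame and its rough mask, the layers are over-composited onto a background, and the only training signal is reconstructing the input video plus a loose prior tying each alpha to its mask. This is a minimal, hypothetical sketch, not the authors' implementation; the names here (OmnimatteNet, mask_weight, the toy two-layer CNN) are illustrative assumptions, and the actual method is richer (background layer, camera registration, optical flow inputs and regularizers) than this simplification.

```python
# Minimal sketch (assumed names and architecture, not the paper's code).
import torch
import torch.nn as nn

class OmnimatteNet(nn.Module):
    """Predicts one RGBA layer (color + alpha matte) for a single subject."""
    def __init__(self):
        super().__init__()
        # Toy stand-in for the real network: frame + rough mask -> RGBA.
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 4, 3, padding=1),
        )

    def forward(self, frame, mask):
        rgba = self.net(torch.cat([frame, mask], dim=1))
        return torch.sigmoid(rgba[:, :3]), torch.sigmoid(rgba[:, 3:4])

def composite(layers, background):
    """Back-to-front 'over' compositing of (rgb, alpha) layers on a background."""
    out = background
    for rgb, alpha in layers:
        out = alpha * rgb + (1.0 - alpha) * out
    return out

def self_supervised_loss(frames, masks, background, model, mask_weight=0.1):
    """Reconstructing the input video is the main signal; the rough masks only
    loosely anchor each alpha matte to its subject, leaving room for the layer
    to absorb related effects (shadows, reflections, smoke)."""
    layers = [model(frames, m) for m in masks]       # one RGBA layer per subject
    recon = composite(layers, background)
    loss = torch.mean((recon - frames) ** 2)
    for (_, alpha), m in zip(layers, masks):
        loss = loss + mask_weight * torch.mean((alpha - m) ** 2)
    return loss

# Toy usage: two frames, one subject, a constant gray background.
frames = torch.rand(2, 3, 64, 64)
masks = [torch.rand(2, 1, 64, 64).round()]
background = torch.full_like(frames, 0.5)
loss = self_supervised_loss(frames, masks, background, OmnimatteNet())
loss.backward()
```

In this simplified view, the association of effects with subjects falls out of the reconstruction term: a subject's shadow or reflection can only be explained by that subject's layer, so its alpha matte expands to cover those pixels.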
Related papers
- MVOC: a training-free multiple video object composition method with diffusion models [10.364986401722625]
We propose a Multiple Video Object Composition (MVOC) method based on diffusion models.
First, we perform DDIM inversion on each video object to obtain the corresponding noise features.
Second, we combine and edit each object with image editing methods to obtain the first frame of the composited video.
arXiv Detail & Related papers (2024-06-22T12:18:46Z)
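Since the MVOC summary above leads with DDIM inversion, here is a short, self-contained sketch of that step under simplifying assumptions: a dummy noise predictor and a made-up alpha-bar schedule stand in for the real diffusion model, so this only illustrates the deterministic DDIM update run in reverse (clean latent to noise features), not MVOC's actual pipeline.

```python
# Hypothetical sketch of DDIM inversion; eps_model and the schedule are stand-ins.
import torch

def ddim_invert(x0, eps_model, alphas_cumprod):
    """Run the deterministic DDIM update in reverse: clean x_0 -> noisy x_T.

    x0:             clean latent, e.g. shape (B, C, H, W)
    eps_model:      callable (x_t, t) -> predicted noise of the same shape
    alphas_cumprod: 1-D tensor of alpha-bar values, near 1 at t=0, near 0 at t=T
    """
    x = x0
    for t in range(len(alphas_cumprod) - 1):
        a_t, a_next = alphas_cumprod[t], alphas_cumprod[t + 1]
        eps = eps_model(x, t)
        # Predict x_0 from the current sample, then push it one step noisier.
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # approximate per-object "noise features"

# Toy usage with a zero noise predictor and a linear alpha-bar schedule.
alphas_cumprod = torch.linspace(0.9999, 0.01, steps=50)
noise_features = ddim_invert(torch.randn(1, 4, 8, 8),
                             lambda x, t: torch.zeros_like(x),
                             alphas_cumprod)
```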
- ActAnywhere: Subject-Aware Video Background Generation [62.57759679425924]
Generating video background tailored to foreground subject motion is an important problem for the movie industry and visual effects community.
This task involves generating background that aligns with the motion and appearance of the foreground subject, while also complying with the artist's creative intention.
We introduce ActAnywhere, a generative model that automates this process, which traditionally requires tedious manual effort.
arXiv Detail & Related papers (2024-01-19T17:16:16Z)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including a challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z)
- Occlusion-Aware Video Object Inpainting [72.38919601150175]
This paper presents occlusion-aware video object inpainting, which recovers both the complete shape and appearance of occluded objects in videos.
Our technical contribution, VOIN, jointly performs video object shape completion and occluded texture generation.
For more realistic results, VOIN is optimized using both T-PatchGAN and a new spatio-temporal attention-based multi-class discriminator.
arXiv Detail & Related papers (2021-08-15T15:46:57Z)
- Understanding Object Dynamics for Interactive Image-to-Video Synthesis [8.17925295907622]
We present an approach that learns natural-looking global articulations caused by a local manipulation at the pixel level.
Our generative model learns to infer natural object dynamics as a response to user interaction.
In contrast to existing work on video prediction, we do not synthesize arbitrary realistic videos.
arXiv Detail & Related papers (2021-06-21T17:57:39Z)
- Layered Neural Rendering for Retiming People in Video [108.85428504808318]
We present a method for retiming people in an ordinary, natural video.
We can temporally align different motions, change the speed of certain actions, or "erase" selected people from the video altogether.
A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate.
arXiv Detail & Related papers (2020-09-16T17:48:26Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.