Omnimatte: Associating Objects and Their Effects in Video
- URL: http://arxiv.org/abs/2105.06993v1
- Date: Fri, 14 May 2021 17:57:08 GMT
- Title: Omnimatte: Associating Objects and Their Effects in Video
- Authors: Erika Lu, Forrester Cole, Tali Dekel, Andrew Zisserman, William T.
Freeman, Michael Rubinstein
- Abstract summary: Scene effects related to objects in video are typically overlooked by computer vision.
In this work, we take a step towards solving this novel problem of automatically associating objects with their effects in video.
Our model is trained only on the input video in a self-supervised manner, without any manual labels, and is generic---it produces omnimattes automatically for arbitrary objects and a variety of effects.
- Score: 100.66205249649131
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Computer vision is increasingly effective at segmenting objects in images and
videos; however, scene effects related to the objects---shadows, reflections,
generated smoke, etc.---are typically overlooked. Identifying such scene effects
and associating them with the objects producing them is important for improving
our fundamental understanding of visual scenes, and can also assist a variety
of applications such as removing, duplicating, or enhancing objects in video.
In this work, we take a step towards solving this novel problem of
automatically associating objects with their effects in video. Given an
ordinary video and a rough segmentation mask over time of one or more subjects
of interest, we estimate an omnimatte for each subject---an alpha matte and
color image that includes the subject along with all its related time-varying
scene elements. Our model is trained only on the input video in a
self-supervised manner, without any manual labels, and is generic---it produces
omnimattes automatically for arbitrary objects and a variety of effects. We
show results on real-world videos containing interactions between different
types of subjects (cars, animals, people) and complex effects, ranging from
semi-transparent elements such as smoke and reflections, to fully opaque
effects such as objects attached to the subject.
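The abstract frames each omnimatte as a per-subject RGBA layer (an alpha matte plus a color image) learned self-supervised, so that compositing the layers reproduces the input video. The snippet below is a minimal sketch of that compositing-and-reconstruction idea, not the authors' implementation: it assumes a fixed back-to-front layer order, uses random tensors in place of the layers a network would predict from the video and the rough subject masks, and omits the paper's additional regularization.

```python
# Minimal illustration (not the authors' code): omnimatte layers are per-subject
# RGBA images; compositing them back-to-front over a background should
# reproduce the input frame, which yields a self-supervised reconstruction loss.
import torch

def composite(background, layers):
    """Over-composite (rgb, alpha) layers onto a background, back to front."""
    out = background
    for rgb, alpha in layers:  # assumed back-to-front ordering
        out = alpha * rgb + (1.0 - alpha) * out
    return out

def reconstruction_loss(frame, background, layers):
    """L1 difference between the composite and the observed input frame."""
    return torch.abs(composite(background, layers) - frame).mean()

# Toy usage: random tensors stand in for network-predicted layers.
H, W = 64, 64
frame = torch.rand(3, H, W)
background = torch.rand(3, H, W)
layers = [(torch.rand(3, H, W), torch.rand(1, H, W)) for _ in range(2)]
loss = reconstruction_loss(frame, background, layers)
```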
Related papers
- Generative Omnimatte: Learning to Decompose Video into Layers [29.098471541412113]
We present a novel generative layered video decomposition framework to address the omnimatte problem.
Our core idea is to train a video diffusion model to identify and remove scene effects caused by a specific object.
We show that this model can be finetuned from an existing video inpainting model with a small, carefully curated dataset.
arXiv Detail & Related papers (2024-11-25T18:59:57Z)
- MVOC: a training-free multiple video object composition method with diffusion models [10.364986401722625]
We propose a Multiple Video Object Composition (MVOC) method based on diffusion models.
We first perform DDIM inversion on each video object to obtain the corresponding noise features.
Secondly, we combine and edit each object by image editing methods to obtain the first frame of the composited video.
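(A generic sketch of the DDIM inversion step mentioned here appears after this list.)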
arXiv Detail & Related papers (2024-06-22T12:18:46Z)
- ActAnywhere: Subject-Aware Video Background Generation [62.57759679425924]
Generating video background that is tailored to foreground subject motion is an important problem for the movie industry and visual effects community.
This task involves synthesizing background that aligns with the motion and appearance of the foreground subject while also complying with the artist's creative intention.
We introduce ActAnywhere, a generative model that automates this process, which traditionally requires tedious manual effort.
arXiv Detail & Related papers (2024-01-19T17:16:16Z)
- DisCoScene: Spatially Disentangled Generative Radiance Fields for Controllable 3D-aware Scene Synthesis [90.32352050266104]
DisCoScene is a 3D-aware generative model for high-quality and controllable scene synthesis.
It disentangles the whole scene into object-centric generative fields by learning on only 2D images with global-local discrimination.
We demonstrate state-of-the-art performance on many scene datasets, including a challenging outdoor dataset.
arXiv Detail & Related papers (2022-12-22T18:59:59Z)
- Occlusion-Aware Video Object Inpainting [72.38919601150175]
This paper presents occlusion-aware video object inpainting, which recovers both the complete shape and appearance for occluded objects in videos.
Our technical contribution, VOIN, jointly performs video object shape completion and occluded texture generation.
For more realistic results, VOIN is optimized using both a T-PatchGAN and a new spatio-temporal attention-based multi-class discriminator.
arXiv Detail & Related papers (2021-08-15T15:46:57Z)
- Understanding Object Dynamics for Interactive Image-to-Video Synthesis [8.17925295907622]
We present an approach that learns natural-looking global articulations caused by a local manipulation at a pixel level.
Our generative model learns to infer natural object dynamics as a response to user interaction.
In contrast to existing work on video prediction, we do not synthesize arbitrary realistic videos.
arXiv Detail & Related papers (2021-06-21T17:57:39Z)
- Layered Neural Rendering for Retiming People in Video [108.85428504808318]
We present a method for retiming people in an ordinary, natural video.
We can temporally align different motions, change the speed of certain actions, or "erase" selected people from the video altogether.
A key property of our model is that it not only disentangles the direct motions of each person in the input video, but also correlates each person automatically with the scene changes they generate.
arXiv Detail & Related papers (2020-09-16T17:48:26Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)
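The MVOC entry above obtains per-object noise features via DDIM inversion. The following is a generic, hedged sketch of the deterministic DDIM inversion update, not MVOC's actual pipeline; `eps_model` and `alphas_cumprod` are placeholders for a pretrained diffusion noise predictor and its cumulative noise schedule.

```python
# Generic sketch of DDIM inversion (assumed, not MVOC's code): run the
# deterministic DDIM update in reverse so a clean latent is mapped to the
# noise latent that would regenerate it under DDIM sampling.
import torch

@torch.no_grad()
def ddim_invert(x, eps_model, alphas_cumprod, timesteps):
    """Map latent x at timesteps[0] to a noisier latent at timesteps[-1]."""
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        a_cur, a_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        eps = eps_model(x, t_cur)                            # predicted noise
        x0 = (x - (1 - a_cur).sqrt() * eps) / a_cur.sqrt()   # implied clean latent
        x = a_next.sqrt() * x0 + (1 - a_next).sqrt() * eps   # re-noise one step
    return x

# Toy usage with a dummy noise predictor standing in for a real diffusion model.
alphas_cumprod = torch.linspace(0.999, 0.01, 1000)
dummy_eps = lambda z, t: torch.zeros_like(z)
x0 = torch.rand(1, 4, 8, 8)
xT = ddim_invert(x0, dummy_eps, alphas_cumprod, list(range(0, 1000, 100)))
```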
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.