Training-Free Neural Matte Extraction for Visual Effects
- URL: http://arxiv.org/abs/2306.17321v1
- Date: Thu, 29 Jun 2023 22:08:12 GMT
- Title: Training-Free Neural Matte Extraction for Visual Effects
- Authors: Sharif Elcott, J.P. Lewis, Nori Kanazawa, Christoph Bregler
- Abstract summary: Alpha matting is widely used in video conferencing as well as in movies, television, and social media sites.
Deep learning approaches to the matte extraction problem are well suited to video conferencing due to the consistent subject matter.
We introduce a training-free high quality neural matte extraction approach that specifically targets the assumptions of visual effects production.
- Score: 4.91173926165739
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Alpha matting is widely used in video conferencing as well as in movies,
television, and social media sites. Deep learning approaches to the matte
extraction problem are well suited to video conferencing due to the consistent
subject matter (front-facing humans); however, training-based approaches are
somewhat pointless for entertainment videos where varied subjects (spaceships,
monsters, etc.) may appear only a few times in a single movie -- if a method of
creating ground truth for training exists, just use that method to produce the
desired mattes. We introduce a training-free high quality neural matte
extraction approach that specifically targets the assumptions of visual effects
production. Our approach is based on the deep image prior, which optimizes a
deep neural network to fit a single image, thereby providing a deep encoding of
the particular image. We make use of the representations in the penultimate
layer to interpolate coarse and incomplete "trimap" constraints. Videos
processed with this approach are temporally consistent. The algorithm is both
very simple and surprisingly effective.
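The core idea described in the abstract, fitting a randomly initialized network to the single input image while constraining its output with sparse trimap labels, can be sketched as follows. This is a minimal illustration assuming PyTorch; the small architecture, the extra alpha output channel, and the loss weighting are illustrative assumptions, not the authors' exact method, which interpolates the trimap through penultimate-layer representations.
```python
import torch
import torch.nn as nn

# Deep-image-prior-style matte sketch (illustrative, not the paper's exact setup):
# a small conv net is overfit to a single image from a fixed noise input, and a
# coarse trimap supervises an extra alpha channel only where labels are known.

def small_cnn(out_channels):
    # Hypothetical architecture choice; any image-to-image CNN would do here.
    return nn.Sequential(
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        nn.Conv2d(64, out_channels, 3, padding=1),
    )

def extract_matte(image, trimap, steps=2000, lr=1e-3):
    """image: (1,3,H,W) in [0,1]; trimap: (1,1,H,W) with 1=fg, 0=bg, -1=unknown."""
    _, _, h, w = image.shape
    noise = torch.randn(1, 32, h, w)          # fixed random input, as in the deep image prior
    net = small_cnn(out_channels=4)           # 3 reconstruction channels + 1 alpha channel
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    known = (trimap >= 0).float()             # mask of pixels constrained by the trimap

    for _ in range(steps):
        out = net(noise)
        recon = torch.sigmoid(out[:, :3])     # reconstruction of the single input image
        alpha = torch.sigmoid(out[:, 3:4])    # predicted matte
        loss = ((recon - image) ** 2).mean()  # fit the image (the "prior")
        loss = loss + (known * (alpha - trimap.clamp(min=0)) ** 2).mean()  # trimap constraint
        opt.zero_grad()
        loss.backward()
        opt.step()

    with torch.no_grad():
        return torch.sigmoid(net(noise)[:, 3:4])
```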
Related papers
- Harvest Video Foundation Models via Efficient Post-Pretraining [67.30842563833185]
We propose an efficient framework to harvest video foundation models from image ones.
Our method is simple: it randomly drops input video patches and masks out input text during the post-pretraining procedure.
Our method achieves state-of-the-art performance, comparable to that of some heavily pretrained video foundation models.
arXiv Detail & Related papers (2023-10-30T14:06:16Z)
- Time Does Tell: Self-Supervised Time-Tuning of Dense Image Representations [79.87044240860466]
We propose a novel approach that incorporates temporal consistency in dense self-supervised learning.
Our approach, which we call time-tuning, starts from image-pretrained models and fine-tunes them with a novel self-supervised temporal-alignment clustering loss on unlabeled videos.
Time-tuning improves the state-of-the-art by 8-10% for unsupervised semantic segmentation on videos and matches it for images.
arXiv Detail & Related papers (2023-08-22T21:28:58Z)
- HQ3DAvatar: High Quality Controllable 3D Head Avatar [65.70885416855782]
This paper presents a novel approach to building highly photorealistic digital head avatars.
Our method learns a canonical space via an implicit function parameterized by a neural network.
At test time, our method is driven by a monocular RGB video.
arXiv Detail & Related papers (2023-03-25T13:56:33Z)
- Deep Video Prior for Video Consistency and Propagation [58.250209011891904]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a pair of original and processed videos rather than on a large dataset.
We show that temporal consistency can be achieved by training a convolutional neural network on a video with the Deep Video Prior.
arXiv Detail & Related papers (2022-01-27T16:38:52Z)
- TermiNeRF: Ray Termination Prediction for Efficient Neural Rendering [18.254077751772005]
Volume rendering using neural fields has shown great promise in capturing and synthesizing novel views of 3D scenes.
This type of approach requires querying the volume network at multiple points along each viewing ray in order to render an image, resulting in very slow rendering times.
We present a method that overcomes this limitation by learning a direct mapping from camera rays to locations along the ray that are most likely to influence the pixel's final appearance.
arXiv Detail & Related papers (2021-11-05T17:50:44Z)
- Temporally Coherent Person Matting Trained on Fake-Motion Dataset [0.0]
We propose a novel method to perform matting of videos depicting people that does not require additional user input such as trimaps.
Our architecture achieves temporal stability of the resulting alpha mattes by using motion-estimation-based smoothing of image-segmentation algorithm outputs.
We also propose a fake-motion algorithm that generates training clips for the video-matting network given photos with ground-truth alpha mattes and background videos.
arXiv Detail & Related papers (2021-09-10T12:53:11Z)
- Attention-guided Temporal Coherent Video Object Matting [78.82835351423383]
We propose a novel deep learning-based object matting method that can achieve temporally coherent matting results.
Its key component is an attention-based temporal aggregation module that maximizes the strength of image matting networks.
We show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network.
arXiv Detail & Related papers (2021-05-24T17:34:57Z)
- Unsupervised Learning of Monocular Depth and Ego-Motion Using Multiple Masks [14.82498499423046]
A new unsupervised method for learning depth and ego-motion from monocular video using multiple masks is proposed in this paper.
The depth estimation network and the ego-motion estimation network are trained using depth and ego-motion constraints without ground-truth values.
Experiments on the KITTI dataset show that our method achieves good performance in terms of depth and ego-motion.
arXiv Detail & Related papers (2021-04-01T12:29:23Z)
- Blind Video Temporal Consistency via Deep Video Prior [61.062900556483164]
We present a novel and general approach for blind video temporal consistency.
Our method is trained directly on a pair of original and processed videos.
We show that temporal consistency can be achieved by training a convolutional network on a video with the Deep Video Prior.
arXiv Detail & Related papers (2020-10-22T16:19:20Z)