Temporally Coherent Person Matting Trained on Fake-Motion Dataset
- URL: http://arxiv.org/abs/2109.04843v1
- Date: Fri, 10 Sep 2021 12:53:11 GMT
- Title: Temporally Coherent Person Matting Trained on Fake-Motion Dataset
- Authors: Ivan Molodetskikh, Mikhail Erofeev, Andrey Moskalenko, Dmitry Vatolin
- Abstract summary: We propose a novel method to perform matting of videos depicting people that does not require additional user input such as trimaps.
Our architecture achieves temporal stability of the resulting alpha mattes by using motion-estimation-based smoothing of image-segmentation algorithm outputs.
We also propose a fake-motion algorithm that generates training clips for the video-matting network given photos with ground-truth alpha mattes and background videos.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a novel neural-network-based method to perform matting of videos
depicting people that does not require additional user input such as trimaps.
Our architecture achieves temporal stability of the resulting alpha mattes by
using motion-estimation-based smoothing of image-segmentation algorithm
outputs, combined with convolutional-LSTM modules on U-Net skip connections.
We also propose a fake-motion algorithm that generates training clips for the
video-matting network given photos with ground-truth alpha mattes and
background videos. We apply random motion to photos and their mattes to
simulate movement one would find in real videos and composite the result with
the background clips. It lets us train a deep neural network operating on
videos in the absence of a large annotated video dataset and provides
ground-truth training-clip foreground optical flow for use in loss functions.
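The paper ships no reference code, but the motion-estimation-based smoothing lends itself to a compact illustration: warp the previous frame's alpha matte along estimated optical flow, then blend it with the current per-frame segmentation output. The sketch below is a minimal rendering of that idea, assuming a nearest-neighbour warp, a fixed blend weight `beta`, and hypothetical function names; it is not the authors' implementation.

```python
import numpy as np

def warp_nearest(alpha_prev, flow):
    """Approximate backward warp of the previous alpha matte.

    alpha_prev: (H, W) float array, alpha matte of frame t-1.
    flow:       (H, W, 2) float array, per-pixel (dx, dy) motion from
                frame t-1 to frame t (from any optical-flow estimator).
    """
    h, w = alpha_prev.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Nearest-neighbour lookup of the source pixel for each target pixel,
    # using the forward flow as an approximation of the inverse mapping.
    src_x = np.clip(np.round(xs - flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys - flow[..., 1]).astype(int), 0, h - 1)
    return alpha_prev[src_y, src_x]

def smooth_matte(alpha_prev, flow, seg_out, beta=0.5):
    """Stabilise the per-frame segmentation output over time by
    blending it with the flow-warped previous matte."""
    return beta * warp_nearest(alpha_prev, flow) + (1.0 - beta) * seg_out
```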
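The fake-motion generator admits a similarly small sketch: apply random motion to a still photo and its ground-truth matte, composite each transformed frame over a background clip, and record the applied displacement as the ground-truth foreground flow. The version below restricts itself to translations via `np.roll` (which wraps at image borders) purely for brevity; the actual algorithm presumably uses richer transforms and proper border handling.

```python
import numpy as np

def fake_motion_clip(fg, alpha, bg_frames, max_step=4, rng=None):
    """Generate one training clip from a still photo, its matte, and a
    background video (a simplified sketch of the fake-motion idea).

    fg:        (H, W, 3) foreground photo, float in [0, 1].
    alpha:     (H, W) ground-truth alpha matte, float in [0, 1].
    bg_frames: iterable of (H, W, 3) background video frames.
    Returns composited frames, per-frame mattes, and the exact
    foreground flow (constant per frame for a pure translation).
    """
    if rng is None:
        rng = np.random.default_rng()
    frames, mattes, flows = [], [], []
    dx = dy = 0
    for bg in bg_frames:
        # A random walk in translation simulates gradual subject motion.
        step_x, step_y = rng.integers(-max_step, max_step + 1, size=2)
        dx, dy = dx + step_x, dy + step_y
        fg_t = np.roll(fg, (dy, dx), axis=(0, 1))       # wraps at borders
        a_t = np.roll(alpha, (dy, dx), axis=(0, 1))
        frames.append(a_t[..., None] * fg_t + (1 - a_t[..., None]) * bg)
        mattes.append(a_t)
        flows.append((step_x, step_y))  # ground-truth foreground flow
    return frames, mattes, flows
```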
Related papers
- Data Collection-free Masked Video Modeling [6.641717260925999]
We introduce an effective self-supervised learning framework for videos that leverages readily available and less costly static images.
These pseudo-motion videos are then leveraged in masked video modeling.
Our approach is applicable to synthetic images as well, thus entirely freeing video training from data-collection costs and other concerns surrounding real data.
arXiv Detail & Related papers (2024-09-10T17:34:07Z) - Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization [52.63845811751936]
Video pre-training is challenging due to the difficulty of modeling its spatiotemporal dynamics.
In this paper, we address such limitations in video pre-training with an efficient video decomposition.
Our framework is both capable of comprehending and generating image and video content, as demonstrated by its performance across 13 multimodal benchmarks.
arXiv Detail & Related papers (2024-02-05T16:30:49Z) - Training-Free Neural Matte Extraction for Visual Effects [4.91173926165739]
Alpha matting is widely used in video conferencing as well as in movies, television, and social media sites.
Deep learning approaches to the matte extraction problem are well suited to video conferencing due to the consistent subject matter.
We introduce a training-free, high-quality neural matte extraction approach that specifically targets the assumptions of visual effects production.
arXiv Detail & Related papers (2023-06-29T22:08:12Z) - Adaptive Human Matting for Dynamic Videos [62.026375402656754]
Adaptive Matting for Dynamic Videos, termed AdaM, is a framework for simultaneously differentiating foregrounds from backgrounds and capturing alpha-matte details of human subjects.
Two interconnected network designs are employed to achieve this goal.
We benchmark and study our methods on recently introduced datasets, showing that our matting achieves new best-in-class generalizability.
arXiv Detail & Related papers (2023-04-12T17:55:59Z) - DropMAE: Masked Autoencoders with Spatial-Attention Dropout for Tracking
Tasks [76.24996889649744]
Masked autoencoder (MAE) pretraining on videos for matching-based downstream tasks, including visual object tracking (VOT) and video object segmentation (VOS)
We propose DropMAE, which adaptively performs spatial-attention dropout in the frame reconstruction to facilitate temporal correspondence learning in videos.
Our model sets new state-of-the-art performance on 8 out of 9 highly competitive video tracking and segmentation datasets.
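The spatial-attention-dropout mechanism can be sketched without the full model: randomly suppress within-frame attention scores so that reconstructing masked tokens must lean on cross-frame, i.e. temporal, matches. The snippet below is a hedged illustration under assumptions of our own (a two-frame token layout and a drop rate `p`), not DropMAE's published code.

```python
import torch

def spatial_attention_dropout(attn_logits, tokens_per_frame, p=0.1):
    """Drop a random subset of within-frame attention scores.

    attn_logits:      (B, heads, N, N) pre-softmax attention scores for a
                      two-frame clip, where N = 2 * tokens_per_frame.
    tokens_per_frame: number of tokens per frame.
    p:                probability of dropping a within-frame score.
    """
    n = attn_logits.shape[-1]
    frame_id = torch.arange(n, device=attn_logits.device) // tokens_per_frame
    same_frame = frame_id[:, None] == frame_id[None, :]   # (N, N) bool
    drop = torch.rand_like(attn_logits) < p
    # Mask only within-frame entries; cross-frame attention is untouched,
    # so the model is nudged toward temporal correspondences.
    masked = attn_logits.masked_fill(same_frame & drop, float("-inf"))
    return masked.softmax(dim=-1)
```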
arXiv Detail & Related papers (2023-04-02T16:40:42Z) - MagicVideo: Efficient Video Generation With Latent Diffusion Models [76.95903791630624]
We present an efficient text-to-video generation framework based on latent diffusion models, termed MagicVideo.
Due to a novel and efficient 3D U-Net design and modeling video distributions in a low-dimensional space, MagicVideo can synthesize video clips with 256x256 spatial resolution on a single GPU card.
We conduct extensive experiments and demonstrate that MagicVideo can generate high-quality video clips with either realistic or imaginary content.
arXiv Detail & Related papers (2022-11-20T16:40:31Z) - Frozen CLIP Models are Efficient Video Learners [86.73871814176795]
Video recognition has been dominated by the end-to-end learning paradigm.
Recent advances in Contrastive Vision-Language Pre-training pave a new route for visual recognition tasks.
We present Efficient Video Learning (EVL) -- an efficient framework for directly training high-quality video recognition models with frozen CLIP features.
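The frozen-backbone recipe is easy to picture in code: keep a pre-trained image encoder fixed, extract per-frame features, and train only a light temporal head on top. The sketch below is a simplification under stated assumptions: any module mapping a batch of images to `(N, feat_dim)` features can stand in for the frozen CLIP encoder, and the single transformer layer is illustrative rather than the paper's exact decoder.

```python
import torch
import torch.nn as nn

class FrozenBackboneVideoClassifier(nn.Module):
    """Frozen per-frame encoder plus a small trainable temporal head."""

    def __init__(self, image_encoder, feat_dim=512, num_classes=400):
        super().__init__()
        self.encoder = image_encoder.eval()
        for param in self.encoder.parameters():   # freeze the backbone
            param.requires_grad_(False)
        # Only these layers receive gradients during training.
        self.temporal = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=8, batch_first=True)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, video):                     # video: (B, T, C, H, W)
        b, t = video.shape[:2]
        with torch.no_grad():                     # frozen per-frame features
            feats = self.encoder(video.flatten(0, 1)).view(b, t, -1)
        return self.head(self.temporal(feats).mean(dim=1))
```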
arXiv Detail & Related papers (2022-08-06T17:38:25Z) - Unfolding a blurred image [36.519356428362286]
We learn motion representation from sharp videos in an unsupervised manner.
We then train a convolutional recurrent video autoencoder network that performs a surrogate task of video reconstruction.
It is employed for guided training of a motion encoder for blurred images.
This network extracts embedded motion information from the blurred image to generate a sharp video in conjunction with the trained recurrent video decoder.
arXiv Detail & Related papers (2022-01-28T09:39:55Z) - Attention-guided Temporal Coherent Video Object Matting [78.82835351423383]
We propose a novel deep learning-based object matting method that can achieve temporally coherent matting results.
Its key component is an attention-based temporal aggregation module that maximizes image matting networks' strength.
We show how to effectively solve the trimap generation problem by fine-tuning a state-of-the-art video object segmentation network.
arXiv Detail & Related papers (2021-05-24T17:34:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.