Real-Time High-Resolution Background Matting
- URL: http://arxiv.org/abs/2012.07810v1
- Date: Mon, 14 Dec 2020 18:43:32 GMT
- Title: Real-Time High-Resolution Background Matting
- Authors: Shanchuan Lin, Andrey Ryabtsev, Soumyadip Sengupta, Brian Curless,
Steve Seitz, and Ira Kemelmacher-Shlizerman
- Abstract summary: We introduce a real-time, high-resolution background replacement technique which operates at 30 fps in 4K resolution, and 60 fps for HD on a modern GPU.
Our approach yields higher quality results compared to the previous state-of-the-art in background matting, while simultaneously yielding a dramatic boost in both speed and resolution.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a real-time, high-resolution background replacement technique
which operates at 30 fps at 4K resolution, and 60 fps for HD, on a modern GPU. Our
technique is based on background matting, where an additional frame of the
background is captured and used in recovering the alpha matte and the
foreground layer. The main challenge is to compute a high-quality alpha matte,
preserving strand-level hair details, while processing high-resolution images
in real time. To achieve this goal, we employ two neural networks: a base
network computes a low-resolution result, which is refined by a second network
operating at high resolution on selective patches. We introduce two large-scale
video and image matting datasets: VideoMatte240K and PhotoMatte13K/85. Our
approach yields higher-quality results than the previous state of the art
in background matting, while simultaneously delivering a dramatic boost in
both speed and resolution.
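The two-network pipeline described in the abstract (a coarse base pass, then high-resolution refinement of selected patches) can be sketched roughly as follows. The background-difference heuristic, the alpha-uncertainty error measure, and the smoothing step are illustrative placeholders standing in for the paper's learned networks; this is not the authors' implementation.

```python
import numpy as np

def base_pass(image_lr, background_lr):
    """Stand-in for the low-resolution base network: a naive
    background-difference heuristic, purely for illustration."""
    diff = np.abs(image_lr.astype(np.float32) - background_lr.astype(np.float32))
    alpha_coarse = np.clip(diff.mean(axis=-1) / 255.0 * 4.0, 0.0, 1.0)
    # Error map: uncertainty is highest where alpha is neither 0 nor 1,
    # i.e. along soft boundaries such as hair strands.
    error_map = alpha_coarse * (1.0 - alpha_coarse)
    return alpha_coarse, error_map

def select_patches(error_map, patch=8, k=4):
    """Pick the k patch locations with the largest mean error."""
    h, w = error_map.shape
    scores = []
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            scores.append((error_map[y:y + patch, x:x + patch].mean(), y, x))
    scores.sort(reverse=True)
    return [(y, x) for _, y, x in scores[:k]]

def refine(alpha, patches, patch=8):
    """Stand-in for the high-resolution refinement network: smooth
    each selected patch (a real system would run a CNN here)."""
    out = alpha.copy()
    for y, x in patches:
        region = out[y:y + patch, x:x + patch]
        out[y:y + patch, x:x + patch] = (region + region.mean()) / 2.0
    return out

# Toy 32x32 frame: a bright square "subject" over a dark background,
# with one semi-transparent fringe row to give the error map something to find.
bg = np.zeros((32, 32, 3), dtype=np.uint8)
img = bg.copy()
img[8:24, 8:24] = 200
img[7, 8:24] = 40
alpha, err = base_pass(img, bg)
refined = refine(alpha, select_patches(err))
print(refined.shape)  # (32, 32)
```

Only the uncertain patches are revisited at full resolution, which is what lets the real system keep 4K throughput: most of the frame is settled by the cheap low-resolution pass.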
Related papers
- Elevating Flow-Guided Video Inpainting with Reference Generation [50.03502211226332]
Video inpainting (VI) is a challenging task that requires effective propagation of observable content across frames while simultaneously generating new content not present in the original video.
We propose a robust and practical VI framework that leverages a large generative model for reference generation in combination with an advanced pixel propagation algorithm.
Our method not only significantly enhances frame-level quality for object removal but also synthesizes new content in the missing areas based on user-provided text prompts.
arXiv Detail & Related papers (2024-12-12T06:13:00Z)
- VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models [58.464465016269614]
In this paper, we propose a framework for solving high-definition video inverse problems using latent image diffusion models.
Our approach leverages latent-space diffusion models to achieve enhanced video quality and resolution.
Unlike previous methods, our approach supports multiple aspect ratios and delivers HD-resolution reconstructions in under 2.5 minutes on a single GPU.
arXiv Detail & Related papers (2024-11-29T08:10:49Z)
- Hierarchical Patch Diffusion Models for High-Resolution Video Generation [50.42746357450949]
We develop deep context fusion, which propagates context information from low-scale to high-scale patches in a hierarchical manner.
We also propose adaptive computation, which allocates more network capacity and computation towards coarse image details.
The resulting model sets a new state-of-the-art FVD score of 66.32 and Inception Score of 87.68 in class-conditional video generation.
arXiv Detail & Related papers (2024-06-12T01:12:53Z)
- DART: Depth-Enhanced Accurate and Real-Time Background Matting [11.78381754863757]
Matting with a static background, often referred to as "Background Matting" (BGM), has garnered significant attention within the computer vision community.
We leverage the rich depth information provided by the RGB-Depth (RGB-D) cameras to enhance background matting performance in real-time.
arXiv Detail & Related papers (2024-02-24T14:10:17Z)
- Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography [54.36608424943729]
We show that in a "long-burst", forty-two 12-megapixel RAW frames captured in a two-second sequence, there is enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
arXiv Detail & Related papers (2022-12-22T18:54:34Z)
- SwiftSRGAN -- Rethinking Super-Resolution for Efficient and Real-time Inference [0.0]
We present an architecture that is faster and smaller in terms of its memory footprint.
Real-time super-resolution enables streaming high-resolution media content even under poor bandwidth conditions.
arXiv Detail & Related papers (2021-11-29T04:20:15Z)
- Robust High-Resolution Video Matting with Temporal Guidance [14.9739044990367]
We introduce a robust, real-time, high-resolution human video matting method that achieves new state-of-the-art performance.
Our method is much lighter than previous approaches and can process 4K at 76 FPS and HD at 104 FPS on an Nvidia GTX 1080Ti GPU.
arXiv Detail & Related papers (2021-08-25T23:48:15Z)
- Single image deep defocus estimation and its applications [82.93345261434943]
We train a deep neural network to classify image patches into one of 20 levels of blurriness.
The trained model is used to determine the patch blurriness which is then refined by applying an iterative weighted guided filter.
The result is a defocus map that carries the information of the degree of blurriness for each pixel.
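The patch-wise pipeline this summary describes can be sketched with a hand-crafted sharpness proxy in place of the paper's trained CNN classifier; the Laplacian-variance score and the quantisation scheme below are illustrative assumptions, not the authors' method, and the iterative guided-filter refinement is omitted.

```python
import numpy as np

def laplacian_var(patch):
    # Variance of a 4-neighbour Laplacian: a classic hand-crafted
    # sharpness proxy standing in for the paper's trained classifier.
    p = patch.astype(np.float32)
    lap = (-4.0 * p[1:-1, 1:-1] + p[:-2, 1:-1] + p[2:, 1:-1]
           + p[1:-1, :-2] + p[1:-1, 2:])
    return lap.var()

def defocus_map(image, patch=8, levels=20):
    # Quantise per-patch sharpness into `levels` blurriness bins
    # (level 0 = sharpest, levels - 1 = most blurred).
    h, w = image.shape
    rows, cols = h // patch, w // patch
    scores = np.zeros((rows, cols), dtype=np.float32)
    for i in range(rows):
        for j in range(cols):
            scores[i, j] = laplacian_var(
                image[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch])
    ranks = scores.max() - scores  # invert so sharp patches map to level 0
    if ranks.max() > 0:
        return np.minimum((ranks / ranks.max() * levels).astype(np.int64),
                          levels - 1)
    return np.zeros((rows, cols), dtype=np.int64)

# Toy image: noisy (sharp) top half, flat (blur-like) bottom half.
rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[:16] = rng.integers(0, 255, size=(16, 32))
dmap = defocus_map(img)
print(dmap.shape)  # (4, 4)
```

In this toy setup the noisy top half lands in the sharp bins and the flat bottom half in the most-blurred bin, mirroring how a per-patch classification is assembled into a defocus map.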
arXiv Detail & Related papers (2021-07-30T06:18:16Z)
- Robust Consistent Video Depth Estimation [65.53308117778361]
We present an algorithm for estimating consistent dense depth maps and camera poses from a monocular video.
Our algorithm combines two complementary techniques: (1) flexible deformation-splines for low-frequency large-scale alignment and (2) geometry-aware depth filtering for high-frequency alignment of fine depth details.
In contrast to prior approaches, our method does not require camera poses as input and achieves robust reconstruction for challenging hand-held cell phone captures containing a significant amount of noise, shake, motion blur, and rolling shutter deformations.
arXiv Detail & Related papers (2020-12-10T18:59:48Z)
- High-Resolution Deep Image Matting [39.72708676319803]
HDMatt is the first deep-learning-based image matting approach designed for high-resolution inputs.
Our proposed method sets new state-of-the-art performance on Adobe Image Matting and AlphaMatting benchmarks.
arXiv Detail & Related papers (2020-09-14T17:53:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.