DART: Depth-Enhanced Accurate and Real-Time Background Matting
- URL: http://arxiv.org/abs/2402.15820v1
- Date: Sat, 24 Feb 2024 14:10:17 GMT
- Title: DART: Depth-Enhanced Accurate and Real-Time Background Matting
- Authors: Hanxi Li, Guofeng Li, Bo Li, Lin Wu and Yan Cheng
- Abstract summary: Matting with a static background, often referred to as "Background Matting" (BGM), has garnered significant attention within the computer vision community.
We leverage the rich depth information provided by RGB-Depth (RGB-D) cameras to enhance background matting performance in real time.
- Score: 11.78381754863757
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Matting with a static background, often referred to as "Background Matting"
(BGM), has garnered significant attention within the computer vision community
due to its pivotal role in various practical applications like webcasting and
photo editing. Nevertheless, achieving highly accurate background matting
remains a formidable challenge, primarily owing to the limitations inherent in
conventional RGB images. These limitations manifest in the form of
susceptibility to varying lighting conditions and unforeseen shadows.
In this paper, we leverage the rich depth information provided by RGB-Depth
(RGB-D) cameras to enhance background matting performance in real time; we dub
the resulting approach DART. Firstly, we adapt the original RGB-based BGM algorithm
to incorporate depth information. The resulting model's output undergoes
refinement through Bayesian inference, incorporating a background depth prior.
The posterior prediction is then translated into a "trimap," which is
subsequently fed into a state-of-the-art matting algorithm to generate more
precise alpha mattes. To ensure real-time matting capabilities, a critical
requirement for many real-world applications, we distill the backbone of our
model from a larger and more versatile BGM network. Our experiments demonstrate
the superior performance of the proposed method. Moreover, thanks to the
distillation operation, our method achieves a remarkable processing speed of 33
frames per second (fps) on a mid-range edge-computing device. This high
efficiency underscores DART's immense potential for deployment in mobile
applications.
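
The refinement stage described above can be illustrated with a short sketch. The snippet below is a minimal, hypothetical Python/NumPy illustration (not the authors' released code) of how a coarse foreground probability from a depth-augmented BGM network might be combined with a pre-captured background-depth prior via Bayes' rule and then thresholded into a trimap; the Gaussian depth likelihood, the thresholds, and all function names are assumptions made for illustration only.

```python
# Hypothetical sketch of a Bayesian depth-prior refinement followed by trimap
# generation, in the spirit of the pipeline described in the abstract.
import numpy as np


def background_depth_likelihood(depth, bg_depth, sigma_mm=50.0):
    """P(depth | background): assumed Gaussian on the deviation of the observed
    depth from the pre-captured static-background depth (both HxW, in mm)."""
    diff = depth.astype(np.float32) - bg_depth.astype(np.float32)
    return np.clip(np.exp(-0.5 * (diff / sigma_mm) ** 2), 1e-6, 1.0 - 1e-6)


def bayesian_refine(fg_prob, depth, bg_depth):
    """Posterior P(foreground | depth), using the network's foreground
    probability as the prior and a complementary foreground likelihood."""
    p_d_given_bg = background_depth_likelihood(depth, bg_depth)
    p_d_given_fg = 1.0 - p_d_given_bg  # crude complementary model (assumption)
    post_fg = fg_prob * p_d_given_fg
    post_bg = (1.0 - fg_prob) * p_d_given_bg
    return post_fg / (post_fg + post_bg + 1e-6)


def posterior_to_trimap(posterior, lo=0.1, hi=0.9):
    """Map the posterior to a trimap: 0 = background, 128 = unknown, 255 = foreground."""
    trimap = np.full(posterior.shape, 128, dtype=np.uint8)
    trimap[posterior < lo] = 0
    trimap[posterior > hi] = 255
    return trimap


if __name__ == "__main__":
    h, w = 480, 640
    fg_prob = np.random.rand(h, w).astype(np.float32)    # stand-in for the BGM network output
    bg_depth = np.full((h, w), 2000.0, dtype=np.float32)  # pre-captured background depth (mm)
    depth = bg_depth + 30.0 * np.random.randn(h, w).astype(np.float32)
    trimap = posterior_to_trimap(bayesian_refine(fg_prob, depth, bg_depth))
    print(trimap.shape, np.unique(trimap))
```

In the full pipeline, the resulting trimap would then be passed to a state-of-the-art matting network to produce the final alpha matte.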
Related papers
- Depth-based Privileged Information for Boosting 3D Human Pose Estimation on RGB [48.31210455404533]
A heatmap-based 3D pose estimator is able to hallucinate depth information from the RGB frames given at inference time.
Depth information is used exclusively during training by enforcing the RGB-based hallucination network to learn features similar to a backbone pre-trained only on depth data.
arXiv Detail & Related papers (2024-09-17T11:59:34Z)
- Scene Prior Filtering for Depth Super-Resolution [97.30137398361823]
We introduce a Scene Prior Filtering network, SPFNet, to mitigate texture interference and edge inaccuracy.
Our SPFNet has been extensively evaluated on both real and synthetic datasets, achieving state-of-the-art performance.
arXiv Detail & Related papers (2024-02-21T15:35:59Z)
- AGG-Net: Attention Guided Gated-convolutional Network for Depth Image Completion [1.8820731605557168]
We propose a new model for depth image completion based on the Attention Guided Gated-convolutional Network (AGG-Net).
In the encoding stage, an Attention Guided Gated-Convolution (AG-GConv) module is proposed to realize the fusion of depth and color features at different scales.
In the decoding stage, an Attention Guided Skip Connection (AG-SC) module is presented to avoid introducing too many depth-irrelevant features to the reconstruction.
arXiv Detail & Related papers (2023-09-04T14:16:08Z)
- Symmetric Uncertainty-Aware Feature Transmission for Depth Super-Resolution [52.582632746409665]
We propose a novel Symmetric Uncertainty-aware Feature Transmission (SUFT) for color-guided DSR.
Our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-06-01T06:35:59Z)
- Shakes on a Plane: Unsupervised Depth Estimation from Unstabilized Photography [54.36608424943729]
We show that a "long-burst" (forty-two 12-megapixel RAW frames captured in a two-second sequence) contains enough parallax information from natural hand tremor alone to recover high-quality scene depth.
We devise a test-time optimization approach that fits a neural RGB-D representation to long-burst data and simultaneously estimates scene depth and camera motion.
arXiv Detail & Related papers (2022-12-22T18:54:34Z)
- Consistent Depth Prediction under Various Illuminations using Dilated Cross Attention [1.332560004325655]
We propose to use internet 3D indoor scenes and manually tune their illuminations to render photo-realistic RGB photos and their corresponding depth and BRDF maps.
We perform cross attention on these dilated features to retain the consistency of depth prediction under different illuminations.
Our method is evaluated against current state-of-the-art methods on the Vari dataset, and a significant improvement is observed in experiments.
arXiv Detail & Related papers (2021-12-15T10:02:46Z)
- Wild ToFu: Improving Range and Quality of Indirect Time-of-Flight Depth with RGB Fusion in Challenging Environments [56.306567220448684]
We propose a new learning based end-to-end depth prediction network which takes noisy raw I-ToF signals as well as an RGB image.
We show more than 40% RMSE improvement on the final depth map compared to the baseline approach.
arXiv Detail & Related papers (2021-12-07T15:04:14Z)
- Real-Time High-Resolution Background Matting [19.140664310700107]
We introduce a real-time, high-resolution background replacement technique which operates at 30 fps in 4K resolution, and 60 fps for HD on a modern GPU.
Our approach yields higher quality results compared to the previous state-of-the-art in background matting, while simultaneously yielding a dramatic boost in both speed and resolution.
arXiv Detail & Related papers (2020-12-14T18:43:32Z)
- A Single Stream Network for Robust and Real-time RGB-D Salient Object Detection [89.88222217065858]
We design a single stream network to use the depth map to guide early fusion and middle fusion between RGB and depth.
This model is 55.5% lighter than the current lightest model and runs at a real-time speed of 32 FPS when processing a $384 \times 384$ image.
arXiv Detail & Related papers (2020-07-14T04:40:14Z)