MODNet: Real-Time Trimap-Free Portrait Matting via Objective
Decomposition
- URL: http://arxiv.org/abs/2011.11961v4
- Date: Fri, 18 Mar 2022 04:49:53 GMT
- Title: MODNet: Real-Time Trimap-Free Portrait Matting via Objective
Decomposition
- Authors: Zhanghan Ke, Jiayu Sun, Kaican Li, Qiong Yan, Rynson W.H. Lau
- Abstract summary: Existing portrait matting methods require auxiliary inputs that are costly to obtain or involve multiple stages that are computationally expensive.
We present a light-weight matting objective decomposition network (MODNet) for portrait matting in real-time with a single input image.
- Score: 39.60219801564855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing portrait matting methods either require auxiliary inputs that are
costly to obtain or involve multiple stages that are computationally expensive,
making them less suitable for real-time applications. In this work, we present
a light-weight matting objective decomposition network (MODNet) for portrait
matting in real-time with a single input image. The key idea behind our
efficient design is to optimize a series of sub-objectives simultaneously via
explicit constraints. In addition, MODNet includes two novel techniques for
improving model efficiency and robustness. First, an Efficient Atrous Spatial
Pyramid Pooling (e-ASPP) module is introduced to fuse multi-scale features for
semantic estimation. Second, a self-supervised sub-objectives consistency (SOC)
strategy is proposed to adapt MODNet to real-world data, addressing the domain
shift problem common to trimap-free methods. MODNet is easy to train in an
end-to-end manner. It is much faster than contemporaneous methods and runs at
67 frames per second on a 1080Ti GPU. Experiments show that MODNet outperforms
prior trimap-free methods by a large margin on both the Adobe Matting Dataset and
our carefully designed photographic portrait matting benchmark (PPM-100).
Further, MODNet achieves remarkable results on daily photos and videos.
Our code and models are available at https://github.com/ZHKKKe/MODNet, and the
PPM-100 benchmark is released at https://github.com/ZHKKKe/PPM.
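MODNet predicts a per-pixel alpha matte from a single portrait image. The standard compositing step that consumes such a matte (blending the portrait onto a new background via I = αF + (1−α)B) can be sketched as follows. This is a minimal illustration using NumPy with a synthetic matte, not code from the MODNet repository:

```python
import numpy as np

def composite(foreground, background, alpha):
    """Blend foreground over background using a per-pixel alpha matte.

    foreground, background: float arrays of shape (H, W, 3) in [0, 1].
    alpha: float array of shape (H, W) in [0, 1], e.g. a matte predicted
    by a trimap-free matting network such as MODNet.
    """
    a = alpha[..., None]  # broadcast the matte over the RGB channels
    return a * foreground + (1.0 - a) * background

# Synthetic example: a soft circular matte over dummy images.
h, w = 64, 64
yy, xx = np.mgrid[0:h, 0:w]
dist = np.hypot(yy - h / 2, xx - w / 2)
alpha = np.clip(1.0 - (dist - 20) / 6.0, 0.0, 1.0)  # soft edge band

fg = np.full((h, w, 3), 0.9)  # bright "portrait" stand-in
bg = np.full((h, w, 3), 0.1)  # dark replacement background
out = composite(fg, bg, alpha)
print(out.shape, float(out.min()), float(out.max()))
```

The soft (fractional) values of `alpha` along the edge band are what distinguish matting from hard binary segmentation: hair and other fine boundaries blend smoothly into the new background.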
Related papers
- Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration [100.54419875604721]
All-in-one image restoration tackles different types of degradations with a unified model instead of having task-specific, non-generic models for each degradation.
We propose DyNet, a dynamic family of networks designed in an encoder-decoder style for all-in-one image restoration tasks.
Our DyNet can seamlessly switch between its bulkier and lightweight variants, thereby offering flexibility for efficient model deployment.
arXiv Detail & Related papers (2024-04-02T17:58:49Z)
- F$^3$Loc: Fusion and Filtering for Floorplan Localization [63.28504055661646]
We propose an efficient data-driven solution to self-localization within a floorplan.
Our method does not require retraining per map and location or demand a large database of images of the area of interest.
arXiv Detail & Related papers (2024-03-05T23:32:26Z)
- RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation [46.659592045271125]
RTMO is a one-stage pose estimation framework that seamlessly integrates coordinate classification.
It achieves accuracy comparable to top-down methods while maintaining high speed.
Our largest model, RTMO-l, attains 74.8% AP on COCO val 2017 and 141 FPS on a single V100 GPU.
arXiv Detail & Related papers (2023-12-12T18:55:29Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry nature of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection [20.161887223481994]
We propose a long-sequence modeling framework, named StreamPETR, for multi-view 3D object detection.
StreamPETR achieves significant performance improvements only with negligible cost, compared to the single-frame baseline.
The lightweight version realizes 45.0% mAP and 31.7 FPS, outperforming the state-of-the-art method (SOLOFusion) by 2.3% mAP and 1.8x faster FPS.
arXiv Detail & Related papers (2023-03-21T15:19:20Z)
- Efficient Context Integration through Factorized Pyramidal Learning for Ultra-Lightweight Semantic Segmentation [1.0499611180329804]
We propose a novel Factorized Pyramidal Learning (FPL) module to aggregate rich contextual information in an efficient manner.
We decompose the spatial pyramid into two stages which enables a simple and efficient feature fusion within the module to solve the notorious checkerboard effect.
Based on the FPL module and FIR unit, we propose an ultra-lightweight real-time network, called FPLNet, which achieves state-of-the-art accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-02-23T05:34:51Z)
- Highly Efficient Natural Image Matting [15.977598189574659]
We propose a trimap-free natural image matting method with a lightweight model.
We construct an extremely lightweight model, which achieves performance comparable to large models on popular natural image benchmarks with only 1% (344k) of their parameters.
arXiv Detail & Related papers (2021-10-25T09:23:46Z)
- MVFNet: Multi-View Fusion Network for Efficient Video Recognition [79.92736306354576]
We introduce a multi-view fusion (MVF) module to exploit video complexity using separable convolution for efficiency.
MVFNet can be thought of as a generalized video modeling framework.
arXiv Detail & Related papers (2020-12-13T06:34:18Z)
- Online Multi-Object Tracking and Segmentation with GMPHD Filter and Mask-based Affinity Fusion [79.87371506464454]
We propose a fully online multi-object tracking and segmentation (MOTS) method that uses instance segmentation results as an input.
The proposed method is based on the Gaussian mixture probability hypothesis density (GMPHD) filter, a hierarchical data association (HDA), and a mask-based affinity fusion (MAF) model.
In experiments on the two popular MOTS datasets, each of the key modules brings improvements.
arXiv Detail & Related papers (2020-08-31T21:06:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.