Dual-Context Aggregation for Universal Image Matting
- URL: http://arxiv.org/abs/2402.18109v1
- Date: Wed, 28 Feb 2024 06:56:24 GMT
- Title: Dual-Context Aggregation for Universal Image Matting
- Authors: Qinglin Liu, Xiaoqian Lv, Wei Yu, Changyong Guo, Shengping Zhang
- Abstract summary: We propose a simple and universal matting framework, named Dual-Context Aggregation Matting (DCAM)
Specifically, DCAM first adopts a semantic backbone network to extract low-level features and context features from the input image and guidance.
By performing both global contour segmentation and local boundary refinement, DCAM exhibits robustness to diverse types of guidance and objects.
- Score: 16.59886660634162
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Natural image matting aims to estimate the alpha matte of the foreground from
a given image. Various approaches have been explored to address this problem,
such as interactive matting methods that use guidance such as clicks or trimaps,
and automatic matting methods tailored to specific objects. However, existing
matting methods are designed for specific objects or guidance, neglecting the
common requirement of aggregating global and local contexts in image matting.
As a result, these methods often encounter challenges in accurately identifying
the foreground and generating precise boundaries, which limits their
effectiveness in unforeseen scenarios. In this paper, we propose a simple and
universal matting framework, named Dual-Context Aggregation Matting (DCAM),
which enables robust image matting with arbitrary guidance or without guidance.
Specifically, DCAM first adopts a semantic backbone network to extract
low-level features and context features from the input image and guidance.
Then, we introduce a dual-context aggregation network that incorporates global
object aggregators and local appearance aggregators to iteratively refine the
extracted context features. By performing both global contour segmentation and
local boundary refinement, DCAM exhibits robustness to diverse types of
guidance and objects. Finally, we adopt a matting decoder network to fuse the
low-level features and the refined context features for alpha matte estimation.
Experimental results on five matting datasets demonstrate that the proposed
DCAM outperforms state-of-the-art matting methods in both automatic matting and
interactive matting tasks, which highlights the strong universality and high
performance of DCAM. The source code is available at
https://github.com/Windaway/DCAM.
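The pipeline described in the abstract can be summarized as the minimal, runnable sketch below. Every module here (plain convolutions standing in for the semantic backbone, the global object and local appearance aggregators, and the matting decoder) is an illustrative assumption rather than the authors' implementation; the actual DCAM code lives in the linked repository.

```python
# Minimal sketch of the DCAM pipeline described in the abstract.
# All layers are simplified stand-ins; see https://github.com/Windaway/DCAM
# for the real architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DCAMSketch(nn.Module):
    def __init__(self, feat_ch=32, ctx_ch=64, num_stages=2):
        super().__init__()
        # Semantic backbone: extracts low-level features (high resolution)
        # and context features (low resolution) from image + guidance.
        self.low_level = nn.Conv2d(4, feat_ch, 3, padding=1)
        self.context = nn.Conv2d(feat_ch, ctx_ch, 3, stride=4, padding=1)
        # Dual-context aggregation: global object aggregators and local
        # appearance aggregators applied iteratively.
        self.global_aggs = nn.ModuleList(
            nn.Conv2d(ctx_ch, ctx_ch, 3, padding=2, dilation=2) for _ in range(num_stages))
        self.local_aggs = nn.ModuleList(
            nn.Conv2d(ctx_ch, ctx_ch, 3, padding=1) for _ in range(num_stages))
        # Matting decoder: fuses low-level and refined context features.
        self.decoder = nn.Conv2d(feat_ch + ctx_ch, 1, 3, padding=1)

    def forward(self, image, guidance=None):
        # Guidance (e.g. a trimap or click map) is optional, reflecting the
        # claim of matting "with arbitrary guidance or without guidance".
        if guidance is None:
            guidance = torch.zeros_like(image[:, :1])
        x = torch.cat([image, guidance], dim=1)

        low = F.relu(self.low_level(x))   # low-level features
        ctx = F.relu(self.context(low))   # context features

        # Iterative refinement: global contour-level, then local boundary-level.
        for g, l in zip(self.global_aggs, self.local_aggs):
            ctx = F.relu(g(ctx))
            ctx = F.relu(l(ctx))

        # Upsample refined context features and fuse with low-level features.
        ctx_up = F.interpolate(ctx, size=low.shape[-2:], mode="bilinear",
                               align_corners=False)
        alpha = torch.sigmoid(self.decoder(torch.cat([low, ctx_up], dim=1)))
        return alpha


if __name__ == "__main__":
    model = DCAMSketch()
    image = torch.rand(1, 3, 256, 256)
    trimap = torch.rand(1, 1, 256, 256)   # hypothetical guidance map
    print(model(image, trimap).shape)     # torch.Size([1, 1, 256, 256])
```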
Related papers
- Towards Natural Image Matting in the Wild via Real-Scenario Prior [69.96414467916863]
We propose a new matting dataset based on the COCO dataset, namely COCO-Matting.
COCO-Matting comprises an extensive collection of 38,251 human instance-level alpha mattes in complex natural scenarios.
For network architecture, the proposed feature-aligned transformer learns to extract fine-grained edge and transparency features.
The proposed matte-aligned decoder aims to segment matting-specific objects and convert coarse masks into high-precision mattes.
arXiv Detail & Related papers (2024-10-09T06:43:19Z)
- Adapt CLIP as Aggregation Instructor for Image Dehazing [17.29370328189668]
Most dehazing methods suffer from a limited receptive field and do not explore the rich semantic priors encapsulated in vision-language models.
We introduce CLIPHaze, a pioneering hybrid framework that synergizes the efficient global modeling of Mamba with the prior knowledge and zero-shot capabilities of CLIP.
Our method employs a parallel state space model and window-based self-attention to obtain global contextual dependencies and local fine-grained perception.
arXiv Detail & Related papers (2024-08-22T11:51:50Z)
- Mesh Denoising Transformer [104.5404564075393]
Mesh denoising is aimed at removing noise from input meshes while preserving their feature structures.
SurfaceFormer is a pioneering Transformer-based mesh denoising framework.
A new representation, the Local Surface Descriptor, captures local geometric intricacies.
The Denoising Transformer module receives the multimodal information and achieves efficient global feature aggregation.
arXiv Detail & Related papers (2024-05-10T15:27:43Z)
- Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z)
- Object Segmentation by Mining Cross-Modal Semantics [68.88086621181628]
We propose a novel approach by mining the Cross-Modal Semantics to guide the fusion and decoding of multimodal features.
Specifically, we propose a novel network, termed XMSNet, consisting of (1) all-round attentive fusion (AF), (2) coarse-to-fine decoder (CFD), and (3) cross-layer self-supervision.
arXiv Detail & Related papers (2023-05-17T14:30:11Z)
- Deep Image Matting: A Comprehensive Survey [85.77905619102802]
This paper presents a review of recent advancements in image matting in the era of deep learning.
We focus on two fundamental sub-tasks: auxiliary input-based image matting and automatic image matting.
We discuss relevant applications of image matting and highlight existing challenges and potential opportunities for future research.
arXiv Detail & Related papers (2023-04-10T15:48:55Z)
- Exploring the Interactive Guidance for Unified and Effective Image Matting [16.933897631478146]
We propose a Unified Interactive image Matting method, named UIM, which addresses these limitations and achieves satisfactory matting results.
Specifically, UIM leverages multiple types of user interaction to avoid the ambiguity of multiple matting targets.
We show that UIM achieves state-of-the-art performance on the Composition-1K test set and a synthetic unified dataset.
arXiv Detail & Related papers (2022-05-17T13:20:30Z)
- Situational Perception Guided Image Matting [16.1897179939677]
We propose a Situational Perception Guided Image Matting (SPG-IM) method that mitigates the subjective bias of matting annotations.
SPG-IM can better associate inter-object and object-to-environment saliency, and compensate for the subjective nature of image matting.
arXiv Detail & Related papers (2022-04-20T07:35:51Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
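As a rough illustration of the shared-encoder, two-decoder design mentioned above, the sketch below pairs a coarse "glance" decoder with a detail "focus" decoder. All names and layers are illustrative assumptions, not the GFM authors' implementation.

```python
# Hedged sketch of a shared-encoder, two-decoder matting network in the
# spirit of GFM: one decoder for global semantics, one for local detail.
import torch
import torch.nn as nn


class GlanceFocusSketch(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(                          # shared encoder
            nn.Conv2d(3, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.glance_decoder = nn.Conv2d(ch, 3, 3, padding=1)   # fg / bg / unknown
        self.focus_decoder = nn.Conv2d(ch, 1, 3, padding=1)    # boundary detail

    def forward(self, image):
        feats = self.encoder(image)
        glance = self.glance_decoder(feats).softmax(dim=1)     # coarse semantics
        focus = self.focus_decoder(feats).sigmoid()            # detail alpha
        # Collaborative fusion: trust the glance prediction where it is
        # confident (foreground) and the focus prediction in the unknown region.
        fg, unknown = glance[:, 0:1], glance[:, 2:3]
        alpha = fg + unknown * focus
        return alpha.clamp(0.0, 1.0)


if __name__ == "__main__":
    net = GlanceFocusSketch()
    print(net(torch.rand(1, 3, 128, 128)).shape)  # torch.Size([1, 1, 128, 128])
```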
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
- AlphaNet: An Attention Guided Deep Network for Automatic Image Matting [0.0]
We propose an end-to-end solution for image matting, i.e., high-precision extraction of foreground objects from natural images.
We propose a method that assimilates semantic segmentation and deep image matting processes into a single network to generate semantic mattes.
We also construct a fashion e-commerce focused dataset with high-quality alpha mattes to facilitate the training and evaluation for image matting.
arXiv Detail & Related papers (2020-03-07T17:25:21Z)