Exploring the Interactive Guidance for Unified and Effective Image Matting
- URL: http://arxiv.org/abs/2205.08324v3
- Date: Thu, 7 Dec 2023 10:08:04 GMT
- Title: Exploring the Interactive Guidance for Unified and Effective Image Matting
- Authors: Dinghao Yang, Bin Wang, Weijia Li, Yiqi Lin, Conghui He
- Abstract summary: We propose a Unified Interactive image Matting method, named UIM, which addresses these limitations and achieves satisfactory matting results.
Specifically, UIM leverages multiple types of user interaction to avoid the ambiguity of multiple matting targets.
We show that UIM achieves state-of-the-art performance on the Composition-1K test set and a synthetic unified dataset.
- Score: 16.933897631478146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent image matting studies are moving towards trimap-free or interactive methods for completing complex image matting tasks. Although these methods avoid the extensive labor of trimap annotation, they still suffer from two limitations: (1) for a single image with multiple objects, extra interaction information is needed to determine the matting target; (2) for transparent objects, accurately regressing the alpha matte from an RGB image is much more difficult than for opaque ones. In this work, we propose a Unified Interactive image Matting method, named UIM, which addresses both limitations and achieves satisfactory matting results in any scenario. Specifically, UIM leverages multiple types of user interaction to avoid the ambiguity of multiple matting targets, and we compare the pros and cons of different annotation types in detail. To unify the matting performance for transparent and opaque objects, we decouple image matting into two stages, i.e., foreground segmentation and transparency prediction. Moreover, we design a multi-scale attentive fusion module to alleviate vagueness in the boundary region. Experimental results demonstrate that UIM achieves state-of-the-art performance on the Composition-1K test set and a synthetic unified dataset.
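The abstract describes a two-stage decoupling (foreground segmentation followed by transparency prediction) plus a multi-scale attentive fusion module. Below is a minimal illustrative PyTorch sketch of that design, not the authors' implementation: the layer sizes, the single-channel interaction map, and the way the two stage outputs are combined into a final alpha matte are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentiveFusion(nn.Module):
    """Illustrative stand-in for the paper's fusion module: features are
    pooled at several scales, upsampled back to full resolution, and
    blended with learned per-pixel attention weights."""
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.attn = nn.Conv2d(channels * len(scales), len(scales), 1)

    def forward(self, feat):
        h, w = feat.shape[-2:]
        pyramid = [feat if s == 1 else
                   F.interpolate(F.avg_pool2d(feat, s), size=(h, w),
                                 mode="bilinear", align_corners=False)
                   for s in self.scales]
        weights = torch.softmax(self.attn(torch.cat(pyramid, 1)), dim=1)
        return sum(weights[:, i:i + 1] * p for i, p in enumerate(pyramid))

class UIMSketch(nn.Module):
    """Two-stage decoupling: stage 1 segments the user-selected foreground,
    stage 2 predicts per-pixel transparency within it."""
    def __init__(self, c=32):
        super().__init__()
        # 3 RGB channels + 1 interaction channel (click/scribble/box heatmap)
        self.seg = nn.Sequential(nn.Conv2d(4, c, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(c, 1, 3, padding=1))
        self.trans = nn.Sequential(nn.Conv2d(5, c, 3, padding=1), nn.ReLU())
        self.fusion = MultiScaleAttentiveFusion(c)
        self.head = nn.Conv2d(c, 1, 1)

    def forward(self, rgb, interaction):
        seg = torch.sigmoid(self.seg(torch.cat([rgb, interaction], 1)))
        feat = self.trans(torch.cat([rgb, interaction, seg], 1))
        transparency = torch.sigmoid(self.head(self.fusion(feat)))
        # Assumed combination: transparency is applied inside the segmented
        # region; the paper may merge the two stages differently.
        return seg, seg * transparency

model = UIMSketch()
seg, alpha = model(torch.rand(1, 3, 64, 64), torch.zeros(1, 1, 64, 64))
```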
Related papers
- HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification [15.129037250680582]
Tight visual-linguistic interactions play a vital role in improving classification performance.
Recent Transformer-based methods have achieved great success in multi-label image classification.
We propose a Hierarchical Scale-Aware Vision-Language Transformer (HSVLT) with two appealing designs.
arXiv Detail & Related papers (2024-07-23T07:31:42Z)
- Towards Self-Supervised FG-SBIR with Unified Sample Feature Alignment and Multi-Scale Token Recycling [11.129453244307369]
FG-SBIR aims to minimize the distance between sketches and corresponding images in the embedding space.
We propose an effective approach to narrow the gap between the two domains.
It mainly facilitates unified mutual information sharing both within and across samples.
arXiv Detail & Related papers (2024-06-17T13:49:12Z)
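A minimal sketch of the embedding-space objective the FG-SBIR entry above alludes to: pull each sketch toward its paired photo and push it away from a non-matching photo via a standard triplet loss. The paper's sample-feature alignment and token-recycling modules are not reproduced here; this is only the generic retrieval objective.

```python
import torch
import torch.nn.functional as F

def sketch_photo_triplet_loss(sketch_emb, photo_emb, margin=0.2):
    """Triplet objective for fine-grained sketch-based image retrieval:
    each sketch is the anchor, its paired photo the positive, and a
    shuffled in-batch photo the negative. Embeddings: (batch, dim)."""
    s = F.normalize(sketch_emb, dim=1)
    p = F.normalize(photo_emb, dim=1)
    neg = p[torch.randperm(p.size(0))]      # random in-batch negatives
    d_pos = (s - p).pow(2).sum(dim=1)
    d_neg = (s - neg).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()
```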
- From Text to Pixels: A Context-Aware Semantic Synergy Solution for Infrared and Visible Image Fusion [66.33467192279514]
We introduce a text-guided multi-modality image fusion method that leverages the high-level semantics from textual descriptions to integrate semantics from infrared and visible images.
Our method not only produces visually superior fusion results but also achieves a higher detection mAP than existing methods, setting a new state of the art.
arXiv Detail & Related papers (2023-12-31T08:13:47Z)
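The text-guided fusion described in the "From Text to Pixels" entry above can be illustrated generically: a text embedding (from any sentence encoder) modulates per-pixel weights that blend infrared and visible features. The gating scheme below is an assumption for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TextGatedFusion(nn.Module):
    """Fuse infrared and visible feature maps with per-pixel weights that
    are modulated by a projected text embedding."""
    def __init__(self, channels=32, text_dim=512):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, channels)
        self.weight = nn.Conv2d(channels * 2, 1, kernel_size=1)

    def forward(self, ir_feat, vis_feat, text_emb):
        # Broadcast the projected text vector over spatial positions.
        t = self.text_proj(text_emb)[:, :, None, None]
        w = torch.sigmoid(self.weight(
            torch.cat([ir_feat + t, vis_feat + t], dim=1)))
        return w * ir_feat + (1 - w) * vis_feat
```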
- Improving Human-Object Interaction Detection via Virtual Image Learning [68.56682347374422]
Human-Object Interaction (HOI) detection aims to understand the interactions between humans and objects.
In this paper, we propose to alleviate the impact of such an unbalanced distribution via Virtual Image Learning (VIL).
A novel label-to-image approach, Multiple Steps Image Creation (MUSIC), is proposed to create a high-quality dataset that has a consistent distribution with real images.
arXiv Detail & Related papers (2023-08-04T10:28:48Z)
- Multi-interactive Feature Learning and a Full-time Multi-modality Benchmark for Image Fusion and Segmentation [66.15246197473897]
Multi-modality image fusion and segmentation play a vital role in autonomous driving and robotic operation.
We propose a Multi-interactive Feature learning architecture for image fusion and Segmentation.
arXiv Detail & Related papers (2023-08-04T01:03:58Z)
- Multi-Grained Multimodal Interaction Network for Entity Linking [65.30260033700338]
The multimodal entity linking (MEL) task aims at resolving ambiguous mentions to a multimodal knowledge graph.
We propose a novel Multi-GraIned Multimodal InteraCtion Network (MIMIC) framework for solving the MEL task.
arXiv Detail & Related papers (2023-07-19T02:11:19Z)
- Bi-level Dynamic Learning for Jointly Multi-modality Image Fusion and Beyond [50.556961575275345]
We build an image fusion module to fuse complementary characteristics and cascade dual task-related modules.
We develop an efficient first-order approximation to compute corresponding gradients and present dynamic weighted aggregation to balance the gradients for fusion learning.
arXiv Detail & Related papers (2023-05-11T10:55:34Z)
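The "dynamic weighted aggregation" mentioned in the bi-level entry above can be illustrated as follows: per-task gradients on the shared fusion parameters are combined with weights derived from the current task losses. The softmax-over-losses weighting rule below is an assumed stand-in, not the paper's formulation.

```python
import torch

def aggregate_task_gradients(shared_params, task_losses, temperature=1.0):
    """Dynamically weight and sum per-task gradients on shared parameters.
    Weights come from a softmax over the detached task losses, so the
    currently worse task contributes more (an illustrative choice).
    `shared_params` is a list of tensors with requires_grad=True."""
    losses = torch.stack([l.detach() for l in task_losses])
    weights = torch.softmax(losses / temperature, dim=0)
    grads = [torch.autograd.grad(l, shared_params, retain_graph=True)
             for l in task_losses]
    # Group gradients per parameter and take the weighted sum across tasks.
    agg = [sum(w * g for w, g in zip(weights, per_param))
           for per_param in zip(*grads)]
    for p, g in zip(shared_params, agg):
        p.grad = g  # write aggregated gradient, then call optimizer.step()
```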
- Unsupervised Image Fusion Method based on Feature Mutual Mapping [16.64607158983448]
We propose an unsupervised adaptive image fusion method to address the above issues.
We construct a global map to measure the connections of pixels between the input source images.
Our method achieves superior performance in both visual perception and objective evaluation.
arXiv Detail & Related papers (2022-01-25T07:50:14Z)
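The "global map" of pixel connections in the Feature Mutual Mapping entry above is, generically, a cross-image affinity matrix between flattened feature maps. A minimal sketch, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def cross_image_affinity(feat_a, feat_b):
    """Global affinity between every pixel pair of two feature maps:
    (B, C, H, W) x (B, C, H, W) -> (B, H*W, H*W) cosine similarities."""
    fa = F.normalize(feat_a.flatten(2), dim=1)   # (B, C, HW)
    fb = F.normalize(feat_b.flatten(2), dim=1)
    return torch.bmm(fa.transpose(1, 2), fb)     # (B, HW, HW)

# Example: use the affinity to aggregate source-B features for each
# pixel of source A, attention-style.
a, b = torch.rand(1, 16, 8, 8), torch.rand(1, 16, 8, 8)
attn = cross_image_affinity(a, b).softmax(dim=-1)
fused = torch.bmm(attn, b.flatten(2).transpose(1, 2))  # (B, HW, C)
```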
- Salient Image Matting [0.0]
We propose an image matting framework called Salient Image Matting to estimate the per-pixel opacity value of the most salient foreground in an image.
Our framework simultaneously deals with the challenge of learning a wide range of semantics and salient object types.
Our framework requires only a fraction of expensive matting data as compared to other automatic methods.
arXiv Detail & Related papers (2021-03-23T06:22:33Z)
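A saliency-driven matting pipeline like the one in the Salient Image Matting entry above typically converts a saliency map into a pseudo-trimap before alpha estimation. The thresholds and dilation below are illustrative assumptions, not the paper's values:

```python
import torch
import torch.nn.functional as F

def saliency_to_trimap(saliency, lo=0.1, hi=0.9, band=15):
    """Turn a saliency map (B, 1, H, W) into a pseudo-trimap: confident
    foreground/background plus an 'unknown' band around the boundary,
    widened here by max-pool dilation of the uncertain region."""
    fg = (saliency > hi).float()
    bg = (saliency < lo).float()
    unknown = 1.0 - fg - bg
    unknown = F.max_pool2d(unknown, kernel_size=band, stride=1,
                           padding=band // 2)
    # 1 = foreground, 0 = background, 0.5 = unknown (for a matting net)
    return torch.where(unknown > 0, torch.full_like(fg, 0.5), fg)
```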
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
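The GFM entry above summarizes the architecture as a shared encoder with two separate decoders. A minimal sketch under assumed layer sizes, with a "glance" branch predicting coarse background/transition/foreground classes and a "focus" branch refining alpha in the transition band; the collaborative merge shown is one plausible reading of the design:

```python
import torch
import torch.nn as nn

class GlanceFocusSketch(nn.Module):
    """Shared encoder, two decoders: 'glance' for coarse semantics,
    'focus' for detail in the transition region."""
    def __init__(self, c=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, c, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(c, c, 3, padding=1), nn.ReLU())
        self.glance = nn.Conv2d(c, 3, 1)   # bg / transition / fg logits
        self.focus = nn.Conv2d(c, 1, 1)    # alpha within the transition band

    def forward(self, rgb):
        feat = self.encoder(rgb)
        glance = self.glance(feat).softmax(dim=1)
        alpha_detail = torch.sigmoid(self.focus(feat))
        # Merge: take the confident foreground from 'glance', and use
        # 'focus' alpha only inside the predicted transition region.
        fg, transition = glance[:, 2:3], glance[:, 1:2]
        return fg + transition * alpha_detail

alpha = GlanceFocusSketch()(torch.rand(1, 3, 64, 64))
```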