Referring Image Matting
- URL: http://arxiv.org/abs/2206.05149v3
- Date: Wed, 22 Mar 2023 03:47:41 GMT
- Title: Referring Image Matting
- Authors: Jizhizi Li, Jing Zhang, Dacheng Tao
- Abstract summary: We introduce a new task named Referring Image Matting (RIM) in this paper.
RIM aims to extract the meticulous alpha matte of the specific object that best matches the given natural language description.
RefMatte consists of 230 object categories, 47,500 images, 118,749 expression-region entities, and 474,996 expressions.
- Score: 85.77905619102802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Different from conventional image matting, which either requires user-defined
scribbles/trimap to extract a specific foreground object or directly extracts
all the foreground objects in the image indiscriminately, we introduce a new
task named Referring Image Matting (RIM) in this paper, which aims to extract
the meticulous alpha matte of the specific object that best matches the given
natural language description, thus enabling a more natural and simpler
instruction for image matting. First, we establish a large-scale challenging
dataset RefMatte by designing a comprehensive image composition and expression
generation engine to automatically produce high-quality images along with
diverse text attributes based on public datasets. RefMatte consists of 230
object categories, 47,500 images, 118,749 expression-region entities, and
474,996 expressions. Additionally, we construct a real-world test set with 100
high-resolution natural images and manually annotate complex phrases to
evaluate the out-of-domain generalization abilities of RIM methods.
Furthermore, we present a novel baseline method CLIPMat for RIM, including a
context-embedded prompt, a text-driven semantic pop-up, and a multi-level
details extractor. Extensive experiments on RefMatte in both keyword and
expression settings validate the superiority of CLIPMat over representative
methods. We hope this work could provide novel insights into image matting and
encourage more follow-up studies. The dataset, code and models are available at
https://github.com/JizhiziLi/RIM.
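To make the task setup concrete, below is a minimal, self-contained sketch of the RIM setting: given an RGB image and a tokenized natural-language expression, a model predicts an alpha matte in [0, 1] for the referred object. All module names and the fusion scheme are illustrative assumptions, not the CLIPMat architecture or the API of the linked repository.

```python
# Hypothetical sketch of referring image matting; NOT the CLIPMat implementation.
import torch
import torch.nn as nn

class ToyReferringMatting(nn.Module):
    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.text_encoder = nn.EmbeddingBag(vocab_size, dim)  # mean-pools token embeddings
        self.decoder = nn.Conv2d(dim, 1, 1)

    def forward(self, image: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        visual = self.image_encoder(image)            # (B, dim, H, W) image features
        language = self.text_encoder(tokens)          # (B, dim) expression embedding
        fused = visual * language[:, :, None, None]   # broadcast text over the spatial grid
        return torch.sigmoid(self.decoder(fused))     # (B, 1, H, W) alpha matte in [0, 1]

# Usage: one 256x256 image and one expression of 6 token ids.
model = ToyReferringMatting()
alpha = model(torch.rand(1, 3, 256, 256), torch.randint(0, 1000, (1, 6)))
print(alpha.shape)  # torch.Size([1, 1, 256, 256])
```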
Related papers
- Exploring Fine-Grained Image-Text Alignment for Referring Remote Sensing Image Segmentation [27.95875467352853]
We propose a new referring remote sensing image segmentation method, FIANet, that fully exploits the visual and linguistic representations.
The proposed fine-grained image-text alignment module (FIAM) simultaneously leverages the features of the input image and the corresponding texts.
We evaluate the effectiveness of the proposed methods on two public referring remote sensing datasets including RefSegRS and RRSIS-D.
arXiv Detail & Related papers (2024-09-20T16:45:32Z)
- MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation [54.64194935409982]
We introduce MuLAn: a novel dataset comprising over 44K MUlti-Layer-wise RGBA decompositions.
MuLAn is the first photorealistic resource providing instance decomposition and spatial information for high quality images.
We aim to encourage the development of novel generation and editing technology, in particular layer-wise solutions.
arXiv Detail & Related papers (2024-04-03T14:58:00Z)
- Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance [17.251982243534144]
LAR-Gen is a novel approach that enables seamless inpainting of masked scene images.
Our approach adopts a coarse-to-fine manner to ensure subject identity preservation and local semantic coherence.
Experiments and varied application scenarios demonstrate the superiority of LAR-Gen in terms of both identity preservation and text semantic consistency.
arXiv Detail & Related papers (2024-03-28T16:07:55Z)
- CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding [38.53988682814626]
We propose a context-enhanced masked image modeling method (CtxMIM) for remote sensing image understanding.
CtxMIM formulates original image patches as a reconstructive template and employs a Siamese framework to operate on two sets of image patches.
With this simple and elegant design, CtxMIM encourages the model to learn object-level or pixel-level features during pre-training on a large-scale dataset.
arXiv Detail & Related papers (2023-09-28T18:04:43Z)
- High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing the full image with high-resolution image crops that provide more fine-grained image details.
With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
arXiv Detail & Related papers (2022-11-10T18:58:22Z)
- Context-Aware Image Inpainting with Learned Semantic Priors [100.99543516733341]
We introduce pretext tasks that are semantically meaningful for estimating the missing contents.
We propose a context-aware image inpainting model, which adaptively integrates global semantics and local features.
arXiv Detail & Related papers (2021-06-14T08:09:43Z)
- Text-to-Image Generation Grounded by Fine-Grained User Attention [62.94737811887098]
Localized Narratives is a dataset with detailed natural language descriptions of images paired with mouse traces.
We propose TReCS, a sequential model that exploits this grounding to generate images.
arXiv Detail & Related papers (2020-11-07T13:23:31Z)
- Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders (see the sketch after this list).
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
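As referenced in the GFM entry above, the following is a minimal sketch of a shared-encoder, two-decoder ("glance and focus") matting network. The module names, the three-way glance output, and the merge rule are assumptions for illustration only; they are not the authors' implementation.

```python
# Hypothetical glance-and-focus matting sketch; NOT the GFM authors' code.
import torch
import torch.nn as nn

class ToyGlanceFocusMatting(nn.Module):
    def __init__(self, dim: int = 32):
        super().__init__()
        self.shared_encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.glance_decoder = nn.Conv2d(dim, 3, 1)  # background / transition / foreground logits
        self.focus_decoder = nn.Conv2d(dim, 1, 1)   # detailed alpha for the transition region

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        feats = self.shared_encoder(image)
        trimap = torch.softmax(self.glance_decoder(feats), dim=1)  # (B, 3, H, W)
        detail = torch.sigmoid(self.focus_decoder(feats))          # (B, 1, H, W)
        # Merge: trust the glance prediction in definite regions and the
        # focus prediction in the uncertain transition region.
        fg, transition = trimap[:, 2:3], trimap[:, 1:2]
        return fg + transition * detail

model = ToyGlanceFocusMatting()
alpha = model(torch.rand(1, 3, 128, 128))
print(alpha.shape)  # torch.Size([1, 1, 128, 128])
```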
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.