Multi-scale Information Assembly for Image Matting
- URL: http://arxiv.org/abs/2101.02391v2
- Date: Wed, 3 Mar 2021 11:06:54 GMT
- Title: Multi-scale Information Assembly for Image Matting
- Authors: Yu Qiao, Yuhao Liu, Qiang Zhu, Xin Yang, Yuxin Wang, Qiang Zhang, and
Xiaopeng Wei
- Abstract summary: We propose a multi-scale information assembly framework (MSIA-matte) to pull out high-quality alpha mattes from single RGB images.
We can achieve state-of-the-art performance compared to most existing matting networks.
- Score: 35.43994064645042
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image matting is a long-standing problem in computer graphics and vision,
mostly identified as the accurate estimation of the foreground in input images.
We argue that foreground objects can be represented by information at different
levels, including central bodies, coarse-grained boundaries, and refined
details. Based on this observation, in this paper, we propose a multi-scale
information assembly framework (MSIA-matte) to pull out high-quality alpha
mattes from single RGB images. Technically speaking, given an input image, we
extract high-level semantics as the subject content and retain early CNN
features to encode foreground expression at different levels, then combine them
with a carefully designed information assembly strategy. Extensive experiments
demonstrate the effectiveness of the proposed MSIA-matte, which achieves
state-of-the-art performance compared to most existing matting networks.
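
To make the multi-scale idea concrete, below is a minimal, hypothetical sketch (plain PyTorch; not the authors' MSIA-matte code): a shallow branch retains early CNN features for fine boundary detail, a deeper downsampled branch stands in for the high-level subject semantics, and a small fusion head assembles the two scales into a single-channel alpha matte. Module names, depths, and channel widths are illustrative assumptions only.

```python
# Minimal sketch of multi-scale feature assembly for matting (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAssemblyMatting(nn.Module):
    """Toy assembly: shallow features keep detail, a deep branch supplies
    semantics, and a fusion head predicts per-pixel alpha."""

    def __init__(self, base_ch=32):
        super().__init__()
        # Shallow block: early CNN features (edges, fine foreground detail).
        self.shallow = nn.Sequential(
            nn.Conv2d(3, base_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, base_ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Deep block: downsampled path standing in for high-level semantics.
        self.deep = nn.Sequential(
            nn.Conv2d(base_ch, base_ch * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch * 2, base_ch * 4, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Assembly head: concatenate upsampled semantics with shallow detail.
        self.fuse = nn.Sequential(
            nn.Conv2d(base_ch + base_ch * 4, base_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(base_ch, 1, 3, padding=1),
        )

    def forward(self, rgb):
        low = self.shallow(rgb)    # fine-detail features at full resolution
        high = self.deep(low)      # coarse subject/semantic features
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear",
                             align_corners=False)
        alpha = torch.sigmoid(self.fuse(torch.cat([low, high], dim=1)))
        return alpha               # per-pixel opacity in [0, 1]

if __name__ == "__main__":
    net = MultiScaleAssemblyMatting()
    print(net(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 1, 256, 256])
```

The actual MSIA-matte framework uses a far richer backbone and assembly strategy; the sketch only shows where the different-level features enter and how they are merged before alpha prediction.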
Related papers
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z) - DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception [43.387076189063556]
High-quality image-text datasets offer diverse visual elements and thorough image descriptions.
Current caption engines fall short in providing complete and accurate annotations.
We propose Perceptual Fusion, using a low-budget but highly effective caption engine for complete and accurate image descriptions.
arXiv Detail & Related papers (2024-07-11T08:48:06Z) - MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training [103.72844619581811]
We build performant Multimodal Large Language Models (MLLMs).
In particular, we study the importance of various architecture components and data choices.
We demonstrate that large-scale multimodal pre-training benefits from a careful mix of image-caption, interleaved image-text, and text-only data.
arXiv Detail & Related papers (2024-03-14T17:51:32Z) - Generalizable Entity Grounding via Assistance of Large Language Model [77.07759442298666]
We propose a novel approach to densely ground visual entities from a long caption.
We leverage a large multimodal model to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and a multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask.
arXiv Detail & Related papers (2024-02-04T16:06:05Z) - PROMPT-IML: Image Manipulation Localization with Pre-trained Foundation
Models Through Prompt Tuning [35.39822183728463]
We present a novel Prompt-IML framework for detecting tampered images.
Humans tend to discern the authenticity of an image based on semantic and high-frequency information.
Our model can achieve better performance on eight typical fake image datasets.
arXiv Detail & Related papers (2024-01-01T03:45:07Z) - CtxMIM: Context-Enhanced Masked Image Modeling for Remote Sensing Image Understanding [38.53988682814626]
We propose a context-enhanced masked image modeling method (CtxMIM) for remote sensing image understanding.
CtxMIM formulates original image patches as a reconstructive template and employs a Siamese framework to operate on two sets of image patches.
With this simple and elegant design, CtxMIM encourages the pre-training model to learn object-level or pixel-level features on a large-scale dataset.
arXiv Detail & Related papers (2023-09-28T18:04:43Z) - High-Quality Entity Segmentation [110.55724145851725]
CropFormer is designed to tackle the intractability of instance-level segmentation on high-resolution images.
It improves mask prediction by fusing high-resolution image crops, which provide finer-grained image details, with the full image.
With CropFormer, we achieve a significant AP gain of $1.9$ on the challenging entity segmentation task.
arXiv Detail & Related papers (2022-11-10T18:58:22Z) - Hierarchical and Progressive Image Matting [40.291998690687514]
We propose an end-to-end Hierarchical and Progressive Attention Matting Network (HAttMatting++).
It can better predict the opacity of the foreground from single RGB images without additional input.
We construct a large-scale and challenging image matting dataset comprised of 59,600 training images and 1,000 test images.
arXiv Detail & Related papers (2022-10-13T11:16:49Z) - Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z) - Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-10-30T10:57:13Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
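
The GFM entry above (Bridging Composite and Real) describes a shared encoder feeding two separate decoders, one taking a coarse "glance" at the foreground layout and one "focusing" on boundary detail. A minimal, hypothetical sketch of that shared-encoder, dual-decoder pattern is shown below (plain PyTorch; not the GFM authors' code, and the soft combination rule is an illustrative assumption).

```python
# Minimal sketch of a shared encoder with glance/focus decoders (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
        nn.ReLU(inplace=True),
    )

class SharedEncoderDualDecoder(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.encoder = nn.Sequential(conv_block(3, ch), conv_block(ch, ch * 2, stride=2))
        # Glance head: 3-way map (background / transition / foreground).
        self.glance = nn.Conv2d(ch * 2, 3, 3, padding=1)
        # Focus head: detail alpha, meaningful inside the transition region.
        self.focus = nn.Conv2d(ch * 2, 1, 3, padding=1)

    def forward(self, rgb):
        feat = self.encoder(rgb)
        size = rgb.shape[-2:]
        seg = F.interpolate(self.glance(feat), size=size, mode="bilinear",
                            align_corners=False).softmax(dim=1)
        detail = torch.sigmoid(F.interpolate(self.focus(feat), size=size,
                                             mode="bilinear", align_corners=False))
        fg, trans = seg[:, 2:3], seg[:, 1:2]
        # Combine: trust the glance prediction in confident regions and the
        # focus prediction inside the uncertain transition band.
        return fg * (1 - trans) + detail * trans

if __name__ == "__main__":
    net = SharedEncoderDualDecoder()
    print(net(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 1, 128, 128])
```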
This list is automatically generated from the titles and abstracts of the papers on this site.