Wider and Higher: Intensive Integration and Global Foreground Perception
for Image Matting
- URL: http://arxiv.org/abs/2210.06919v1
- Date: Thu, 13 Oct 2022 11:34:46 GMT
- Title: Wider and Higher: Intensive Integration and Global Foreground Perception
for Image Matting
- Authors: Yu Qiao, Ziqi Wei, Yuhao Liu, Yuxin Wang, Dongsheng Zhou, Qiang Zhang,
Xin Yang
- Abstract summary: This paper reviews recent deep-learning-based matting research and presents our wider and higher motivation for image matting.
Image matting is essentially a pixel-wise regression, and the ideal situation is to perceive the maximum opacity correspondence from the input image.
We propose an Intensive Integration and Global Foreground Perception network (I2GFP) to integrate wider and higher feature streams.
- Score: 44.51635913732913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper reviews recent deep-learning-based matting research and presents
our wider and higher motivation for image matting. Many approaches obtain
alpha mattes with complex encoders that extract robust semantics, then resort to
a U-Net-like decoder to concatenate or fuse encoder features. However, image
matting is essentially a pixel-wise regression, and the ideal situation is to
perceive the maximum opacity correspondence from the input image. In this
paper, we argue that high-resolution feature representation, perception, and
communication are more crucial for matting accuracy. Therefore, we propose an
Intensive Integration and Global Foreground Perception network (I2GFP) to
integrate wider and higher feature streams. Wider means we combine intensive
features in each decoder stage, while higher means we retain high-resolution
intermediate features and perceive large-scale foreground appearance. Our
design sacrifices model depth for a significant performance improvement. We
perform extensive experiments to validate the proposed I2GFP model, and it
achieves state-of-the-art results on different public datasets.
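The "pixel-wise regression" framing follows from the standard matting compositing model: every pixel of the observed image is an opacity-weighted blend of an unknown foreground and background,

```latex
I_p = \alpha_p F_p + (1 - \alpha_p) B_p, \qquad \alpha_p \in [0, 1],
```

so the network must regress one alpha value per pixel. The sketch below illustrates one reading of the abstract's "wider and higher" idea: a decoder stage that resizes several encoder streams to a common high resolution and fuses them ("wider"), keeps intermediate features at that high resolution rather than decoding from heavily downsampled maps ("higher"), and ends in a per-pixel regression head. Module names, channel widths, and the concat-based fusion scheme are illustrative assumptions, not the authors' I2GFP implementation.

```python
# Minimal PyTorch sketch of a "wider and higher" decoder stage for matting.
# Everything here (class names, channel sizes, concat-based fusion) is an
# illustrative assumption based on the abstract, not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class WiderFusionStage(nn.Module):
    """'Wider': fuse several encoder streams in one decoder stage by
    resizing them to a shared high resolution and concatenating."""

    def __init__(self, in_channels_list, out_channels):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(sum(in_channels_list), out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, features, target_hw):
        # 'Higher': operate at high resolution instead of decoding from
        # heavily downsampled feature maps.
        resized = [
            F.interpolate(f, size=target_hw, mode="bilinear", align_corners=False)
            for f in features
        ]
        return self.fuse(torch.cat(resized, dim=1))


class MattingHead(nn.Module):
    """Pixel-wise regression head: one alpha value in [0, 1] per pixel."""

    def __init__(self, in_channels):
        super().__init__()
        self.pred = nn.Conv2d(in_channels, 1, 3, padding=1)

    def forward(self, x):
        return torch.sigmoid(self.pred(x))


if __name__ == "__main__":
    # Three hypothetical encoder streams at decreasing resolutions.
    f1 = torch.randn(1, 64, 128, 128)
    f2 = torch.randn(1, 128, 64, 64)
    f3 = torch.randn(1, 256, 32, 32)
    stage = WiderFusionStage([64, 128, 256], out_channels=64)
    alpha = MattingHead(64)(stage([f1, f2, f3], target_hw=(128, 128)))
    print(alpha.shape)  # torch.Size([1, 1, 128, 128])
```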
Related papers
- High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.
By leveraging the robust generalization capabilities and the rich, versatile image representation priors of SD models, we significantly reduce inference time while preserving high-fidelity, detailed generation.
Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z)
- A Semantic-Aware and Multi-Guided Network for Infrared-Visible Image Fusion [41.34335755315773]
Multi-modality image fusion aims at fusing specific-modality and shared-modality information from two source images.
We propose a three-branch encoder-decoder architecture along with corresponding fusion layers as the fusion strategy.
Our method has obtained competitive results compared with state-of-the-art methods in visible/infrared image fusion and medical image fusion tasks.
arXiv Detail & Related papers (2024-06-11T09:32:40Z)
- MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training [57.18758272617101]
MaeFuse is a novel autoencoder model designed for infrared and visible image fusion (IVIF).
Our model utilizes a pretrained encoder from Masked Autoencoders (MAE), which facilitates omni feature extraction for low-level reconstruction and high-level vision tasks.
MaeFuse not only introduces a novel perspective in the realm of fusion techniques but also stands out with impressive performance across various public datasets.
arXiv Detail & Related papers (2024-04-17T02:47:39Z)
- Multi-view Aggregation Network for Dichotomous Image Segmentation [76.75904424539543]
Dichotomous Image Segmentation (DIS) has recently emerged, targeting high-precision object segmentation from high-resolution natural images.
Existing methods rely on tedious multiple encoder-decoder streams and stages to gradually complete the global localization and local refinement.
Inspired by this, we model DIS as a multi-view object perception problem and propose a parsimonious multi-view aggregation network (MVANet).
Experiments on the popular DIS-5K dataset show that our MVANet significantly outperforms state-of-the-art methods in both accuracy and speed.
arXiv Detail & Related papers (2024-04-11T03:00:00Z)
- High Fidelity Image Synthesis With Deep VAEs In Latent Space [0.0]
We present fast, realistic image generation on high-resolution, multimodal datasets using hierarchical variational autoencoders (VAEs).
In this two-stage setup, the autoencoder compresses the image into its semantic features, which are then modeled with a deep VAE.
We demonstrate the effectiveness of our two-stage approach, achieving an FID of 9.34 on the ImageNet-256 dataset, which is comparable to BigGAN.
arXiv Detail & Related papers (2023-03-23T23:45:19Z)
- Learning Enriched Features for Fast Image Restoration and Enhancement [166.17296369600774]
This paper pursues the holistic goal of maintaining spatially precise, high-resolution representations through the entire network.
We learn an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
Our approach achieves state-of-the-art results for a variety of image processing tasks, including defocus deblurring, image denoising, super-resolution, and image enhancement.
arXiv Detail & Related papers (2022-04-19T17:59:45Z)
- Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth [24.897377434844266]
We propose a novel structure and training strategy for monocular depth estimation.
We deploy a hierarchical transformer encoder to capture and convey the global context, and design a lightweight yet powerful decoder.
Our network achieves state-of-the-art performance on the challenging NYU Depth V2 dataset.
arXiv Detail & Related papers (2022-01-19T06:37:21Z)
- Contrastive Attention Network with Dense Field Estimation for Face Completion [11.631559190975034]
We propose a self-supervised Siamese inference network to improve the generalization and robustness of encoders.
To deal with geometric variations of face images, a dense correspondence field is integrated into the network.
This multi-scale architecture helps the decoder utilize the discriminative representations learned by the encoders.
arXiv Detail & Related papers (2021-12-20T02:54:38Z)
- HR-Depth: High Resolution Self-Supervised Monocular Depth Estimation [14.81943833870932]
We present an improved DepthNet, HR-Depth, with two effective strategies.
Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the fewest parameters at both high and low resolutions.
arXiv Detail & Related papers (2020-12-14T09:15:15Z)
- Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for the image restoration task.
We present a novel architecture with the collective goals of maintaining spatially-precise high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
arXiv Detail & Related papers (2020-03-15T11:04:30Z)