EFormer: Enhanced Transformer towards Semantic-Contour Features of
Foreground for Portraits Matting
- URL: http://arxiv.org/abs/2308.12831v2
- Date: Thu, 30 Nov 2023 08:59:34 GMT
- Title: EFormer: Enhanced Transformer towards Semantic-Contour Features of
Foreground for Portraits Matting
- Authors: Zitao Wang and Qiguang Miao and Peipei Zhao and Yue Xi
- Abstract summary: We propose EFormer to enhance the model's attention towards both low-frequency semantic and high-frequency contour features.
We build a semantic and contour detector (SCD) to accurately capture both kinds of features,
and design a contour-edge extraction branch and a semantic extraction branch to extract refined high-frequency contour features and complete low-frequency semantic information.
- Score: 6.468859319728341
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The portrait matting task aims to extract an alpha matte with complete
semantics and finely detailed contours. Compared with CNN-based approaches,
transformers with self-attention modules have a better capacity to capture
long-range dependencies and the low-frequency semantic information of a portrait.
However, recent research shows that the self-attention mechanism struggles to
model high-frequency contour information and to capture fine contour details,
which can bias the prediction of the portrait's contours. To address this
issue, we propose EFormer to enhance the model's attention towards both the
low-frequency semantic and the high-frequency contour features. For the
high-frequency contours, our research demonstrates that a cross-attention module
between different resolutions can guide the model to allocate attention
appropriately to these contour regions. Building on this, we can successfully
extract the high-frequency detail information around the portrait's contours,
which is previously ignored by self-attention. Based on the cross-attention
module, we further build a semantic and contour detector (SCD) to accurately
capture both the low-frequency semantic and the high-frequency contour features.
We also design a contour-edge extraction branch and a semantic extraction branch
to extract refined high-frequency contour features and complete low-frequency
semantic information, respectively. Finally, we fuse the two kinds of features
and leverage a segmentation head to generate the predicted portrait matte.
Experiments on the VideoMatte240K (JPEG SD format) and Adobe Image Matting (AIM)
datasets demonstrate that EFormer outperforms previous portrait matting methods.
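The core idea in the abstract, cross-attention between feature maps of different resolutions, can be sketched in a few lines. This is a minimal, dependency-light illustration of the general mechanism, not the authors' implementation: the function name, shapes, and the use of identity (unlearned) Q/K/V projections are all assumptions made for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_resolution_attention(low_res_feats, high_res_feats):
    """Cross-attention between two resolutions (illustrative sketch).

    Queries come from the low-resolution (semantic) features; keys and
    values come from the high-resolution (contour) features, so the
    attention map can redistribute focus toward fine contour regions.
    Shapes: low_res_feats (N_low, C), high_res_feats (N_high, C),
    where N is the number of flattened spatial positions (H * W).
    """
    d_k = low_res_feats.shape[-1]
    # In a real model Q, K, V would be learned linear projections;
    # identity projections keep this sketch self-contained.
    q, k, v = low_res_feats, high_res_feats, high_res_feats
    attn = softmax(q @ k.T / np.sqrt(d_k))  # (N_low, N_high)
    return attn @ v                         # (N_low, C)

# toy example: 4 low-res positions attending over 16 high-res positions
rng = np.random.default_rng(0)
low = rng.standard_normal((4, 8))
high = rng.standard_normal((16, 8))
out = cross_resolution_attention(low, high)
print(out.shape)  # (4, 8)
```

In the paper's terms, the output would then feed the contour-edge and semantic extraction branches before fusion and the segmentation head.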
Related papers
- High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity [69.32473738284374]
We propose DiffDIS, a diffusion-driven segmentation model that taps into the potential of the pre-trained U-Net within diffusion models.
By leveraging the robust generalization capabilities and the rich, versatile image representation prior of the SD models, we significantly reduce inference time while preserving high-fidelity, detailed generation.
Experiments on the DIS5K dataset demonstrate the superiority of DiffDIS, achieving state-of-the-art results through a streamlined inference process.
arXiv Detail & Related papers (2024-10-14T02:49:23Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis [18.64688172651478]
We present DiffPortrait3D, a conditional diffusion model capable of synthesizing 3D-consistent photo-realistic novel views.
Given a single RGB input, we aim to synthesize plausible but consistent facial details rendered from novel camera views.
We demonstrate state-of-the-art results both qualitatively and quantitatively on our challenging in-the-wild and multi-view benchmarks.
arXiv Detail & Related papers (2023-12-20T13:31:11Z) - DiffusionMat: Alpha Matting as Sequential Refinement Learning [87.76572845943929]
DiffusionMat is an image matting framework that employs a diffusion model for the transition from coarse to refined alpha mattes.
A correction module adjusts the output at each denoising step, ensuring that the final result is consistent with the input image's structures.
We evaluate our model across several image matting benchmarks, and the results indicate that DiffusionMat consistently outperforms existing methods.
arXiv Detail & Related papers (2023-11-22T17:16:44Z) - Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z) - Multitask AET with Orthogonal Tangent Regularity for Dark Object
Detection [84.52197307286681]
We propose a novel multitask auto-encoding transformation (MAET) model to enhance object detection in dark environments.
In a self-supervision manner, the MAET learns the intrinsic visual structure by encoding and decoding the realistic illumination-degrading transformation.
We achieve state-of-the-art performance on synthetic and real-world datasets.
arXiv Detail & Related papers (2022-05-06T16:27:14Z) - Towards Enhancing Fine-grained Details for Image Matting [40.17208660790402]
We argue that recovering microscopic details relies on low-level but high-definition texture features.
Our model consists of a conventional encoder-decoder Semantic Path and an independent down-sampling-free Textural Compensate Path.
Our method outperforms previous state-of-the-art methods on the Composition-1k dataset.
arXiv Detail & Related papers (2021-01-22T13:20:23Z) - Portrait Neural Radiance Fields from a Single Image [68.66958204066721]
We present a method for estimating Neural Radiance Fields (NeRF) from a single portrait.
We propose to pretrain the weights of a multilayer perceptron (MLP), which implicitly models the volumetric density.
To improve the generalization to unseen faces, we train the canonical coordinate space approximated by 3D face morphable models.
We quantitatively evaluate the method using controlled captures and demonstrate generalization to real portrait images, showing favorable results against state-of-the-art methods.
arXiv Detail & Related papers (2020-12-10T18:59:59Z) - AlphaNet: An Attention Guided Deep Network for Automatic Image Matting [0.0]
We propose an end-to-end solution for image matting, i.e., high-precision extraction of foreground objects from natural images.
We propose a method that assimilates semantic segmentation and deep image matting processes into a single network to generate semantic mattes.
We also construct a fashion e-commerce focused dataset with high-quality alpha mattes to facilitate the training and evaluation for image matting.
arXiv Detail & Related papers (2020-03-07T17:25:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.