Semi-parametric Makeup Transfer via Semantic-aware Correspondence
- URL: http://arxiv.org/abs/2203.02286v1
- Date: Fri, 4 Mar 2022 12:54:19 GMT
- Title: Semi-parametric Makeup Transfer via Semantic-aware Correspondence
- Authors: Mingrui Zhu, Yun Yi, Nannan Wang, Xiaoyu Wang, Xinbo Gao
- Abstract summary: The large discrepancy between the source non-makeup image and the reference makeup image is one of the key challenges in makeup transfer.
Non-parametric techniques have a high potential for addressing the pose, expression, and occlusion discrepancies.
We propose a Semi-parametric Makeup Transfer (SpMT) method, which combines the reciprocal strengths of non-parametric and parametric mechanisms.
- Score: 99.02329132102098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The large discrepancy between the source non-makeup image and the reference
makeup image is one of the key challenges in makeup transfer. Conventional
approaches for makeup transfer either learn disentangled representation or
perform pixel-wise correspondence in a parametric way between two images. We
argue that non-parametric techniques have a high potential for addressing the
pose, expression, and occlusion discrepancies. To this end, this paper proposes
a \textbf{S}emi-\textbf{p}arametric \textbf{M}akeup \textbf{T}ransfer (SpMT)
method, which combines the reciprocal strengths of non-parametric and
parametric mechanisms. The non-parametric component is a novel
\textbf{S}emantic-\textbf{a}ware \textbf{C}orrespondence (SaC) module that
explicitly reconstructs content representation with makeup representation under
the strong constraint of component semantics. The reconstructed representation
is desired to preserve the spatial and identity information of the source image
while "wearing" the makeup of the reference image. The output image is
synthesized via a parametric decoder that draws on the reconstructed
representation. Extensive experiments demonstrate the superiority of our method
in terms of visual quality, robustness, and flexibility. Code and pre-trained
model are available at \url{https://github.com/AnonymScholar/SpMT}.
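As a rough illustration of the semantic-aware correspondence idea described in the abstract (reconstructing source content features from reference makeup features under a hard component-semantics constraint), the core step can be sketched as follows. This is not the authors' implementation; the function name, the cosine-similarity formulation, and the temperature value are all assumptions for illustration:

```python
import numpy as np

def semantic_aware_correspondence(src_feat, ref_feat, src_seg, ref_seg, tau=0.07):
    """Reconstruct source content features from reference makeup features,
    attending only to reference positions that share the same semantic label
    (e.g. the same face-parsing component).
    src_feat: (N, C), ref_feat: (M, C); src_seg: (N,), ref_seg: (M,) labels."""
    # cosine-normalise features for a similarity-based soft correspondence
    s = src_feat / (np.linalg.norm(src_feat, axis=1, keepdims=True) + 1e-8)
    r = ref_feat / (np.linalg.norm(ref_feat, axis=1, keepdims=True) + 1e-8)
    sim = (s @ r.T) / tau                          # (N, M) scaled similarities
    # hard semantic constraint: forbid cross-component correspondence
    mask = src_seg[:, None] == ref_seg[None, :]    # (N, M) boolean
    sim = np.where(mask, sim, -1e9)
    # softmax over the admissible reference positions
    sim = sim - sim.max(axis=1, keepdims=True)
    w = np.exp(sim)
    w = w / w.sum(axis=1, keepdims=True)
    out = w @ ref_feat                             # (N, C) reconstruction
    # positions whose label is absent in the reference keep the source feature
    no_match = ~mask.any(axis=1)
    out[no_match] = src_feat[no_match]
    return out
```

The reconstructed features would then be decoded by the parametric decoder; the mask is what keeps, say, lip makeup from being transferred onto skin or eye regions.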
Related papers
- Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition [43.61569815081384]
We propose Symmetric Superimposition Modeling to simultaneously capture local character features and linguistic information in text images.
At the pixel level, we reconstruct the original and inverted images to capture character shapes and texture-level linguistic context.
At the feature level, we reconstruct the feature of the same original image and inverted image with different augmentations to model the semantic-level linguistic context and the local character discrimination.
arXiv Detail & Related papers (2024-05-09T15:23:38Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
arXiv Detail & Related papers (2024-02-28T06:07:07Z)
- MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask [84.84034179136458]
A crucial factor leading to the text-image mismatch issue is the inadequate cross-modality relation learning.
We propose an adaptive mask, which is conditioned on the attention maps and the prompt embeddings, to dynamically adjust the contribution of each text token to the image features.
Our method, termed MaskDiffusion, is training-free and hot-pluggable for popular pre-trained diffusion models.
arXiv Detail & Related papers (2023-09-08T15:53:37Z)
- Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models [62.603753097900466]
We present a novel energy-based model (EBM) framework for adaptive context control by modeling the posterior of context vectors.
Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder.
Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts.
arXiv Detail & Related papers (2023-06-16T14:30:41Z)
- Towards Better Text-Image Consistency in Text-to-Image Generation [15.735515302139335]
We develop a novel CLIP-based metric termed Semantic Similarity Distance (SSD).
We further design the Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which can fuse semantic information at different granularities.
Our PDF-GAN can lead to significantly better text-image consistency while maintaining decent image quality on the CUB and COCO datasets.
arXiv Detail & Related papers (2022-10-27T07:47:47Z)
- Memory-Driven Text-to-Image Generation [126.58244124144827]
We introduce a memory-driven semi-parametric approach to text-to-image generation.
The non-parametric component is a memory bank of image features constructed from a training set of images.
The parametric component is a generative adversarial network.
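The memory-bank retrieval step of such a semi-parametric pipeline can be sketched roughly as below. This is illustrative only, not the paper's code: the function name and the cosine-similarity retrieval criterion are assumptions, and in the actual system the retrieved features would condition the parametric GAN generator:

```python
import numpy as np

def retrieve_from_memory(query, memory_bank, k=1):
    """Non-parametric lookup: return the indices of the k memory-bank
    entries most similar to the query embedding (cosine similarity).
    query: (C,) embedding; memory_bank: (M, C) stored image features."""
    q = query / (np.linalg.norm(query) + 1e-8)
    m = memory_bank / (np.linalg.norm(memory_bank, axis=1, keepdims=True) + 1e-8)
    sims = m @ q                       # (M,) cosine similarities
    return np.argsort(-sims)[:k]       # indices of the top-k entries
```

The appeal of the semi-parametric split is that the memory bank supplies concrete visual detail while the generator only has to learn how to compose and refine it.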
arXiv Detail & Related papers (2022-08-15T06:32:57Z)
- BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval [61.803481264081036]
Content-Based Image Retrieval (CIR) aims to search for a target image by concurrently comprehending the composition of an example image and a complementary text.
We tackle this task with a novel Bottom-up crOss-modal Semantic compoSition (BOSS) framework with hybrid counterfactual training.
arXiv Detail & Related papers (2022-07-09T07:14:44Z)
- Paired Image-to-Image Translation Quality Assessment Using Multi-Method Fusion [0.0]
This paper proposes a novel approach that combines signals of image quality between paired source and transformation to predict the latter's similarity with a hypothetical ground truth.
We trained a Multi-Method Fusion (MMF) model via an ensemble of gradient-boosted regressors to predict Deep Image Structure and Texture Similarity (DISTS)
Analysis revealed the task to be feature-constrained, introducing a trade-off at inference between metric time and prediction accuracy.
arXiv Detail & Related papers (2022-05-09T11:05:15Z)
- SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal [17.512402192317992]
We propose a unified Symmetric Semantic-Aware Transformer (SSAT) network to realize makeup transfer and removal simultaneously.
A novel SSCFT module and a weakly supervised semantic loss are proposed to model and facilitate the establishment of accurate semantic correspondence.
Experiments show that our method obtains more visually accurate makeup transfer results.
arXiv Detail & Related papers (2021-12-07T11:08:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.