Semi-parametric Makeup Transfer via Semantic-aware Correspondence
- URL: http://arxiv.org/abs/2203.02286v1
- Date: Fri, 4 Mar 2022 12:54:19 GMT
- Title: Semi-parametric Makeup Transfer via Semantic-aware Correspondence
- Authors: Mingrui Zhu, Yun Yi, Nannan Wang, Xiaoyu Wang, Xinbo Gao
- Abstract summary: The large discrepancy between the source non-makeup image and the reference makeup image is one of the key challenges in makeup transfer.
Non-parametric techniques have a high potential for addressing the pose, expression, and occlusion discrepancies.
We propose a Semi-parametric Makeup Transfer (SpMT) method, which combines the reciprocal strengths of non-parametric and parametric mechanisms.
- Score: 99.02329132102098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   The large discrepancy between the source non-makeup image and the reference
makeup image is one of the key challenges in makeup transfer. Conventional
approaches for makeup transfer either learn disentangled representation or
perform pixel-wise correspondence in a parametric way between two images. We
argue that non-parametric techniques have a high potential for addressing the
pose, expression, and occlusion discrepancies. To this end, this paper proposes
a Semi-parametric Makeup Transfer (SpMT)
method, which combines the reciprocal strengths of non-parametric and
parametric mechanisms. The non-parametric component is a novel
Semantic-aware Correspondence (SaC) module that
explicitly reconstructs content representation with makeup representation under
the strong constraint of component semantics. The reconstructed representation
is intended to preserve the spatial and identity information of the source image
while "wearing" the makeup of the reference image. The output image is
synthesized via a parametric decoder that draws on the reconstructed
representation. Extensive experiments demonstrate the superiority of our method
in terms of visual quality, robustness, and flexibility. Code and pre-trained
model are available at https://github.com/AnonymScholar/SpMT.
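
The correspondence step described in the abstract can be illustrated with a generic, label-masked soft attention. This is only a minimal sketch under stated assumptions, not the authors' SaC module: the function name, the three feature lists, the use of cosine similarity, and the `temperature` parameter are all hypothetical choices made for illustration. Each source position is rebuilt as a weighted average of reference makeup features, and only reference positions with the same face-component label are allowed to contribute.

```python
import math


def semantic_aware_correspondence(src_feats, ref_feats, ref_makeup,
                                  src_labels, ref_labels, temperature=0.1):
    """Rebuild each source position from reference makeup features.

    Hypothetical illustration: src_feats[i] and ref_feats[j] are feature
    vectors used for matching, ref_makeup[j] is the makeup feature at
    reference position j, and the label lists hold component names such
    as "lip" or "skin".
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm > 0 else 0.0

    output = []
    for feat, label in zip(src_feats, src_labels):
        # Semantic constraint: only same-component reference positions match.
        idx = [j for j, ref_label in enumerate(ref_labels) if ref_label == label]
        if not idx:
            # No counterpart in the reference: keep the source feature.
            output.append(list(feat))
            continue
        scores = [cosine(feat, ref_feats[j]) / temperature for j in idx]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        dim = len(ref_makeup[idx[0]])
        output.append([
            sum(w * ref_makeup[j][d] for w, j in zip(weights, idx)) / total
            for d in range(dim)
        ])
    return output
```

In this sketch the matching is non-parametric (no learned weights), which mirrors the role the abstract assigns to the correspondence module; the parametric decoder that turns the reconstructed representation into an image is not shown.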
 
      
Related papers
- Image-to-Image Translation with Diffusion Transformers and CLIP-Based Image Conditioning [2.9603070411207644]
 Diffusion Transformers (DiT) is a diffusion-based framework for image-to-image translation.
DiT combines the denoising capabilities of diffusion models with the global modeling power of transformers.
We validate our approach on two benchmark datasets: face2comics, which translates real human faces to comic-style illustrations, and edges2shoes, which translates edge maps to realistic shoe images.
 arXiv  Detail & Related papers  (2025-05-21T20:37:33Z)
- ShapeShift: Towards Text-to-Shape Arrangement Synthesis with Content-Aware Geometric Constraints [13.2441524021269]
 ShapeShift is a text-guided image-to-image translation task that requires rearranging the input set of rigid shapes into non-overlapping configurations.
We introduce a content-aware collision resolution mechanism that applies minimal semantically coherent adjustments when overlaps occur.
Our approach yields interpretable compositions where spatial relationships clearly embody the textual prompt.
 arXiv  Detail & Related papers  (2025-03-18T20:48:58Z)
- SQ-GAN: Semantic Image Communications Using Masked Vector Quantization [55.02795214161371]
 This work introduces Semantically Masked VQ-GAN (SQ-GAN), a novel approach to optimize image compression for semantic/task-oriented communications.
SQ-GAN employs off-the-shelf semantic segmentation and a new semantic-conditioned adaptive mask module (SAMM) to selectively encode semantically significant features of the images.
 arXiv  Detail & Related papers  (2025-02-13T17:35:57Z)
- Unleashing Text-to-Image Diffusion Prior for Zero-Shot Image Captioning [70.98890307376548]
 We propose a novel Patch-wise Cross-modal feature Mix-up (PCM) mechanism to adaptively mitigate the unfaithful contents during training.
Our PCM-Net ranks first in both in-domain and cross-domain zero-shot image captioning.
 arXiv  Detail & Related papers  (2024-12-31T13:39:08Z)
- Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition [43.61569815081384]
 We propose Symmetric Superimposition Modeling to simultaneously capture local character features and linguistic information in text images.
At the pixel level, we reconstruct the original and inverted images to capture character shapes and texture-level linguistic context.
At the feature level, we reconstruct the feature of the same original image and inverted image with different augmentations to model the semantic-level linguistic context and the local character discrimination.
 arXiv  Detail & Related papers  (2024-05-09T15:23:38Z)
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis [65.7968515029306]
 We propose a novel Coarse-to-Fine Latent Diffusion (CFLD) method for Pose-Guided Person Image Synthesis (PGPIS).
A perception-refined decoder is designed to progressively refine a set of learnable queries and extract semantic understanding of person images as a coarse-grained prompt.
 arXiv  Detail & Related papers  (2024-02-28T06:07:07Z)
- MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask [84.84034179136458]
 A crucial factor leading to the text-image mismatch issue is the inadequate cross-modality relation learning.
We propose an adaptive mask, which is conditioned on the attention maps and the prompt embeddings, to dynamically adjust the contribution of each text token to the image features.
Our method, termed MaskDiffusion, is training-free and hot-pluggable for popular pre-trained diffusion models.
 arXiv  Detail & Related papers  (2023-09-08T15:53:37Z)
- Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models [62.603753097900466]
 We present a novel energy-based model (EBM) framework for adaptive context control by modeling the posterior of context vectors.
Specifically, we first formulate EBMs of latent image representations and text embeddings in each cross-attention layer of the denoising autoencoder.
Our latent EBMs further allow zero-shot compositional generation as a linear combination of cross-attention outputs from different contexts.
 arXiv  Detail & Related papers  (2023-06-16T14:30:41Z)
- Towards Better Text-Image Consistency in Text-to-Image Generation [15.735515302139335]
 We develop a novel CLIP-based metric termed Semantic Similarity Distance (SSD).
We further design the Parallel Deep Fusion Generative Adversarial Networks (PDF-GAN), which can fuse semantic information at different granularities.
Our PDF-GAN can lead to significantly better text-image consistency while maintaining decent image quality on the CUB and COCO datasets.
 arXiv  Detail & Related papers  (2022-10-27T07:47:47Z)
- Memory-Driven Text-to-Image Generation [126.58244124144827]
 We introduce a memory-driven semi-parametric approach to text-to-image generation.
The non-parametric component is a memory bank of image features constructed from a training set of images.
The parametric component is a generative adversarial network.
 arXiv  Detail & Related papers  (2022-08-15T06:32:57Z)
- BOSS: Bottom-up Cross-modal Semantic Composition with Hybrid Counterfactual Training for Robust Content-based Image Retrieval [61.803481264081036]
 Content-Based Image Retrieval (CIR) aims to search for a target image by concurrently comprehending the composition of an example image and a complementary text.
We tackle this task with a novel Bottom-up crOss-modal Semantic compoSition (BOSS) with Hybrid Counterfactual Training framework.
 arXiv  Detail & Related papers  (2022-07-09T07:14:44Z)
- Paired Image-to-Image Translation Quality Assessment Using Multi-Method Fusion [0.0]
 This paper proposes a novel approach that combines signals of image quality between paired source and transformation to predict the latter's similarity with a hypothetical ground truth.
We trained a Multi-Method Fusion (MMF) model via an ensemble of gradient-boosted regressors to predict Deep Image Structure and Texture Similarity (DISTS).
Analysis revealed the task to be feature-constrained, introducing a trade-off at inference between metric time and prediction accuracy.
 arXiv  Detail & Related papers  (2022-05-09T11:05:15Z)
- SSAT: A Symmetric Semantic-Aware Transformer Network for Makeup Transfer and Removal [17.512402192317992]
 We propose a unified Symmetric Semantic-Aware Transformer (SSAT) network to realize makeup transfer and removal simultaneously.
A novel SSCFT module and a weakly supervised semantic loss are proposed to model and facilitate the establishment of accurate semantic correspondence.
 Experiments show that our method obtains more visually accurate makeup transfer results.
 arXiv  Detail & Related papers  (2021-12-07T11:08:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.