StyO: Stylize Your Face in Only One-Shot
- URL: http://arxiv.org/abs/2303.03231v2
- Date: Tue, 7 Mar 2023 04:01:11 GMT
- Title: StyO: Stylize Your Face in Only One-Shot
- Authors: Bonan Li, Zicheng Zhang, Xuecheng Nie, Congying Han, Yinhan Hu, Tiande Guo
- Abstract summary: This paper focuses on face stylization with a single artistic target.
Existing works for this task often fail to retain the source content while achieving geometry variation.
We present a novel StyO model, i.e., Stylize the face in only One-shot, to solve the above problem.
- Score: 8.253458555695767
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper focuses on face stylization with a single artistic target.
Existing works for this task often fail to retain the source content while
achieving geometry variation. Here, we present a novel StyO model, i.e., Stylize
the face in only One-shot, to solve the above problem. In particular, StyO
exploits a disentanglement and recombination strategy. It first disentangles
the content and style of source and target images into identifiers, which are
then recombined cross-wise to derive the stylized face image. In this
way, StyO decomposes complex images into independent and specific attributes,
and simplifies one-shot face stylization as the combination of different
attributes from the input images, thus producing results that better match the
face geometry of the target image and the content of the source. StyO is implemented with
latent diffusion models (LDM) and is composed of two key modules: 1) Identifier
Disentanglement Learner (IDL) for the disentanglement phase. It represents
identifiers as contrastive text prompts, i.e., positive and negative
descriptions, and introduces a novel triple reconstruction loss to fine-tune
the pre-trained LDM for encoding style and content into corresponding
identifiers; 2) Fine-grained Content Controller (FCC) for the recombination
phase. It recombines disentangled identifiers from IDL to form an augmented
text prompt for generating stylized faces. In addition, FCC constrains the
cross-attention maps of latent and text features to preserve source face
details in the results. Extensive evaluation shows that StyO produces
high-quality images on numerous paintings of various styles and outperforms the
current state-of-the-art. Code will be released upon acceptance.
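To make the disentangle-and-recombine strategy concrete, here is a minimal Python sketch of how contrastive identifier prompts might be built and then crossed. The identifier tokens and the prompt template are illustrative assumptions, not the authors' actual prompts (their code is unreleased at the time of writing).

```python
# Hypothetical sketch of StyO's disentanglement/recombination strategy.
# The identifier tokens and prompt template are assumptions for illustration.

SRC_CONTENT, SRC_STYLE = "<src-content>", "<src-style>"  # learned for the source photo
TGT_CONTENT, TGT_STYLE = "<tgt-content>", "<tgt-style>"  # learned for the artistic target

def build_prompts(content_id: str, style_id: str,
                  neg_content_id: str, neg_style_id: str) -> tuple[str, str]:
    """Contrastive text prompts: the positive prompt describes what the
    image is, the negative prompt describes what it is not."""
    positive = f"a portrait of {content_id} in the style of {style_id}"
    negative = f"a portrait of {neg_content_id} in the style of {neg_style_id}"
    return positive, negative

# Disentanglement phase (IDL): each training image is tied to its own
# identifiers and contrasted against the other image's identifiers.
src_pos, src_neg = build_prompts(SRC_CONTENT, SRC_STYLE, TGT_CONTENT, TGT_STYLE)
tgt_pos, tgt_neg = build_prompts(TGT_CONTENT, TGT_STYLE, SRC_CONTENT, SRC_STYLE)

# Recombination phase: cross the identifiers so the prompt asks for the
# source identity rendered in the target style.
stylized_pos, stylized_neg = build_prompts(SRC_CONTENT, TGT_STYLE,
                                           TGT_CONTENT, SRC_STYLE)
print(stylized_pos)  # a portrait of <src-content> in the style of <tgt-style>
```

The point of the cross recombination is that, once the fine-tuned LDM associates each identifier with a single attribute, a crossed prompt selects source content and target style independently.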
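The FCC's cross-attention constraint can be sketched in a similar spirit. The toy PyTorch function below assumes cross-attention maps recorded during a source-image pass are re-injected at the content-identifier token positions during stylized generation; it is a hedged sketch of attention-map control, not the paper's implementation.

```python
import torch

def fcc_cross_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                        src_attn: torch.Tensor,
                        content_token_ids: list[int]) -> torch.Tensor:
    """Toy FCC-style attention control (an assumption, not the paper's code):
    attention columns for content-identifier tokens are overwritten with maps
    recorded from the source-image pass, so the spatial layout of the source
    face carries over into the stylized result.

    q: (B, heads, HW, d) latent queries; k, v: (B, heads, T, d) text keys/values;
    src_attn: (B, heads, HW, T) maps saved from the source pass.
    """
    scale = q.shape[-1] ** -0.5
    attn = ((q @ k.transpose(-2, -1)) * scale).softmax(dim=-1)  # (B, heads, HW, T)
    # Re-inject the source-pass maps at the content-identifier positions.
    attn[..., content_token_ids] = src_attn[..., content_token_ids]
    return attn @ v  # (B, heads, HW, d)

# Toy usage with random tensors (shapes only; not real model activations).
B, H, HW, T, d = 1, 8, 64, 77, 40
q, k, v = torch.randn(B, H, HW, d), torch.randn(B, H, T, d), torch.randn(B, H, T, d)
src_attn = torch.randn(B, H, HW, T).softmax(dim=-1)
out = fcc_cross_attention(q, k, v, src_attn, content_token_ids=[4, 5])
```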
Related papers
- OSDFace: One-Step Diffusion Model for Face Restoration [72.5045389847792]
Diffusion models have demonstrated impressive performance in face restoration.
We propose OSDFace, a novel one-step diffusion model for face restoration.
Results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics.
arXiv Detail & Related papers (2024-11-26T07:07:48Z) - Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [7.099258248662009]
Text-to-image (T2I) models have significantly advanced the development of artificial intelligence.
However, existing T2I-based methods often struggle to accurately reproduce the appearance of individuals from a reference image.
We leverage the pre-trained UNet from Stable Diffusion to incorporate the target face image directly into the generation process.
arXiv Detail & Related papers (2024-09-27T19:31:04Z) - Arc2Face: A Foundation Model for ID-Consistent Human Faces [95.00331107591859]
Arc2Face is an identity-conditioned face foundation model.
It can generate diverse photo-realistic images with a higher degree of face similarity than existing models.
arXiv Detail & Related papers (2024-03-18T10:32:51Z) - DEADiff: An Efficient Stylization Diffusion Model with Disentangled
Representations [64.43387739794531]
Current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles.
We introduce DEADiff to address this issue using the following two strategies.
DEADiff attains the best visual stylization results and the optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image.
arXiv Detail & Related papers (2024-03-11T17:35:23Z) - Face Swap via Diffusion Model [4.026688121914668]
This report presents a diffusion model based framework for face swapping between two portrait images.
The basic framework consists of three components for face feature encoding, multi-conditional generation, and face inpainting, respectively.
arXiv Detail & Related papers (2024-03-02T07:02:17Z) - High-Fidelity Face Swapping with Style Blending [16.024260677867076]
We propose an innovative end-to-end framework for high-fidelity face swapping.
First, we introduce a StyleGAN-based facial attributes encoder that extracts essential features from faces and inverts them into a latent style code.
Second, we introduce an attention-based style blending module to effectively transfer Face IDs from source to target.
arXiv Detail & Related papers (2023-12-17T23:22:37Z) - Portrait Diffusion: Training-free Face Stylization with
Chain-of-Painting [64.43760427752532]
Face stylization refers to the transformation of a face into a specific portrait style.
Current methods require the use of example-based adaptation approaches to fine-tune pre-trained generative models.
This paper proposes a training-free face stylization framework, named Portrait Diffusion.
arXiv Detail & Related papers (2023-12-03T06:48:35Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for
Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - T-Person-GAN: Text-to-Person Image Generation with Identity-Consistency
and Manifold Mix-Up [16.165889084870116]
We present an end-to-end approach to generate high-resolution person images conditioned on texts only.
We develop an effective generative model to produce person images with two novel mechanisms.
arXiv Detail & Related papers (2022-08-18T07:41:02Z) - Learned Spatial Representations for Few-shot Talking-Head Synthesis [68.3787368024951]
We propose a novel approach for few-shot talking-head synthesis.
We show that this disentangled representation leads to a significant improvement over previous methods.
arXiv Detail & Related papers (2021-04-29T17:59:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.