Deformable One-shot Face Stylization via DINO Semantic Guidance
- URL: http://arxiv.org/abs/2403.00459v2
- Date: Mon, 4 Mar 2024 10:22:38 GMT
- Title: Deformable One-shot Face Stylization via DINO Semantic Guidance
- Authors: Yang Zhou and Zichong Chen and Hui Huang
- Abstract summary: This paper addresses the issue of one-shot face stylization, focusing on the simultaneous consideration of appearance and structure.
We explore deformation-aware face stylization that diverges from traditional single-image style reference, opting for a real-style image pair instead.
- Score: 12.771707124161665
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper addresses the complex issue of one-shot face stylization, focusing
on the simultaneous consideration of appearance and structure, where previous
methods have fallen short. We explore deformation-aware face stylization that
diverges from traditional single-image style reference, opting for a real-style
image pair instead. The cornerstone of our method is the utilization of a
self-supervised vision transformer, specifically DINO-ViT, to establish a
robust and consistent facial structure representation across both real and
style domains. Our stylization process begins by adapting the StyleGAN
generator to be deformation-aware through the integration of spatial
transformers (STN). We then introduce two innovative constraints for generator
fine-tuning under the guidance of DINO semantics: i) a directional deformation
loss that regulates directional vectors in DINO space, and ii) a relative
structural consistency constraint based on DINO token self-similarities,
ensuring diverse generation. Additionally, style-mixing is employed to align
the color generation with the reference, minimizing inconsistent
correspondences. This framework delivers enhanced deformability for general
one-shot face stylization, achieving notable efficiency with a fine-tuning
duration of approximately 10 minutes. Extensive qualitative and quantitative
comparisons demonstrate our superiority over state-of-the-art one-shot face
stylization methods. Code is available at https://github.com/zichongc/DoesFS
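To make the two DINO-guided constraints more concrete, the PyTorch sketch below
shows one plausible way to phrase them. The `dino` feature-extractor interface,
function names, and exact distance measures are illustrative assumptions rather
than the authors' released implementation; see the linked repository for the
actual code.

```python
import torch
import torch.nn.functional as F

# Assumption: `dino(images)` is a frozen DINO-ViT wrapper returning patch-token
# features of shape (B, N, C) for a batch of images.

def token_self_similarity(tokens):
    """Cosine self-similarity between all patch tokens, shape (B, N, N)."""
    tokens = F.normalize(tokens, dim=-1)      # unit-normalize each token
    return tokens @ tokens.transpose(1, 2)    # pairwise cosine similarities

def directional_deformation_loss(dino, real_ref, style_ref, gen_real, gen_style):
    """Align the generated real-to-style direction in DINO space with the
    direction defined by the reference real-style image pair (cosine distance)."""
    with torch.no_grad():                     # the reference direction is fixed
        ref_dir = dino(style_ref) - dino(real_ref)
    gen_dir = dino(gen_style) - dino(gen_real)
    cos = F.cosine_similarity(gen_dir.flatten(1), ref_dir.flatten(1), dim=1)
    return (1.0 - cos).mean()

def relative_structure_loss(dino, source, generated):
    """Keep the DINO token self-similarity (a structure proxy) of the generated
    image close to that of its source, encouraging diverse yet consistent outputs."""
    with torch.no_grad():
        sim_src = token_self_similarity(dino(source))
    sim_gen = token_self_similarity(dino(generated))
    return F.l1_loss(sim_gen, sim_src)
```

In a fine-tuning loop, these two terms would typically be weighted and combined
with the adversarial and style-mixing objectives described above.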
Related papers
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation [5.364489068722223]
The concept of style is inherently underdetermined, encompassing a multitude of elements such as color, material, atmosphere, design, and structure.
Inversion-based methods are prone to style degradation, often resulting in the loss of fine-grained details, while adapter-based approaches frequently require meticulous weight tuning for each reference image to balance style intensity against text controllability.
arXiv Detail & Related papers (2024-04-03T13:34:09Z)
- Latents2Semantics: Leveraging the Latent Space of Generative Models for Localized Style Manipulation of Face Images [25.82631308991067]
We introduce the Latents2Semantics Autoencoder (L2SAE), a Generative Autoencoder model that facilitates localized editing of style attributes of several Regions of Interest in face images.
The L2SAE learns separate latent representations for encoded images' structure and style information, allowing for structure-preserving style editing of the chosen ROIs.
We provide qualitative and quantitative results over multiple applications, such as selective style editing and swapping, using test images sampled from several datasets.
arXiv Detail & Related papers (2023-12-22T20:06:53Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal attention sharing during the diffusion process, our method maintains style consistency across images within T2I models.
Evaluation of our method across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- StyO: Stylize Your Face in Only One-Shot [8.253458555695767]
This paper focuses on face stylization with a single artistic target.
Existing works for this task often fail to retain the source content while achieving geometry variation.
We present a novel StyO model, i.e., Stylize the face in only One-shot, to solve the above problem.
arXiv Detail & Related papers (2023-03-06T15:48:33Z)
- StyleSwap: Style-Based Generator Empowers Robust Face Swapping [90.05775519962303]
We introduce a concise and effective framework named StyleSwap.
Our core idea is to leverage a style-based generator to empower high-fidelity and robust face swapping.
We identify that with only minimal modifications, a StyleGAN2 architecture can successfully handle the desired information from both source and target.
arXiv Detail & Related papers (2022-09-27T16:35:16Z)
- Learning Graph Neural Networks for Image Style Transfer [131.73237185888215]
State-of-the-art parametric and non-parametric style transfer approaches are prone to either distorted local style patterns due to global statistics alignment, or unpleasing artifacts resulting from patch mismatching.
In this paper, we study a novel semi-parametric neural style transfer framework that alleviates the deficiency of both parametric and non-parametric stylization.
arXiv Detail & Related papers (2022-07-24T07:41:31Z)
- Styleverse: Towards Identity Stylization across Heterogeneous Domains [70.13327076710269]
We propose a new and challenging task, namely IDentity Stylization (IDS) across heterogeneous domains.
We introduce an effective heterogeneous-network-based framework, $Styleverse$, that uses a single domain-aware generator.
$Styleverse$ achieves higher-fidelity identity stylization compared with other state-of-the-art methods.
arXiv Detail & Related papers (2022-03-02T04:23:01Z)
- BlendGAN: Implicitly GAN Blending for Arbitrary Stylized Face Generation [9.370501805054344]
We propose BlendGAN for arbitrary stylized face generation.
We first train a self-supervised style encoder on a generic artistic dataset to extract representations of arbitrary styles.
In addition, a weighted blending module (WBM) is proposed to blend face and style representations implicitly and control the arbitrary stylization effect.
arXiv Detail & Related papers (2021-10-22T12:00:27Z)