FISTNet: FusIon of STyle-path generative Networks for Facial Style Transfer
- URL: http://arxiv.org/abs/2307.09020v3
- Date: Tue, 2 Apr 2024 15:46:19 GMT
- Title: FISTNet: FusIon of STyle-path generative Networks for Facial Style Transfer
- Authors: Sunder Ali Khowaja, Lewis Nkenyereye, Ghulam Mujtaba, Ik Hyun Lee, Giancarlo Fortino, Kapal Dev,
- Abstract summary: StyleGAN methods have the tendency of overfitting that results in the introduction of artifacts in the facial images.
We propose a FusIon of STyles (FIST) network for facial images that leverages pre-trained multipath style transfer networks.
- Score: 15.308837341075135
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the surge in emerging technologies such as Metaverse, spatial computing, and generative AI, the application of facial style transfer has gained a lot of interest from researchers as well as startups enthusiasts alike. StyleGAN methods have paved the way for transfer-learning strategies that could reduce the dependency on the huge volume of data that is available for the training process. However, StyleGAN methods have the tendency of overfitting that results in the introduction of artifacts in the facial images. Studies, such as DualStyleGAN, proposed the use of multipath networks but they require the networks to be trained for a specific style rather than generating a fusion of facial styles at once. In this paper, we propose a FusIon of STyles (FIST) network for facial images that leverages pre-trained multipath style transfer networks to eliminate the problem associated with lack of huge data volume in the training phase along with the fusion of multiple styles at the output. We leverage pre-trained styleGAN networks with an external style pass that use residual modulation block instead of a transform coding block. The method also preserves facial structure, identity, and details via the gated mapping unit introduced in this study. The aforementioned components enable us to train the network with very limited amount of data while generating high-quality stylized images. Our training process adapts curriculum learning strategy to perform efficient, flexible style and model fusion in the generative space. We perform extensive experiments to show the superiority of FISTNet in comparison to existing state-of-the-art methods.
Related papers
- Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer [24.46409405016844]
Style transfer task is one of those challenges that transfers the visual attributes of a style image to another content image.
We propose a training-free style transfer algorithm, Style Tracking Reverse Diffusion Process (STRDP) for a pretrained Latent Diffusion Model (LDM)
Our algorithm employs Adaptive Instance Normalization (AdaIN) function in a distinct manner during the reverse diffusion process of an LDM.
arXiv Detail & Related papers (2024-10-02T09:28:21Z) - High-Fidelity Face Swapping with Style Blending [16.024260677867076]
We propose an innovative end-to-end framework for high-fidelity face swapping.
First, we introduce a StyleGAN-based facial attributes encoder that extracts essential features from faces and inverts them into a latent style code.
Second, we introduce an attention-based style blending module to effectively transfer Face IDs from source to target.
arXiv Detail & Related papers (2023-12-17T23:22:37Z) - A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive
Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature.
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z) - StyleAdv: Meta Style Adversarial Training for Cross-Domain Few-Shot
Learning [89.86971464234533]
Cross-Domain Few-Shot Learning (CD-FSL) is a recently emerging task that tackles few-shot learning across different domains.
We propose a novel model-agnostic meta Style Adversarial training (StyleAdv) method together with a novel style adversarial attack method.
Our method is gradually robust to the visual styles, thus boosting the generalization ability for novel target datasets.
arXiv Detail & Related papers (2023-02-18T11:54:37Z) - Domain Enhanced Arbitrary Image Style Transfer via Contrastive Learning [84.8813842101747]
Contrastive Arbitrary Style Transfer (CAST) is a new style representation learning and style transfer method via contrastive learning.
Our framework consists of three key components, i.e., a multi-layer style projector for style code encoding, a domain enhancement module for effective learning of style distribution, and a generative network for image style transfer.
arXiv Detail & Related papers (2022-05-19T13:11:24Z) - Deep Translation Prior: Test-time Training for Photorealistic Style
Transfer [36.82737412912885]
Recent techniques to solve photorealistic style transfer within deep convolutional neural networks (CNNs) generally require intensive training from large-scale datasets.
We propose a novel framework, dubbed Deep Translation Prior (DTP), to accomplish photorealistic style transfer through test-time training on given input image pair with untrained networks.
arXiv Detail & Related papers (2021-12-12T04:54:27Z) - DAFormer: Improving Network Architectures and Training Strategies for
Domain-Adaptive Semantic Segmentation [99.88539409432916]
We study the unsupervised domain adaptation (UDA) process.
We propose a novel UDA method, DAFormer, based on the benchmark results.
DAFormer significantly improves the state-of-the-art performance by 10.8 mIoU for GTA->Cityscapes and 5.4 mIoU for Synthia->Cityscapes.
arXiv Detail & Related papers (2021-11-29T19:00:46Z) - Multi-Agent Semi-Siamese Training for Long-tail and Shallow Face
Learning [54.13876727413492]
In many real-world scenarios of face recognition, the depth of training dataset is shallow, which means only two face images are available for each ID.
With the non-uniform increase of samples, such issue is converted to a more general case, a.k.a a long-tail face learning.
Based on the Semi-Siamese Training (SST), we introduce an advanced solution, named Multi-Agent Semi-Siamese Training (MASST)
MASST includes a probe network and multiple gallery agents, the former aims to encode the probe features, and the latter constitutes a stack of
arXiv Detail & Related papers (2021-05-10T04:57:32Z) - StEP: Style-based Encoder Pre-training for Multi-modal Image Synthesis [68.3787368024951]
We propose a novel approach for multi-modal Image-to-image (I2I) translation.
We learn a latent embedding, jointly with the generator, that models the variability of the output domain.
Specifically, we pre-train a generic style encoder using a novel proxy task to learn an embedding of images, from arbitrary domains, into a low-dimensional style latent space.
arXiv Detail & Related papers (2021-04-14T19:58:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.