MagiCapture: High-Resolution Multi-Concept Portrait Customization
- URL: http://arxiv.org/abs/2309.06895v2
- Date: Fri, 2 Feb 2024 16:55:00 GMT
- Title: MagiCapture: High-Resolution Multi-Concept Portrait Customization
- Authors: Junha Hyung, Jaeyo Shin, and Jaegul Choo
- Abstract summary: MagiCapture is a personalization method for integrating subject and style concepts to generate high-resolution portrait images.
We present a novel Attention Refocusing loss coupled with auxiliary priors, both of which facilitate robust learning within this weakly supervised learning setting.
Our pipeline also includes additional post-processing steps to ensure the creation of highly realistic outputs.
- Score: 34.131515004434846
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large-scale text-to-image models including Stable Diffusion are capable of
generating high-fidelity photorealistic portrait images. There is an active
research area dedicated to personalizing these models, aiming to synthesize
specific subjects or styles using provided sets of reference images. However,
despite the plausible results from these personalization methods, they tend to
produce images that fall short of realism and are not yet at a
commercially viable level. This is particularly noticeable in portrait image
generation, where any unnatural artifact in human faces is easily discernible
due to our inherent human bias. To address this, we introduce MagiCapture, a
personalization method for integrating subject and style concepts to generate
high-resolution portrait images using just a few subject and style references.
For instance, given a handful of random selfies, our fine-tuned model can
generate high-quality portrait images in specific styles, such as passport or
profile photos. The main challenge with this task is the absence of ground
truth for the composed concepts, leading to a reduction in the quality of the
final output and an identity shift of the source subject. To address these
issues, we present a novel Attention Refocusing loss coupled with auxiliary
priors, both of which facilitate robust learning within this weakly supervised
learning setting. Our pipeline also includes additional post-processing steps
to ensure the creation of highly realistic outputs. MagiCapture outperforms
other baselines in both quantitative and qualitative evaluations and can also
be generalized to other non-human objects.
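The abstract names an Attention Refocusing loss but does not define it here. As a rough illustration only, the sketch below assumes the loss penalizes cross-attention mass that a learned concept token places outside its segmentation mask (and weak attention inside it); the function name, tensor shapes, and the exact form of the penalty are hypothetical, not the paper's implementation.

```python
import torch

def attention_refocusing_loss(attn, mask, token_idx):
    """Hedged sketch of an attention-refocusing objective (assumed form).

    attn:      (B, HW, T) cross-attention maps from one diffusion U-Net layer,
               softmax-normalized over the T text tokens.
    mask:      (B, HW) binary mask, 1 where the concept (e.g. the face) lies.
    token_idx: prompt position of the learned concept token.
    """
    a = attn[:, :, token_idx]             # (B, HW) attention paid to the concept token
    leak = (a * (1.0 - mask)).sum(dim=1)  # attention that falls outside the mask
    miss = ((1.0 - a) * mask).sum(dim=1)  # masked positions the token ignores
    return (leak + miss).mean()
```

In training, a term like this would presumably be added to the usual diffusion reconstruction loss with a small weight, steering each concept token's attention toward its own region.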
Related papers
- ArtiFade: Learning to Generate High-quality Subject from Blemished Images [10.112125529627157]
ArtiFade fine-tunes a pre-trained text-to-image model with the aim of removing artifacts.
ArtiFade also ensures the preservation of the original generative capabilities inherent within the diffusion model.
arXiv Detail & Related papers (2024-09-05T17:57:59Z)
- Dual-Branch Network for Portrait Image Quality Assessment [76.27716058987251]
We introduce a dual-branch network for portrait image quality assessment (PIQA).
We utilize two backbone networks (i.e., Swin Transformer-B) to extract quality-aware features from the entire portrait image and the facial image cropped from it.
We leverage LIQE, an image scene classification and quality assessment model, to capture the quality-aware and scene-specific features as the auxiliary features.
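The summary names the components (two Swin Transformer-B backbones plus LIQE auxiliary features) but not how they are combined; the sketch below simply concatenates the three feature vectors and regresses a scalar score. The fusion rule, class name, and dimensions are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class DualBranchPIQA(nn.Module):
    """Illustrative dual-branch portrait IQA layout (fusion details assumed)."""
    def __init__(self, backbone_full, backbone_face, liqe_dim=512, feat_dim=1024):
        super().__init__()
        self.backbone_full = backbone_full  # e.g. Swin Transformer-B on the whole portrait
        self.backbone_face = backbone_face  # same architecture on the cropped face
        self.head = nn.Sequential(          # simple concat-and-regress fusion (a guess)
            nn.Linear(2 * feat_dim + liqe_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),              # scalar quality score
        )

    def forward(self, portrait, face, liqe_feat):
        f1 = self.backbone_full(portrait)   # quality-aware features, full image
        f2 = self.backbone_face(face)       # quality-aware features, face crop
        return self.head(torch.cat([f1, f2, liqe_feat], dim=-1))
```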
arXiv Detail & Related papers (2024-05-14T12:43:43Z)
- StyleRetoucher: Generalized Portrait Image Retouching with GAN Priors [30.000584682643183]
StyleRetoucher is a novel automatic portrait image retouching framework.
Our method improves an input portrait image's skin condition while preserving its facial details.
We propose a novel blemish-aware feature selection mechanism to effectively identify and remove the skin blemishes.
arXiv Detail & Related papers (2023-12-22T02:32:19Z)
- Learning Subject-Aware Cropping by Outpainting Professional Photos [69.0772948657867]
We propose a weakly-supervised approach to learn what makes a high-quality subject-aware crop from professional stock images.
Our insight is to combine a library of stock images with a modern, pre-trained text-to-image diffusion model.
We are able to automatically generate a large dataset of cropped-uncropped training pairs to train a cropping model.
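As a sketch of the pair-generation idea: treat the stock photo as the professional crop, paste it into a larger canvas, outpaint the border with an off-the-shelf inpainting model, and record the original placement as the ground-truth crop. The model choice, prompt, and padding below are placeholders, not the paper's exact pipeline.

```python
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# One plausible setup: a standard Stable Diffusion inpainting checkpoint.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

def make_pair(stock: Image.Image, pad: int = 128, prompt: str = "a photograph"):
    """Outpaint a well-cropped stock photo into an (uncropped image, crop box) pair."""
    w, h = stock.size
    canvas = Image.new("RGB", (w + 2 * pad, h + 2 * pad))
    canvas.paste(stock, (pad, pad))
    mask = Image.new("L", canvas.size, 255)            # white = region to synthesize
    mask.paste(Image.new("L", (w, h), 0), (pad, pad))  # black = keep original pixels
    image = canvas.resize((512, 512))
    mask = mask.resize((512, 512))
    uncropped = pipe(prompt=prompt, image=image, mask_image=mask).images[0]
    sx, sy = 512 / canvas.width, 512 / canvas.height   # scale crop box to output size
    crop_box = (pad * sx, pad * sy, (pad + w) * sx, (pad + h) * sy)
    return uncropped, crop_box                         # supervision for a cropping model
```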
arXiv Detail & Related papers (2023-12-19T11:57:54Z)
- PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization [92.90392834835751]
PortraitBooth is designed for high efficiency, robust identity preservation, and expression-editable text-to-image generation.
PortraitBooth eliminates computational overhead and mitigates identity distortion.
It incorporates emotion-aware cross-attention control for diverse facial expressions in generated images.
arXiv Detail & Related papers (2023-12-11T13:03:29Z) - FaceStudio: Put Your Face Everywhere in Seconds [23.381791316305332]
Identity-preserving image synthesis seeks to maintain a subject's identity while adding a personalized, stylistic touch.
Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation.
Our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images.
arXiv Detail & Related papers (2023-12-05T11:02:45Z) - WebtoonMe: A Data-Centric Approach for Full-Body Portrait Stylization [5.2661965280415926]
We propose a data-centric solution to build a production-level full-body portrait stylization system.
Based on the two-stage scheme, we construct a novel and advanced dataset preparation paradigm.
Experiments reveal that with our pipeline, high-quality portrait stylization can be achieved without additional losses or architectural changes.
arXiv Detail & Related papers (2022-10-19T07:09:03Z) - CtlGAN: Few-shot Artistic Portraits Generation with Contrastive Transfer
Learning [77.27821665339492]
CtlGAN is a new few-shot artistic portraits generation model with a novel contrastive transfer learning strategy.
We adapt a pretrained StyleGAN in the source domain to a target artistic domain with no more than 10 artistic faces.
We propose a new encoder that embeds real faces into Z+ space, along with a dual-path training strategy to better cope with the adapted decoder.
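The exact contrastive strategy is not described in this summary; as a generic illustration, an InfoNCE-style loss that ties each adapted output to the frozen source generator's output for the same latent code would help preserve latent-space diversity during few-shot transfer. This form is an assumption, not CtlGAN's actual loss.

```python
import torch
import torch.nn.functional as F

def cross_domain_contrastive_loss(feat_adapted, feat_source, tau=0.07):
    """Illustrative InfoNCE loss for few-shot GAN adaptation (assumed form).

    feat_adapted: (N, D) features of images from the adapting generator.
    feat_source:  (N, D) features of images from the frozen source generator,
                  produced from the SAME latent codes.
    Pulls each adapted sample toward its source counterpart and pushes it
    away from the other samples in the batch.
    """
    a = F.normalize(feat_adapted, dim=-1)
    s = F.normalize(feat_source, dim=-1)
    logits = a @ s.t() / tau                            # (N, N) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)
```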
arXiv Detail & Related papers (2022-03-16T13:28:17Z) - Bridging Composite and Real: Towards End-to-end Deep Image Matting [88.79857806542006]
We study the roles of semantics and details for image matting.
We propose a novel Glance and Focus Matting network (GFM), which employs a shared encoder and two separate decoders.
Comprehensive empirical studies have demonstrated that GFM outperforms state-of-the-art methods.
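A minimal sketch of the shared-encoder, two-decoder layout: a glance decoder predicts a coarse trimap (foreground / background / transition) and a focus decoder predicts detailed alpha, merged inside the transition band. The merging rule shown is one plausible choice, not necessarily GFM's collaboration scheme.

```python
import torch
import torch.nn as nn

class GlanceFocusMatting(nn.Module):
    """Rough sketch of a glance-and-focus matting layout (details assumed)."""
    def __init__(self, encoder, glance_decoder, focus_decoder):
        super().__init__()
        self.encoder = encoder        # features shared by both decoders
        self.glance = glance_decoder  # coarse semantics: (B, 3, H, W) trimap logits
        self.focus = focus_decoder    # fine detail: (B, 1, H, W) alpha matte

    def forward(self, image):
        feats = self.encoder(image)
        tri = self.glance(feats).softmax(dim=1)  # trimap probabilities
        alpha_detail = self.focus(feats)
        fg, transition = tri[:, 0:1], tri[:, 1:2]
        # Trust the glance branch outside the transition region and the
        # focus branch inside it (one plausible merging rule).
        return fg * (1 - transition) + alpha_detail * transition
```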
arXiv Detail & Related papers (2020-10-30T10:57:13Z)
- Generating Person Images with Appearance-aware Pose Stylizer [66.44220388377596]
We present a novel end-to-end framework to generate realistic person images based on given person poses and appearances.
The core of our framework is a novel generator called Appearance-aware Pose Stylizer (APS) which generates human images by coupling the target pose with the conditioned person appearance progressively.
arXiv Detail & Related papers (2020-07-17T15:58:05Z)