RealisID: Scale-Robust and Fine-Controllable Identity Customization via Local and Global Complementation
- URL: http://arxiv.org/abs/2412.16832v1
- Date: Sun, 22 Dec 2024 02:36:25 GMT
- Title: RealisID: Scale-Robust and Fine-Controllable Identity Customization via Local and Global Complementation
- Authors: Zhaoyang Sun, Fei Du, Weihua Chen, Fan Wang, Yaxiong Chen, Yi Rong, Shengwu Xiong,
- Abstract summary: RealisID learns different control capabilities through the cooperation between a pair of local and global branches.
Our method can be easily extended to handle multi-person customization, even when trained only on single-person datasets.
- Score: 29.430749386234414
- License:
- Abstract: Recently, the success of text-to-image synthesis has greatly advanced the development of identity customization techniques, whose main goal is to produce realistic identity-specific photographs based on text prompts and reference face images. However, it is difficult for existing identity customization methods to simultaneously meet the various requirements of different real-world applications, including the identity fidelity of small faces, the control of face location, pose, and expression, as well as the customization of multiple persons. To this end, we propose a scale-robust and fine-controllable method, namely RealisID, which learns different control capabilities through the cooperation between a pair of local and global branches. Specifically, by using cropping and up-sampling operations to filter out face-irrelevant information, the local branch concentrates fine control of facial details and scale-robust identity fidelity within the face region. Meanwhile, the global branch manages the overall harmony of the entire image; it also controls the face location by taking location guidance as input. As a result, RealisID benefits from the complementarity of these two branches. Finally, by implementing our branches with two different variants of ControlNet, our method can be easily extended to handle multi-person customization, even when trained only on single-person datasets. Extensive experiments and ablation studies demonstrate the effectiveness of RealisID and verify its ability to fulfill all the requirements mentioned above.
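Based only on the abstract, the two branches consume differently prepared inputs: the local branch sees a cropped, up-sampled face region (filtering out face-irrelevant pixels), while the global branch receives location guidance for the full image. The sketch below illustrates that input preparation; all function names are hypothetical, and the real method feeds such inputs to two ControlNet variants inside a diffusion model, which is not reproduced here.

```python
# Hedged sketch of the local/global input preparation described in the
# RealisID abstract. Names are illustrative, not from the paper's code.

def crop_and_upsample(image, bbox, out_size):
    """Local-branch input: crop the face region and nearest-neighbor
    upsample it to a fixed resolution, discarding face-irrelevant pixels."""
    x0, y0, x1, y1 = bbox
    crop = [row[x0:x1] for row in image[y0:y1]]
    h, w = len(crop), len(crop[0])
    return [
        [crop[int(r * h / out_size)][int(c * w / out_size)]
         for c in range(out_size)]
        for r in range(out_size)
    ]

def location_mask(height, width, bbox):
    """Global-branch guidance: a binary mask marking where the face
    should appear, used to control face location in the full image."""
    x0, y0, x1, y1 = bbox
    return [
        [1 if (y0 <= r < y1 and x0 <= c < x1) else 0 for c in range(width)]
        for r in range(height)
    ]

# Toy 4x4 "image" of (row, col) tuples with a 2x2 face region at (1,1)-(3,3).
img = [[(r, c) for c in range(4)] for r in range(4)]
face = crop_and_upsample(img, (1, 1, 3, 3), 4)  # 2x2 crop upsampled to 4x4
mask = location_mask(4, 4, (1, 1, 3, 3))
```

In a multi-person extension, this preparation would simply be repeated per detected face, which is consistent with the abstract's claim that the design transfers from single-person training data.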
Related papers
- Omni-ID: Holistic Identity Representation Designed for Generative Tasks [75.29174595706533]
Omni-ID encodes holistic information about an individual's appearance across diverse expressions.
It consolidates information from a varied number of unstructured input images into a structured representation.
It demonstrates substantial improvements over conventional representations across various generative tasks.
arXiv Detail & Related papers (2024-12-12T19:21:20Z) - ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition [60.15830516741776]
Synthetic face recognition (SFR) aims to generate datasets that mimic the distribution of real face data.
We introduce a diffusion-fueled SFR model termed ID$^3$.
ID$^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances.
arXiv Detail & Related papers (2024-09-26T06:46:40Z) - Towards Global Localization using Multi-Modal Object-Instance Re-Identification [23.764646800085977]
We propose a novel re-identification transformer architecture that integrates multimodal RGB and depth information.
We demonstrate improvements in ReID across scenes that are cluttered or have varying illumination conditions.
We also develop a ReID-based localization framework that enables accurate camera localization and pose identification across different viewpoints.
arXiv Detail & Related papers (2024-09-18T14:15:10Z) - ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving [64.90148669690228]
ConsistentID is an innovative method crafted for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts.
We present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets.
arXiv Detail & Related papers (2024-04-25T17:23:43Z) - ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z) - An Open-World, Diverse, Cross-Spatial-Temporal Benchmark for Dynamic Wild Person Re-Identification [58.5877965612088]
Person re-identification (ReID) has made great strides thanks to the data-driven deep learning techniques.
The existing benchmark datasets lack diversity, and models trained on these data cannot generalize well to dynamic wild scenarios.
We develop a new Open-World, Diverse, Cross-Spatial-Temporal dataset named OWD with several distinct features.
arXiv Detail & Related papers (2024-03-22T11:21:51Z) - Identity-preserving Editing of Multiple Facial Attributes by Learning Global Edit Directions and Local Adjustments [4.082799056366928]
ID-Style is a new architecture capable of addressing the problem of identity loss during attribute manipulation.
We introduce two losses during training to enforce the LGD to find semi-sparse semantic directions, which, along with the IAIP, preserve the identity of the input instance.
arXiv Detail & Related papers (2023-09-25T16:28:39Z) - Facial Action Units Detection Aided by Global-Local Expression Embedding [36.78982474775454]
We develop a novel AU detection framework aided by the Global-Local facial Expressions Embedding, dubbed GLEE-Net.
Our GLEE-Net consists of three branches to extract identity-independent expression features for AU detection.
Our method significantly outperforms the state-of-the-art on the widely-used DISFA, BP4D and BP4D+ datasets.
arXiv Detail & Related papers (2022-10-25T02:35:32Z) - Semantic Consistency and Identity Mapping Multi-Component Generative Adversarial Network for Person Re-Identification [39.605062525247135]
We propose a semantic consistency and identity mapping multi-component generative adversarial network (SC-IMGAN) which provides style adaptation from one to many domains.
Our proposed method outperforms state-of-the-art techniques on six challenging person Re-ID datasets.
arXiv Detail & Related papers (2021-04-28T14:12:29Z) - FaceController: Controllable Attribute Editing for Face in the Wild [74.56117807309576]
We propose a simple feed-forward network to generate high-fidelity manipulated faces.
By simply employing some existing and easy-obtainable prior information, our method can control, transfer, and edit diverse attributes of faces in the wild.
In our method, we decouple identity, expression, pose, and illumination using 3D priors; separate texture and colors by using region-wise style codes.
arXiv Detail & Related papers (2021-02-23T02:47:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.