Omni-ID: Holistic Identity Representation Designed for Generative Tasks
- URL: http://arxiv.org/abs/2412.09694v1
- Date: Thu, 12 Dec 2024 19:21:20 GMT
- Title: Omni-ID: Holistic Identity Representation Designed for Generative Tasks
- Authors: Guocheng Qian, Kuan-Chieh Wang, Or Patashnik, Negin Heravi, Daniil Ostashev, Sergey Tulyakov, Daniel Cohen-Or, Kfir Aberman
- Abstract summary: Omni-ID encodes holistic information about an individual's appearance across diverse expressions.
It consolidates information from a varied number of unstructured input images into a structured representation.
It demonstrates substantial improvements over conventional representations across various generative tasks.
- Score: 75.29174595706533
- License:
- Abstract: We introduce Omni-ID, a novel facial representation designed specifically for generative tasks. Omni-ID encodes holistic information about an individual's appearance across diverse expressions and poses within a fixed-size representation. It consolidates information from a varied number of unstructured input images into a structured representation, where each entry represents certain global or local identity features. Our approach uses a few-to-many identity reconstruction training paradigm, where a limited set of input images is used to reconstruct multiple target images of the same individual in various poses and expressions. A multi-decoder framework is further employed to leverage the complementary strengths of diverse decoders during training. Unlike conventional representations, such as CLIP and ArcFace, which are typically learned through discriminative or contrastive objectives, Omni-ID is optimized with a generative objective, resulting in a more comprehensive and nuanced identity capture for generative tasks. Trained on our MFHQ dataset, a multi-view facial image collection, Omni-ID demonstrates substantial improvements over conventional representations across various generative tasks.
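The abstract gives no implementation details, so the following is a minimal PyTorch sketch of the few-to-many reconstruction idea under stated assumptions: the `OmniIDEncoder`, `SimpleDecoder`, token counts, and feature dimensions are hypothetical stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

class OmniIDEncoder(nn.Module):
    """Hypothetical encoder: consolidates a variable number of face crops into
    a fixed-size set of identity tokens (global + local features)."""
    def __init__(self, img_dim=512, num_tokens=16, token_dim=768):
        super().__init__()
        self.backbone = nn.Linear(img_dim, token_dim)               # stand-in for a vision backbone
        self.queries = nn.Parameter(torch.randn(num_tokens, token_dim))
        self.attn = nn.MultiheadAttention(token_dim, 8, batch_first=True)

    def forward(self, imgs):                                        # imgs: (B, K, img_dim)
        feats = self.backbone(imgs)                                 # per-image features
        q = self.queries.expand(imgs.size(0), -1, -1)
        tokens, _ = self.attn(q, feats, feats)                      # fixed-size output
        return tokens                                               # (B, num_tokens, token_dim)

class SimpleDecoder(nn.Module):
    """Stand-in for one of several complementary decoders."""
    def __init__(self, token_dim=768, img_dim=512):
        super().__init__()
        self.proj = nn.Linear(token_dim, img_dim)

    def forward(self, id_tokens, targets):
        pooled = id_tokens.mean(dim=1, keepdim=True)                # (B, 1, token_dim)
        return self.proj(pooled).expand_as(targets)                 # toy reconstruction

def few_to_many_step(encoder, decoders, ref_imgs, target_imgs):
    """A few references reconstruct many targets of the same identity;
    every decoder is conditioned on the same fixed-size identity tokens."""
    id_tokens = encoder(ref_imgs)
    loss = sum(nn.functional.mse_loss(dec(id_tokens, target_imgs), target_imgs)
               for dec in decoders)
    return loss / len(decoders)

encoder = OmniIDEncoder()
decoders = [SimpleDecoder(), SimpleDecoder()]
refs = torch.randn(2, 3, 512)       # 3 reference images per identity (flattened features)
targets = torch.randn(2, 8, 512)    # 8 target images in varied poses/expressions
loss = few_to_many_step(encoder, decoders, refs, targets)
```

The point of the sketch is the training signal: a fixed-size set of identity tokens, produced from a few references, must carry enough information for several decoders to reconstruct many targets of the same person.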
Related papers
- Object-level Visual Prompts for Compositional Image Generation [75.6085388740087]
We introduce a method for composing object-level visual prompts within a text-to-image diffusion model.
A key challenge in this task is to preserve the identity of the objects depicted in the input visual prompts.
We introduce a new KV-mixed cross-attention mechanism, in which keys and values are learned from distinct visual representations.
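As a rough illustration of that mechanism, the sketch below shows the general shape of a cross-attention layer whose keys and values are projected from two distinct visual feature streams; the module name, dimensions, and feature sources are illustrative assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KVMixedCrossAttention(nn.Module):
    """Illustrative cross-attention in which keys and values come from two
    distinct visual representations of the object prompt."""
    def __init__(self, dim=768):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)   # keys from one feature stream (e.g. identity-oriented)
        self.to_v = nn.Linear(dim, dim)   # values from another (e.g. appearance-oriented)
        self.out = nn.Linear(dim, dim)

    def forward(self, latents, key_feats, value_feats):
        # latents: (B, N, D) generator tokens; key/value feats: (B, M, D) each
        q, k, v = self.to_q(latents), self.to_k(key_feats), self.to_v(value_feats)
        attn = F.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        return self.out(attn @ v)

layer = KVMixedCrossAttention()
latents = torch.randn(1, 64, 768)
out = layer(latents, torch.randn(1, 16, 768), torch.randn(1, 16, 768))  # (1, 64, 768)
```

Decoupling the two projections presumably lets one representation steer where the generator attends while the other supplies the content that gets injected.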
arXiv Detail & Related papers (2025-01-02T18:59:44Z)
- Dynamic Token Selection for Aerial-Ground Person Re-Identification [0.36832029288386137]
We propose a novel Dynamic Token Selective Transformer (DTST) tailored for AGPReID.
We segment the input image into multiple tokens, with each token representing a unique region or feature within the image.
Using a Top-k strategy, we extract the k most significant tokens that contain vital information essential for identity recognition.
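A minimal sketch of Top-k token selection follows; the scoring function is an assumption (random scores stand in for whatever importance measure the paper uses, e.g. attention weights).

```python
import torch

def select_topk_tokens(tokens, scores, k=64):
    """Keep only the k highest-scoring tokens before the next transformer block."""
    # tokens: (B, N, D); scores: (B, N) importance of each token
    idx = scores.topk(k, dim=1).indices                          # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))      # (B, k, D)
    return tokens.gather(1, idx)

tokens = torch.randn(2, 196, 768)            # 14x14 patch tokens per image
scores = torch.rand(2, 196)                  # stand-in importance scores
kept = select_topk_tokens(tokens, scores)    # (2, 64, 768)
```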
arXiv Detail & Related papers (2024-11-30T11:07:11Z)
- Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training [51.87027943520492]
We present Diffusion-ReID, a novel paradigm that efficiently augments and generates diverse images based on known identities.
Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities.
arXiv Detail & Related papers (2024-06-10T06:26:03Z)
- InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation [0.0]
"InstantFamily" is an approach that employs a novel cross-attention mechanism and a multimodal embedding stack to achieve zero-shot multi-ID image generation.
Our method effectively preserves ID as it utilizes global and local features from a pre-trained face recognition model integrated with text conditions.
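The summary does not specify the attention layout, but a generic masked cross-attention for multi-ID conditioning might look like the sketch below; the shapes, mask semantics, and the -1e4 masking constant are assumptions, not the paper's exact mechanism.

```python
import torch
import torch.nn.functional as F

def masked_multi_id_attention(latents, id_feats, id_masks):
    """Illustrative masked cross-attention: each latent position may only
    attend to the face features of the identity whose mask covers it."""
    # latents: (B, N, D); id_feats: (B, I, M, D); id_masks: (B, I, N) in {0, 1}
    B, I, M, D = id_feats.shape
    k = id_feats.reshape(B, I * M, D)
    logits = latents @ k.transpose(-2, -1) / D ** 0.5               # (B, N, I*M)
    allow = id_masks.transpose(1, 2).repeat_interleave(M, dim=-1)   # (B, N, I*M)
    logits = logits.masked_fill(allow == 0, -1e4)                   # soft block instead of -inf
    return F.softmax(logits, dim=-1) @ k                            # (B, N, D)

latents = torch.randn(1, 64, 32)
id_feats = torch.randn(1, 2, 4, 32)                                 # two identities, 4 tokens each
id_masks = torch.zeros(1, 2, 64)
id_masks[:, 0, :32] = 1                                             # left half -> identity 0
id_masks[:, 1, 32:] = 1                                             # right half -> identity 1
out = masked_multi_id_attention(latents, id_feats, id_masks)        # (1, 64, 32)
```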
arXiv Detail & Related papers (2024-04-30T10:16:21Z)
- ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving [64.90148669690228]
ConsistentID is an innovative method crafted for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts.
We present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets.
arXiv Detail & Related papers (2024-04-25T17:23:43Z)
- IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [31.762112403595612]
IDAdapter is a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image.
During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details.
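"Mixed features" is not defined in the summary; one plausible reading is a convex combination of per-reference identity embeddings, sketched below with hypothetical shapes and random mixing weights.

```python
import torch

def mix_identity_features(ref_feats):
    """Pool identity embeddings from several reference photos of one person
    with a random convex combination, so no single image dominates."""
    # ref_feats: (B, R, D) embeddings of R reference images per identity
    weights = torch.softmax(torch.rand(ref_feats.shape[:2]), dim=1)   # (B, R)
    return (weights.unsqueeze(-1) * ref_feats).sum(dim=1)             # (B, D)

refs = torch.randn(4, 5, 512)                # 5 reference images per identity
mixed = mix_identity_features(refs)          # (4, 512)
```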
arXiv Detail & Related papers (2024-03-20T12:13:04Z)
- Identity Encoder for Personalized Diffusion [57.1198884486401]
We propose an encoder-based approach for personalization.
We learn an identity encoder which can extract an identity representation from a set of reference images of a subject.
We show that our approach consistently outperforms existing fine-tuning based approaches in both image generation and reconstruction.
arXiv Detail & Related papers (2023-04-14T23:32:24Z)
- Semantic Consistency and Identity Mapping Multi-Component Generative Adversarial Network for Person Re-Identification [39.605062525247135]
We propose a semantic consistency and identity mapping multi-component generative adversarial network (SC-IMGAN) which provides style adaptation from one to many domains.
Our proposed method outperforms state-of-the-art techniques on six challenging person Re-ID datasets.
arXiv Detail & Related papers (2021-04-28T14:12:29Z)
- Generating Annotated High-Fidelity Images Containing Multiple Coherent Objects [10.783993190686132]
We propose a multi-object generation framework that can synthesize images with multiple objects without explicitly requiring contextual information.
We demonstrate how coherency and fidelity are preserved with our method through experiments on the Multi-MNIST and CLEVR datasets.
arXiv Detail & Related papers (2020-06-22T11:33:55Z)
- Uncertainty-Aware Multi-Shot Knowledge Distillation for Image-Based Object Re-Identification [93.39253443415392]
We propose exploiting multiple shots of the same identity to guide the feature learning of each individual image.
It consists of a teacher network (T-net) that learns the comprehensive features from multiple images of the same object, and a student network (S-net) that takes a single image as input.
We validate the effectiveness of our approach on the popular vehicle re-id and person re-id datasets.
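A minimal sketch of the T-net/S-net idea under stated assumptions: the stand-in `FeatNet`, the mean aggregation over shots, and the MSE distillation loss are illustrative choices, and the uncertainty-aware weighting from the title is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatNet(nn.Module):
    """Stand-in feature extractor (the real networks would be CNN backbones)."""
    def __init__(self, in_dim=512, out_dim=256):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, x):
        return self.fc(x)

def multishot_distillation_loss(t_net, s_net, shots, single):
    """The teacher sees several shots of the same object; the student, given
    one image, is pulled toward the teacher's aggregated feature."""
    # shots: (B, S, F) features of S shots per identity; single: (B, F)
    with torch.no_grad():
        teacher_feat = t_net(shots).mean(dim=1)      # aggregate over the S shots -> (B, D)
    student_feat = s_net(single)                     # (B, D)
    return F.mse_loss(student_feat, teacher_feat)

t_net, s_net = FeatNet(), FeatNet()
shots = torch.randn(4, 6, 512)                       # 6 shots per identity
single = shots[:, 0]                                 # the student sees only one of them
loss = multishot_distillation_loss(t_net, s_net, shots, single)
```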
arXiv Detail & Related papers (2020-01-15T09:39:05Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.