EditID: Training-Free Editable ID Customization for Text-to-Image Generation
- URL: http://arxiv.org/abs/2503.12526v1
- Date: Sun, 16 Mar 2025 14:41:30 GMT
- Title: EditID: Training-Free Editable ID Customization for Text-to-Image Generation
- Authors: Guandong Li, Zhaobin Chu
- Abstract summary: We propose EditID, a training-free approach based on the DiT architecture, which achieves highly editable customized IDs for text-to-image generation. It is challenging to alter facial orientation, character attributes, and other features through prompts. EditID is the first text-to-image solution to propose customizable ID editability on the DiT architecture.
- Score: 12.168520751389622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose EditID, a training-free approach based on the DiT architecture that achieves highly editable customized IDs for text-to-image generation. Existing text-to-image models for customized IDs typically focus on ID consistency while neglecting editability: it is challenging to alter facial orientation, character attributes, and other features through prompts. EditID addresses this by deconstructing the customized-ID text-to-image model into an image generation branch and a character feature branch. The character feature branch is further decoupled into three modules: feature extraction, feature fusion, and feature integration. By introducing a combination of mapping features and shift features, and by controlling the intensity of ID feature integration, EditID achieves semantic compression of local features across network depths, forming an editable feature space. This enables the generation of high-quality images with editable IDs while maintaining ID consistency. EditID achieves excellent results on IBench, an editability evaluation framework for customized-ID text-to-image generation, which quantitatively demonstrates its superior performance. EditID is the first text-to-image solution to offer customizable ID editability on the DiT architecture, meeting the demands of long prompts and high-quality image generation.
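As a rough illustration of the decomposition the abstract describes, the character feature branch can be pictured as an affine fusion of "mapping" and "shift" features, injected into a hidden state of the image generation branch with a tunable intensity. The function names, shapes, and the affine formulation below are hypothetical; this is a toy numpy sketch of the idea, not the paper's implementation.

```python
import numpy as np

def fuse_id_features(id_feat, w_map, shift):
    # Hypothetical fusion step: the paper describes "a combination of
    # mapping features and shift features"; here we model that as an
    # affine transform of the extracted ID feature.
    return w_map @ id_feat + shift

def integrate_id(hidden, fused_id, intensity):
    # Blend the fused ID feature into a DiT hidden state. `intensity`
    # controls the strength of ID feature integration: 0.0 leaves the
    # hidden state untouched (maximal editability), 1.0 injects the
    # full ID signal (maximal ID consistency).
    return hidden + intensity * fused_id

# Toy demonstration with a 4-dimensional hidden state.
rng = np.random.default_rng(0)
hidden = rng.standard_normal(4)
id_feat = rng.standard_normal(4)
w_map = np.eye(4)        # identity mapping, for illustration only
shift = np.zeros(4)

fused = fuse_id_features(id_feat, w_map, shift)
out_weak = integrate_id(hidden, fused, intensity=0.0)
out_full = integrate_id(hidden, fused, intensity=1.0)
```

With `intensity=0.0` the branch is a no-op, which is one way to read the claim that dialing the integration strength trades ID consistency against editability.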
Related papers
- DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation [22.599542105037443]
DisEnvisioner is a novel approach for effectively extracting and enriching the subject-essential features while filtering out subject-irrelevant information.
Specifically, the feature of the subject and other irrelevant components are effectively separated into distinctive visual tokens, enabling a much more accurate customization.
Experiments demonstrate the superiority of our approach over existing methods in instruction response (editability), ID consistency, inference speed, and the overall image quality.
arXiv Detail & Related papers (2024-10-02T22:29:14Z)
- FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing [22.308638156328968]
The DDIM latents, crucial for retaining the original image's key features and layout, contribute significantly to these limitations.
We introduce FlexiEdit, which enhances fidelity to input text prompts by refining the DDIM latents.
Our approach represents notable progress in image editing, particularly in performing complex non-rigid edits.
arXiv Detail & Related papers (2024-07-25T08:07:40Z)
- CustAny: Customizing Anything from A Single Example [73.90939022698399]
We present a novel pipeline to construct MC-IDC, a large dataset of general objects featuring 315k text-image samples across 10k categories.
With the help of MC-IDC, we introduce Customizing Anything (CustAny), a zero-shot framework that maintains ID fidelity and supports flexible text editing for general objects.
Our contributions include a large-scale dataset, the CustAny framework and novel ID processing to advance this field.
arXiv Detail & Related papers (2024-06-17T15:26:22Z)
- MasterWeaver: Taming Editability and Face Identity for Personalized Text-to-Image Generation [59.13765130528232]
We present MasterWeaver, a test-time tuning-free method designed to generate personalized images with both faithful identity fidelity and flexible editability.
Specifically, MasterWeaver adopts an encoder to extract identity features and steers the image generation through additional introduced cross attention.
To improve editability while maintaining identity fidelity, we propose an editing direction loss for training, which aligns the editing directions of our MasterWeaver with those of the original T2I model.
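MasterWeaver's editing direction loss is only described in intent here (aligning edit directions between the personalized model and the original T2I model). One hypothetical way to realize such an alignment is a cosine distance between the two feature-space edit directions; the function name and the cosine formulation below are assumptions, not the paper's definition.

```python
import numpy as np

def editing_direction_loss(feat_edit_p, feat_base_p, feat_edit_o, feat_base_o):
    # Hypothetical cosine-distance form of an editing direction loss:
    # the direction an edit prompt induces in the personalized model
    # (feat_edit_p - feat_base_p) is encouraged to align with the
    # direction it induces in the original T2I model.
    d_p = feat_edit_p - feat_base_p
    d_o = feat_edit_o - feat_base_o
    d_p = d_p / np.linalg.norm(d_p)
    d_o = d_o / np.linalg.norm(d_o)
    return 1.0 - float(d_p @ d_o)

# Perfectly aligned directions give (near) zero loss.
base = np.zeros(3)
direction = np.array([1.0, 2.0, -1.0])
loss = editing_direction_loss(base + direction, base, base + 2.0 * direction, base)
```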
arXiv Detail & Related papers (2024-05-09T14:42:16Z)
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
- IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [31.762112403595612]
IDAdapter is a tuning-free approach that enhances the diversity and identity preservation in personalized image generation from a single face image.
During the training phase, we incorporate mixed features from multiple reference images of a specific identity to enrich identity-related content details.
arXiv Detail & Related papers (2024-03-20T12:13:04Z)
- Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
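The AdaIN-mean operation mentioned for Infinite-ID is not spelled out in this blurb. A plausible reading, by analogy with standard AdaIN, is a mean-only normalization: shift one stream so its channel-wise mean matches the other's, without rescaling the variance. The function name and this mean-only interpretation are assumptions.

```python
import numpy as np

def adain_mean(content, style):
    # Hypothetical AdaIN-mean: shift the content features so their
    # per-token mean matches the style stream's mean. Full AdaIN would
    # additionally rescale to match standard deviations; here only the
    # mean is aligned.
    return (content
            - content.mean(axis=-1, keepdims=True)
            + style.mean(axis=-1, keepdims=True))

# Toy demonstration: merge an ID-feature stream with a text-semantics stream.
rng = np.random.default_rng(1)
id_stream = rng.standard_normal((2, 8))      # e.g. ID-feature tokens
text_stream = rng.standard_normal((2, 8))    # e.g. text-semantics tokens
merged = adain_mean(id_stream, text_stream)
```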
arXiv Detail & Related papers (2024-03-18T13:39:53Z)
- PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding [102.07914175196817]
PhotoMaker is an efficient personalized text-to-image generation method.
It encodes an arbitrary number of input ID images into a stacked ID embedding to preserve ID information.
arXiv Detail & Related papers (2023-12-07T17:32:29Z)
- iEdit: Localised Text-guided Image Editing with Weak Supervision [53.082196061014734]
We propose a novel learning method for text-guided image editing.
It generates images conditioned on a source image and a textual edit prompt.
It shows favourable results against its counterparts in terms of image fidelity, CLIP alignment score and qualitatively for editing both generated and real images.
arXiv Detail & Related papers (2023-05-10T07:39:14Z)
- Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting [53.708523312636096]
We present Imagen Editor, a cascaded diffusion model built by fine-tuning on text-guided image inpainting.
Its edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training.
To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting.
arXiv Detail & Related papers (2022-12-13T21:25:11Z)
- ManiCLIP: Multi-Attribute Face Manipulation from Text [104.30600573306991]
We present a novel multi-attribute face manipulation method based on textual descriptions.
Our method generates natural manipulated faces with minimal text-irrelevant attribute editing.
arXiv Detail & Related papers (2022-10-02T07:22:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.