EditIDv2: Editable ID Customization with Data-Lubricated ID Feature Integration for Text-to-Image Generation
- URL: http://arxiv.org/abs/2509.05659v1
- Date: Sat, 06 Sep 2025 09:29:48 GMT
- Title: EditIDv2: Editable ID Customization with Data-Lubricated ID Feature Integration for Text-to-Image Generation
- Authors: Guandong Li, Zhaobin Chu
- Abstract summary: EditIDv2 is a tuning-free solution specifically designed for high-complexity narrative scenes and long text inputs.
We achieve deep, multi-level semantic editing while maintaining identity consistency in complex narrative environments using only a small amount of data lubrication.
- Score: 10.474377498273205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose EditIDv2, a tuning-free solution specifically designed for high-complexity narrative scenes and long text inputs. Existing character editing methods perform well under simple prompts, but often suffer from degraded editing capabilities, semantic understanding biases, and identity consistency breakdowns when faced with long text narratives containing multiple semantic layers, temporal logic, and complex contextual relationships. In EditID, we analyzed the impact of the ID integration module on editability. In EditIDv2, we further explore and address the influence of the ID feature integration module. The core of EditIDv2 is to address the problem of editability injection under minimal data lubrication. Through a sophisticated decomposition of PerceiverAttention, the introduction of an ID loss and joint dynamic training with the diffusion model, as well as an offline fusion strategy for the integration module, we achieve deep, multi-level semantic editing while maintaining identity consistency in complex narrative environments, using only a small amount of data lubrication. This meets the demands of long prompts and high-quality image generation, and achieves excellent results on the IBench evaluation.
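The abstract names three mechanisms: a decomposed PerceiverAttention module for ID feature integration, an ID loss trained jointly with the diffusion model, and an offline fusion of the integration module. Below is a minimal PyTorch sketch of the first two ideas; the module names, shapes, and the frozen-recognizer assumption are illustrative, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of a Perceiver-style ID feature integration
# module with an auxiliary ID loss. All names, dimensions, and the toy
# usage are illustrative assumptions, not the EditIDv2 code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PerceiverIDAttention(nn.Module):
    """Learned latent queries cross-attend to face-ID features, producing
    a fixed-size set of ID tokens that can be injected into a DiT/UNet."""
    def __init__(self, id_dim=512, token_dim=768, n_latents=16, n_heads=8):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, token_dim) * 0.02)
        self.to_kv = nn.Linear(id_dim, token_dim)          # project ID feats
        self.attn = nn.MultiheadAttention(token_dim, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.LayerNorm(token_dim),
                                nn.Linear(token_dim, token_dim * 4),
                                nn.GELU(),
                                nn.Linear(token_dim * 4, token_dim))

    def forward(self, id_feats):                            # (B, N, id_dim)
        kv = self.to_kv(id_feats)
        q = self.latents.unsqueeze(0).expand(id_feats.size(0), -1, -1)
        tokens, _ = self.attn(q, kv, kv)                    # cross-attention
        return tokens + self.ff(tokens)                     # (B, n_latents, token_dim)

def id_loss(emb_gen, emb_ref):
    """Cosine ID loss between identity embeddings of the generated and
    reference faces (e.g., from a frozen face recognizer)."""
    return (1.0 - F.cosine_similarity(emb_gen, emb_ref, dim=-1)).mean()

# Toy usage: 5 ID feature vectors -> 16 injectable ID tokens.
module = PerceiverIDAttention()
tokens = module(torch.randn(2, 5, 512))
loss = id_loss(torch.randn(2, 512), torch.randn(2, 512))
print(tokens.shape, loss.item())
```

In a full system the ID tokens would be attended to inside the diffusion backbone alongside the text tokens, and the ID loss would be computed on faces decoded from the denoised latent during joint training.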
Related papers
- Model Editing for New Document Integration in Generative Information Retrieval [110.90609826290968]
Generative retrieval (GR) reformulates the Information Retrieval (IR) task as the generation of document identifiers (docIDs).
Existing GR models exhibit poor generalization to newly added documents, often failing to generate the correct docIDs.
We propose DOME, a novel method that effectively and efficiently adapts GR models to unseen documents.
arXiv Detail & Related papers (2026-03-03T09:13:38Z)
- Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement [54.199726425201895]
Large multimodal editing models have demonstrated powerful editing capabilities across diverse tasks.
Current facial ID preservation methods struggle to achieve consistent restoration of both facial identity and edited element IP.
We propose EditedID, an Alignment-Disentanglement-Entanglement framework for robust identity-specific facial restoration.
arXiv Detail & Related papers (2026-02-21T08:24:42Z)
- FlexID: Training-Free Flexible Identity Injection via Intent-Aware Modulation for Text-to-Image Generation [10.474377498273205]
We propose FlexID, a training-free framework utilizing intent-aware modulation.
We introduce a Context-Aware Adaptive Gating (CAG) mechanism that dynamically modulates the weights of the conditioning streams.
Experiments on IBench demonstrate that FlexID achieves a balance between identity consistency and text adherence.
arXiv Detail & Related papers (2026-02-07T13:59:54Z)
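FlexID's Context-Aware Adaptive Gating is only named in the entry above, but the idea of a context-dependent gate that trades identity consistency against text adherence can be sketched directly. Everything below (the stream shapes, the pooling, the gate MLP) is an assumption, not the FlexID design.

```python
# Minimal sketch (PyTorch) of a context-aware adaptive gate blending an
# identity stream with a text stream.
import torch
import torch.nn as nn

class ContextAwareGate(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        # Predict a mixing weight from the pooled text context.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(),
                                  nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, id_stream, text_stream):
        # g -> 1 favors identity preservation, g -> 0 favors text adherence.
        context = text_stream.mean(dim=1, keepdim=True)     # (B, 1, dim)
        g = self.gate(context)                              # (B, 1, 1)
        return g * id_stream + (1.0 - g) * text_stream

gate = ContextAwareGate()
mixed = gate(torch.randn(2, 16, 768), torch.randn(2, 16, 768))
print(mixed.shape)  # torch.Size([2, 16, 768])
```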
- Consistency-Aware Editing for Entity-level Unlearning in Language Models [53.522931419965424]
We introduce a novel consistency-aware editing (CAE) framework for entity-level unlearning.
CAE aggregates a diverse set of prompts related to a target entity, including its attributes, relations, and adversarial paraphrases.
It then jointly learns a low-rank update guided by a consistency regularizer that aligns the editing directions across prompts.
arXiv Detail & Related papers (2025-12-19T15:18:07Z)
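To make the CAE summary concrete, here is a hedged sketch of a low-rank update whose per-prompt editing directions are pulled toward agreement by a consistency regularizer; the exact loss form and where the update is applied are assumptions.

```python
# Minimal sketch (PyTorch) of a low-rank weight update trained with a
# consistency regularizer across prompts for the same target entity.
import torch
import torch.nn as nn
import torch.nn.functional as F

d_in, d_out, rank = 768, 768, 8
A = nn.Parameter(torch.randn(d_out, rank) * 0.01)   # low-rank factors
B = nn.Parameter(torch.randn(rank, d_in) * 0.01)    # delta_W = A @ B

def edit_direction(h):
    """Change the low-rank update induces on a prompt representation h."""
    return h @ (A @ B).T                             # (n_prompts, d_out)

def consistency_loss(prompt_reprs):
    """Penalize disagreement between editing directions across prompts
    (attributes, relations, adversarial paraphrases of one entity)."""
    dirs = F.normalize(edit_direction(prompt_reprs), dim=-1)
    mean_dir = F.normalize(dirs.mean(dim=0, keepdim=True), dim=-1)
    return (1.0 - (dirs * mean_dir).sum(-1)).mean()

reprs = torch.randn(6, d_in)                         # 6 prompts, one entity
loss = consistency_loss(reprs)
loss.backward()                                      # gradients flow to A, B
print(loss.item())
```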
- Zero-shot Face Editing via ID-Attribute Decoupled Inversion [5.695436409400152]
We propose a zero-shot face editing method based on ID-Attribute Decoupled Inversion.
We decompose the face representation into ID and attribute features, using them as joint conditions to guide both the inversion and the reverse diffusion processes.
Our method supports a wide range of complex multi-attribute face editing tasks using only text prompts, without requiring region-specific input, and operates at a speed comparable to DDIM inversion.
arXiv Detail & Related papers (2025-10-13T06:34:40Z)
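A minimal sketch of the decomposition the summary above describes, splitting one face representation into an ID branch and an attribute branch used as joint conditions; the encoders and dimensions are illustrative assumptions.

```python
# Minimal sketch (PyTorch): decompose a face feature into ID and attribute
# conditions for inversion and reverse diffusion.
import torch
import torch.nn as nn

class DecoupledFaceCondition(nn.Module):
    def __init__(self, feat_dim=1024, cond_dim=768):
        super().__init__()
        self.to_id = nn.Linear(feat_dim, cond_dim)    # identity branch
        self.to_attr = nn.Linear(feat_dim, cond_dim)  # attribute branch

    def forward(self, face_feat):
        id_feat, attr_feat = self.to_id(face_feat), self.to_attr(face_feat)
        # Joint condition: ID tokens stay fixed during editing, attribute
        # tokens are free to follow the text prompt.
        return torch.stack([id_feat, attr_feat], dim=1)  # (B, 2, cond_dim)

cond = DecoupledFaceCondition()(torch.randn(2, 1024))
print(cond.shape)  # torch.Size([2, 2, 768])
```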
- ID-EA: Identity-driven Text Enhancement and Adaptation with Textual Inversion for Personalized Text-to-Image Generation [33.84646269805187]
ID-EA is a novel framework that guides text embeddings to align with visual identity embeddings.
ID-EA substantially outperforms state-of-the-art methods in identity preservation metrics.
It generates personalized portraits 15 times faster than existing approaches.
arXiv Detail & Related papers (2025-07-16T07:42:02Z)
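The core operation the ID-EA summary suggests, aligning a learnable text embedding with a visual identity embedding, can be sketched as a small optimization loop; the projection head and cosine objective are assumptions, not the paper's method.

```python
# Minimal sketch (PyTorch): nudge a textual-inversion token embedding
# toward a (frozen) visual identity embedding.
import torch
import torch.nn as nn
import torch.nn.functional as F

text_token = nn.Parameter(torch.randn(768))          # learnable "<id>" token
proj = nn.Linear(512, 768)                           # face-embed -> text space
face_embed = torch.randn(1, 512)                     # frozen ID encoder output

opt = torch.optim.Adam([text_token, *proj.parameters()], lr=1e-3)
for _ in range(100):
    target = proj(face_embed).squeeze(0)
    loss = 1.0 - F.cosine_similarity(text_token, target, dim=0)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```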
- InComeS: Integrating Compression and Selection Mechanisms into LLMs for Efficient Model Editing [77.47790551485721]
In-context learning is a promising editing method that comprehends edit information through context encoding.
However, this method is constrained by the limited context window of large language models.
We propose InComeS, a flexible framework that enhances LLMs' ability to process editing contexts.
arXiv Detail & Related papers (2025-05-28T09:20:18Z)
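The summary mentions compression and selection but not their form; one plausible reading, sketched below purely under that assumption, compresses each edit context to a pooled "gist" vector and selects the edits most relevant to a query.

```python
# Minimal sketch (PyTorch): compress edit contexts to gist vectors, then
# select the top-k edits for a query. The pooling and scoring are assumed.
import torch
import torch.nn.functional as F

def compress(edit_token_embs):
    """One gist vector per edit context via mean pooling."""
    return torch.stack([e.mean(dim=0) for e in edit_token_embs])

def select(gists, query, k=2):
    """Pick the k edits whose gists best match the query embedding."""
    scores = F.cosine_similarity(gists, query.unsqueeze(0), dim=-1)
    return scores.topk(k).indices

edits = [torch.randn(n, 768) for n in (5, 9, 4, 7)]   # 4 edit contexts
gists = compress(edits)                                # (4, 768)
print(select(gists, torch.randn(768)))                 # indices of top-2 edits
```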
- QID: Efficient Query-Informed ViTs in Data-Scarce Regimes for OCR-free Visual Document Understanding [53.69841526266547]
Fine-tuning a pre-trained Vision-Language Model with new datasets often falls short in optimizing the vision encoder.
We introduce QID, a novel, streamlined, architecture-preserving approach that integrates query embeddings into the vision encoder.
arXiv Detail & Related papers (2025-04-03T18:47:16Z)
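A hedged sketch of query-informed vision encoding as the QID summary describes it: query embeddings are prepended to the patch token sequence so self-attention can route them to question-relevant regions. The token layout is an assumption.

```python
# Minimal sketch (PyTorch): prepend query tokens to ViT patch tokens in a
# single encoder layer, then return only the updated patch tokens.
import torch
import torch.nn as nn

class QueryInformedEncoderLayer(nn.Module):
    def __init__(self, dim=768, n_heads=12):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)

    def forward(self, patch_tokens, query_tokens):
        n_q = query_tokens.size(1)
        x = torch.cat([query_tokens, patch_tokens], dim=1)  # queries first
        x = self.block(x)
        return x[:, n_q:]                                   # updated patches

layer = QueryInformedEncoderLayer()
out = layer(torch.randn(2, 196, 768), torch.randn(2, 8, 768))
print(out.shape)  # torch.Size([2, 196, 768])
```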
- EditID: Training-Free Editable ID Customization for Text-to-Image Generation [12.168520751389622]
We propose EditID, a training-free approach based on the DiT architecture, which achieves highly editable customized IDs for text-to-image generation.
It is challenging to alter facial orientation, character attributes, and other features through prompts.
EditID is the first text-to-image solution to propose customizable ID editability on the DiT architecture.
arXiv Detail & Related papers (2025-03-16T14:41:30Z)
- FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing [22.308638156328968]
The DDIM latent, crucial for retaining the original image's key features and layout, contributes significantly to the limitations of non-rigid editing.
We introduce FlexiEdit, which enhances fidelity to input text prompts by refining the DDIM latent.
Our approach represents notable progress in image editing, particularly in performing complex non-rigid edits.
arXiv Detail & Related papers (2024-07-25T08:07:40Z)
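The FlexiEdit summary above attributes over-rigid edits to the DDIM latent and proposes frequency-aware refinement; a minimal sketch of one such refinement, damping the latent's high-frequency band so layout-dominant low frequencies survive, follows. The cutoff and scaling are assumptions, not the paper's exact procedure.

```python
# Minimal sketch: attenuate the high-frequency band of a DDIM latent.
import torch

def refine_latent(latent, cutoff=0.25, hf_scale=0.5):
    """latent: (B, C, H, W). Damp frequencies beyond `cutoff` * Nyquist."""
    freq = torch.fft.fftshift(torch.fft.fft2(latent), dim=(-2, -1))
    B, C, H, W = latent.shape
    yy = torch.linspace(-1, 1, H).view(H, 1).expand(H, W)
    xx = torch.linspace(-1, 1, W).view(1, W).expand(H, W)
    low_pass = ((yy ** 2 + xx ** 2).sqrt() <= cutoff).float()
    freq = freq * (low_pass + hf_scale * (1.0 - low_pass))
    out = torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1)))
    return out.real

z = torch.randn(1, 4, 64, 64)          # e.g., an SD-style DDIM latent
print(refine_latent(z).shape)          # torch.Size([1, 4, 64, 64])
```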
- CustAny: Customizing Anything from A Single Example [73.90939022698399]
We present a novel pipeline to construct MC-IDC, a large dataset of general objects featuring 315k text-image samples across 10k categories.
With the help of MC-IDC, we introduce Customizing Anything (CustAny), a zero-shot framework that maintains ID fidelity and supports flexible text editing for general objects.
Our contributions include a large-scale dataset, the CustAny framework and novel ID processing to advance this field.
arXiv Detail & Related papers (2024-06-17T15:26:22Z)
- Text Editing by Command [82.50904226312451]
A prevailing paradigm in neural text generation is one-shot generation, where text is produced in a single step.
We address this limitation with an interactive text generation setting in which the user interacts with the system by issuing commands to edit existing text.
We show that our Interactive Editor, a transformer-based model trained on a dataset of such editing commands, outperforms baselines and obtains positive results in both automatic and human evaluations.
arXiv Detail & Related papers (2020-10-24T08:00:30Z)
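The entry above frames text generation as an interactive loop of user commands over a draft; a toy sketch of that interaction, with a trivial rule standing in for the trained editor model, is below. The command grammar is an illustrative assumption.

```python
# Minimal sketch of command-based text editing: draft + command -> revision.
from dataclasses import dataclass

@dataclass
class EditTurn:
    draft: str      # current text
    command: str    # user instruction
    revision: str   # system output (here produced by a trivial rule)

def apply_command(draft: str, command: str) -> EditTurn:
    # Stand-in for the trained editor model: handle one toy command type.
    if command.startswith("replace "):
        old, _, new = command[len("replace "):].partition(" with ")
        return EditTurn(draft, command, draft.replace(old, new))
    return EditTurn(draft, command, draft)

turn = apply_command("The cat sat on the mat.", "replace cat with dog")
print(turn.revision)  # The dog sat on the mat.
```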
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.