FlexID: Training-Free Flexible Identity Injection via Intent-Aware Modulation for Text-to-Image Generation
- URL: http://arxiv.org/abs/2602.07554v1
- Date: Sat, 07 Feb 2026 13:59:54 GMT
- Title: FlexID: Training-Free Flexible Identity Injection via Intent-Aware Modulation for Text-to-Image Generation
- Authors: Guandong Li, Yijun Ding
- Abstract summary: We propose FlexID, a training-free framework utilizing intent-aware modulation. We introduce a Context-Aware Adaptive Gating (CAG) mechanism that dynamically modulates the weights of these streams. Experiments on IBench demonstrate that FlexID achieves a balance between identity consistency and text adherence.
- Score: 10.474377498273205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized text-to-image generation aims to seamlessly integrate specific identities into textual descriptions. However, existing training-free methods often rely on rigid visual feature injection, creating a conflict between identity fidelity and textual adaptability. To address this, we propose FlexID, a novel training-free framework utilizing intent-aware modulation. FlexID orthogonally decouples identity into two dimensions: a Semantic Identity Projector (SIP) that injects high-level priors into the language space, and a Visual Feature Anchor (VFA) that ensures structural fidelity within the latent space. Crucially, we introduce a Context-Aware Adaptive Gating (CAG) mechanism that dynamically modulates the weights of these streams based on editing intent and diffusion timesteps. By automatically relaxing rigid visual constraints when strong editing intent is detected, CAG achieves synergy between identity preservation and semantic variation. Extensive experiments on IBench demonstrate that FlexID achieves a state-of-the-art balance between identity consistency and text adherence, offering an efficient solution for complex narrative generation.
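The abstract describes CAG as a gate that relaxes the visual anchor when strong editing intent is detected, conditioned on the diffusion timestep. The paper does not specify the gating function, so the sketch below is only an illustrative guess: `adaptive_gate`, its sigmoid form, and the sharpness parameter `k` are all assumptions, not FlexID's actual mechanism.

```python
import numpy as np

def adaptive_gate(intent_score: float, t: float, k: float = 10.0) -> float:
    """Hypothetical gate returning the weight of the visual anchor stream.

    intent_score in [0, 1]: strength of detected editing intent.
    t in [0, 1]: normalized diffusion timestep (1 = pure noise).
    Strong editing intent, or an early noisy timestep, pushes the gate
    toward 0, relaxing the rigid visual constraint in favor of the
    semantic stream.
    """
    raw = (1.0 - intent_score) * (1.0 - t)
    # Sigmoid squashes the raw score into (0, 1) with sharpness k.
    return float(1.0 / (1.0 + np.exp(-k * (raw - 0.5))))

def inject_identity(sem_feat: np.ndarray, vis_feat: np.ndarray,
                    intent_score: float, t: float) -> np.ndarray:
    """Blend the two identity streams with the adaptive gate weight."""
    g = adaptive_gate(intent_score, t)
    return g * vis_feat + (1.0 - g) * sem_feat
```

Under this toy form, a prompt with strong editing intent (score near 1) yields a small gate, so the semantic stream dominates and the visual anchor no longer pins down structure.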
Related papers
- Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement [54.199726425201895]
Multimodal editing large models have demonstrated powerful editing capabilities across diverse tasks. Current facial ID preservation methods struggle to achieve consistent restoration of both facial identity and edited element IP. We propose EditedID, an Alignment-Disentanglement-Entanglement framework for robust identity-specific facial restoration.
arXiv Detail & Related papers (2026-02-21T08:24:42Z)
- Training for Identity, Inference for Controllability: A Unified Approach to Tuning-Free Face Personalization [16.851646868288135]
We introduce UniID, a unified tuning-free framework that synergistically integrates both paradigms. Our key insight is that when merging these approaches, they should mutually reinforce only identity-relevant information. This principled design enables UniID to achieve high-fidelity face personalization with flexible text controllability.
arXiv Detail & Related papers (2025-12-03T16:57:50Z)
- ID-Composer: Multi-Subject Video Synthesis with Hierarchical Identity Preservation [48.59900036213667]
Video generative models pretrained on large-scale datasets can produce high-quality videos, but are often conditioned on text or a single image. We introduce ID-Composer, a novel framework that tackles multi-subject video generation from a text prompt and reference images.
arXiv Detail & Related papers (2025-11-01T11:29:14Z)
- Beyond Inference Intervention: Identity-Decoupled Diffusion for Face Anonymization [55.29071072675132]
Face anonymization aims to conceal identity information while preserving non-identity attributes. We propose ID²Face, a training-centric anonymization framework. We show that ID²Face outperforms existing methods in visual quality, identity suppression, and utility preservation.
arXiv Detail & Related papers (2025-10-28T09:28:12Z)
- ID-EA: Identity-driven Text Enhancement and Adaptation with Textual Inversion for Personalized Text-to-Image Generation [33.84646269805187]
ID-EA is a novel framework that guides text embeddings to align with visual identity embeddings. ID-EA substantially outperforms state-of-the-art methods in identity preservation metrics. It generates personalized portraits 15 times faster than existing approaches.
arXiv Detail & Related papers (2025-07-16T07:42:02Z)
- FOCUS: Unified Vision-Language Modeling for Interactive Editing Driven by Referential Segmentation [55.01077993490845]
Recent Large Vision Language Models (LVLMs) demonstrate promising capabilities in unifying visual understanding and generative modeling. We introduce FOCUS, a unified LVLM that integrates segmentation-aware perception and controllable object-centric generation within an end-to-end framework.
arXiv Detail & Related papers (2025-06-20T07:46:40Z)
- Identity-Preserving Text-to-Image Generation via Dual-Level Feature Decoupling and Expert-Guided Fusion [35.67333978414322]
We propose a novel framework that improves the separation of identity-related and identity-unrelated features. Our framework consists of two key components: an Implicit-Explicit foreground-background Decoupling Module and a Feature Fusion Module.
arXiv Detail & Related papers (2025-05-28T13:40:46Z)
- See What You Seek: Semantic Contextual Integration for Cloth-Changing Person Re-Identification [14.01260112340177]
Cloth-changing person re-identification (CC-ReID) aims to match individuals across surveillance cameras despite variations in clothing. Existing methods typically mitigate the impact of clothing changes or enhance identity (ID)-relevant features. We propose a novel prompt learning framework, Semantic Contextual Integration (SCI), to reduce clothing-induced discrepancies and strengthen ID cues.
arXiv Detail & Related papers (2024-12-02T10:11:16Z)
- Content and Salient Semantics Collaboration for Cloth-Changing Person Re-Identification [74.10897798660314]
Cloth-changing person re-identification aims at recognizing the same person with clothing changes across non-overlapping cameras. We propose a unified Semantics Mining and Refinement (SMR) module to extract robust identity-related content and salient semantics, mitigating interference from clothing appearances effectively. Our proposed method achieves state-of-the-art performance on three cloth-changing benchmarks, demonstrating its superiority over advanced competitors.
arXiv Detail & Related papers (2024-05-26T15:17:28Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning [57.91881829308395]
Identity-preserving text-to-image generation (ID-T2I) has received significant attention due to its wide range of application scenarios like AI portrait and advertising.
We present ID-Aligner, a general feedback learning framework to enhance ID-T2I performance.
arXiv Detail & Related papers (2024-04-23T18:41:56Z)
- Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [31.06269858216316]
We propose Infinite-ID, an ID-semantics decoupling paradigm for identity-preserved personalization.
We introduce an identity-enhanced training, incorporating an additional image cross-attention module to capture sufficient ID information.
We also introduce a feature interaction mechanism that combines a mixed attention module with an AdaIN-mean operation to seamlessly merge the two streams.
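The Infinite-ID summary names an "AdaIN-mean operation" for merging the two streams but gives no formula. A plausible reading is a mean-only variant of adaptive instance normalization: shift one stream's channel means to match the other's while leaving variance untouched. The sketch below is that guessed interpretation only; the function name and the choice of which stream supplies the statistics are assumptions.

```python
import numpy as np

def adain_mean(text_stream: np.ndarray, id_stream: np.ndarray) -> np.ndarray:
    """Hypothetical AdaIN-mean merge (tokens x channels arrays).

    Re-centers the text stream so its channel-wise mean matches the
    identity stream's, without rescaling variance, i.e. a mean-only
    variant of adaptive instance normalization.
    """
    return (text_stream
            - text_stream.mean(axis=0, keepdims=True)
            + id_stream.mean(axis=0, keepdims=True))
```

Compared with full AdaIN, dropping the variance rescaling would transfer only the identity stream's average statistics, which is one way to inject ID information while disturbing the text features less.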
arXiv Detail & Related papers (2024-03-18T13:39:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.