ConceptPrism: Concept Disentanglement in Personalized Diffusion Models via Residual Token Optimization
- URL: http://arxiv.org/abs/2602.19575v1
- Date: Mon, 23 Feb 2026 07:46:19 GMT
- Title: ConceptPrism: Concept Disentanglement in Personalized Diffusion Models via Residual Token Optimization
- Authors: Minseo Kim, Minchan Kwon, Dongyeun Lee, Yunho Jeon, Junmo Kim
- Abstract summary: ConceptPrism is a novel framework that automatically disentangles the shared visual concept from image-specific residuals. In experiments, ConceptPrism effectively resolves concept entanglement, achieving a significantly improved trade-off between fidelity and alignment.
- Score: 11.472088067393074
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personalized text-to-image generation suffers from concept entanglement, where irrelevant residual information from reference images is captured, leading to a trade-off between concept fidelity and text alignment. Recent disentanglement approaches attempt to solve this using manual guidance, such as linguistic cues or segmentation masks, which limits their applicability and fails to fully articulate the target concept. In this paper, we propose ConceptPrism, a novel framework that automatically disentangles the shared visual concept from image-specific residuals by comparing images within a set. Our method jointly optimizes a target token and image-wise residual tokens using two complementary objectives: a reconstruction loss to ensure fidelity, and a novel exclusion loss that compels the residual tokens to discard the shared concept. This allows the target token to capture the pure concept without direct supervision. Extensive experiments demonstrate that ConceptPrism effectively resolves concept entanglement, achieving a significantly improved trade-off between fidelity and alignment.
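The abstract names the two objectives but not their exact form. The following is a minimal sketch of what the joint step could look like, assuming a standard denoising reconstruction loss and a cosine-similarity exclusion penalty; the conditioning scheme, the `lambda_excl` weight, and the diffusers-style conventions for `unet` and `scheduler` are all illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def conceptprism_step(unet, scheduler, latents, timesteps,
                      target_token, residual_tokens, lambda_excl=0.1):
    """One hypothetical joint step over a batch of reference images.

    latents:         (B, C, H, W) VAE latents of the reference images
    target_token:    (1, D) learnable embedding for the shared concept
    residual_tokens: (B, D) learnable per-image residual embeddings
    """
    B = latents.shape[0]
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, timesteps)

    # Condition image i on [target token, residual token i]. A real prompt
    # would embed these among ordinary text tokens; this is simplified.
    cond = torch.cat([target_token.unsqueeze(0).expand(B, -1, -1),
                      residual_tokens.unsqueeze(1)], dim=1)        # (B, 2, D)
    pred = unet(noisy, timesteps, encoder_hidden_states=cond).sample

    # Reconstruction loss: the standard denoising objective (fidelity).
    loss_rec = F.mse_loss(pred, noise)

    # Exclusion loss, one plausible form: push each residual token away from
    # the shared target token so residuals cannot absorb the common concept.
    sim = F.cosine_similarity(residual_tokens,
                              target_token.expand(B, -1), dim=-1)  # (B,)
    loss_excl = sim.clamp(min=0).mean()

    return loss_rec + lambda_excl * loss_excl
```

In a textual-inversion-style setup, presumably only `target_token` and `residual_tokens` would receive gradients, with the UNet and text encoder frozen; that design choice is likewise an assumption here.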
Related papers
- Enhancing Concept Localization in CLIP-based Concept Bottleneck Models [11.592826680892367]
We show that Concept Bottleneck Models (CBMs) do not require explicit concept annotations, relying instead on concepts extracted using CLIP in a zero-shot manner. We introduce Concept Hallucination Inhibition via Localized Interpretability (CHILI), a technique that disentangles image embeddings and localizes the pixels corresponding to target concepts.
arXiv: 2025-10-08
- Comparison Reveals Commonality: Customized Image Generation through Contrastive Inversion [22.481176245267328]
We propose Contrastive Inversion, a novel approach that identifies the common concept by comparing the input images without relying on additional information. We train the target token along with image-wise auxiliary text tokens via contrastive learning, which extracts well-disentangled semantics of the target (a hedged sketch of one possible contrastive objective appears after this list).
arXiv: 2025-08-11
- ACE: Attentional Concept Erasure in Diffusion Models [0.0]
Attentional Concept Erasure integrates a closed-form attention manipulation with lightweight fine-tuning. We show that ACE achieves state-of-the-art concept-removal efficacy and robustness. Compared to prior methods, ACE better balances generality (erasing the concept and related terms) and specificity (preserving unrelated content).
arXiv: 2025-04-16
- Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine-grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
arXiv: 2025-03-25
- OmniPrism: Learning Disentangled Visual Concept for Image Generation [57.21097864811521]
Creative visual concept generation often draws inspiration from specific concepts in a reference image to produce relevant outcomes. We propose OmniPrism, a visual concept disentangling approach for creative image generation. Our method learns disentangled concept representations guided by natural language and trains a diffusion model to incorporate these concepts.
arXiv: 2024-12-16
- Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [14.21719970175159]
Concept Conductor is designed to ensure visual fidelity and correct layout in multi-concept customization.
We present a concept injection technique that employs shape-aware masks to specify the generation area for each concept.
Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts.
arXiv: 2024-08-07
- ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction [20.43411883845885]
We introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts.
Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models.
We present ConceptExpress, which tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects.
arXiv: 2024-07-09
- Non-confusing Generation of Customized Concepts in Diffusion Models [135.4385383284657]
We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs).
Existing customized generation methods focus only on fine-tuning the second stage while overlooking the first.
We propose a simple yet effective solution called CLIF: contrastive image-language fine-tuning.
arXiv: 2024-05-11
- Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images.
We present Geom-Erasing, a novel concept-removal method based on geometry-driven control.
arXiv: 2023-10-09
- Break-A-Scene: Extracting Multiple Concepts from a Single Image [80.47666266017207]
We introduce the task of textual scene decomposition.
We propose augmenting the input image with masks that indicate the presence of target concepts.
We then present a novel two-phase customization process.
arXiv: 2023-05-25
- Visual Concepts Tokenization [65.61987357146997]
We propose an unsupervised transformer-based Visual Concepts Tokenization framework, dubbed VCT, to decompose an image into a set of disentangled visual concept tokens.
To obtain these concept tokens, we use only cross-attention to extract visual information from the image tokens layer by layer, without self-attention between concept tokens.
We further propose a Concept Disentangling Loss to encourage different concept tokens to represent independent visual concepts (a hedged sketch of one possible rendering appears after this list).
arXiv: 2022-05-20
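As referenced in the Contrastive Inversion entry above, that summary mentions training a target token alongside image-wise auxiliary tokens via contrastive learning but does not state the loss. A minimal sketch follows, assuming an InfoNCE-style objective over normalized image features; the feature extractor, the temperature `tau`, and the pairing scheme are all assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def contrastive_inversion_loss(image_feats, target_token, aux_tokens, tau=0.07):
    """Hypothetical contrastive objective.

    image_feats:  (B, D) features of the B reference images
    target_token: (D,)   shared learnable token
    aux_tokens:   (B, D) per-image learnable auxiliary tokens
    """
    f = F.normalize(image_feats, dim=-1)
    t = F.normalize(target_token, dim=-1)
    a = F.normalize(aux_tokens, dim=-1)

    # The shared token should agree with every image: it carries the
    # common concept, so all images are positives for it.
    loss_shared = -((f @ t) / tau).mean()

    # Each auxiliary token should match only its own image, pulling the
    # image-specific content out of the shared token.
    logits = (f @ a.T) / tau                            # (B, B)
    labels = torch.arange(f.shape[0], device=f.device)
    loss_aux = F.cross_entropy(logits, labels)

    return loss_shared + loss_aux
```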
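For the Visual Concepts Tokenization entry, the summary describes concept tokens that gather image information through cross-attention only (no self-attention among concept tokens), plus a Concept Disentangling Loss. The sketch below compresses the layer-by-layer design into a single cross-attention block and renders the loss as an off-diagonal cosine-similarity penalty; both simplifications are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptTokenizer(nn.Module):
    """Concept tokens query image tokens via cross-attention only."""

    def __init__(self, num_concepts=8, dim=256, heads=4):
        super().__init__()
        self.concept_tokens = nn.Parameter(torch.randn(num_concepts, dim))
        # Deliberately no self-attention among concept tokens, matching
        # the summary's description.
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, image_tokens):                # (B, N, D) patch features
        B = image_tokens.shape[0]
        q = self.concept_tokens.unsqueeze(0).expand(B, -1, -1)
        out, _ = self.cross_attn(q, image_tokens, image_tokens)
        return out                                  # (B, num_concepts, D)

def disentangling_loss(concept_tokens):
    """Penalize pairwise similarity between different concept tokens."""
    t = F.normalize(concept_tokens, dim=-1)         # (B, K, D)
    gram = t @ t.transpose(1, 2)                    # (B, K, K)
    eye = torch.eye(gram.shape[-1], device=gram.device)
    return (gram - eye).abs().mean()                # diagonal cancels to zero
```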