Related papers: Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting

Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting

URL: http://arxiv.org/abs/2404.14007v1
Date: Mon, 22 Apr 2024 09:16:25 GMT
Title: Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting
Authors: Weili Zeng, Yichao Yan, Qi Zhu, Zhuo Chen, Pengzhi Chu, Weiming Zhao, Xiaokang Yang,
Abstract summary: We analyze concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which is confined to customize on limited modalities. We propose Infusion, a T2I customization method that enables the learning of target concepts to avoid being constrained by limited training modalities.
Score: 51.606819347636076
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-image (T2I) customization aims to create images that embody specific visual concepts delineated in textual descriptions. However, existing works still face a main challenge, concept overfitting. To tackle this challenge, we first analyze overfitting, categorizing it into concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which is confined to customize on limited modalities, i.e, backgrounds, layouts, styles. To evaluate the overfitting degree, we further introduce two metrics, i.e, Latent Fisher divergence and Wasserstein metric to measure the distribution changes of non-customized and customized concept respectively. Drawing from the analysis, we propose Infusion, a T2I customization method that enables the learning of target concepts to avoid being constrained by limited training modalities, while preserving non-customized knowledge. Remarkably, Infusion achieves this feat with remarkable efficiency, requiring a mere 11KB of trained parameters. Extensive experiments also demonstrate that our approach outperforms state-of-the-art methods in both single and multi-concept customized generation.

Related papers

Mod-Adapter: Tuning-Free and Versatile Multi-concept Personalization via Modulation Adapter [52.08332620725473]
We propose a tuning-free method for multi-concept personalization that can effectively customize both object and abstract concepts without test-time fine-tuning.<n>Our method achieves state-of-the-art performance in multi-concept personalization, supported by quantitative, qualitative, and human evaluations.
arXiv Detail & Related papers (2025-05-24T09:21:32Z)
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models [56.35484513848296]
FADE (Fine grained Attenuation for Diffusion Erasure) is an adjacency-aware unlearning algorithm for text-to-image generative models. It removes target concepts with minimal impact on correlated concepts, achieving a 12% improvement in retention performance over state-of-the-art methods.
arXiv Detail & Related papers (2025-03-25T15:49:48Z)
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation [3.7816957214446103]
ConceptGuard is a comprehensive approach that combines shift embedding, concept-binding prompts and memory preservation regularization. We show that our approach outperforms all the baseline methods consistently and significantly in both quantitative and qualitative analyses.
arXiv Detail & Related papers (2025-03-13T13:39:24Z)
Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation [73.16975077770765]
Modular customization is essential for applications like concept stylization and multi-concept customization. Instant merging methods often cause identity loss and interference of individual merged concepts. We propose BlockLoRA, an instant merging method designed to efficiently combine multiple concepts while accurately preserving individual concepts' identity.
arXiv Detail & Related papers (2025-03-11T16:10:36Z)
FlipConcept: Tuning-Free Multi-Concept Personalization for Text-to-Image Generation [26.585985828583304]
We propose FlipConcept, a novel approach that seamlessly integrates multiple personalized concepts into a single image.<n>We introduce guided appearance attention, mask-guided noise mixing, and background dilution to minimize concept leakage.<n>Despite not requiring tuning, our method outperforms existing models in both single and multiple personalized concept inference.
arXiv Detail & Related papers (2025-02-21T04:37:18Z)
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models [51.1034358143232]
We introduce component-controllable personalization, a novel task that pushes the boundaries of text-to-image (T2I) models. To overcome these challenges, we design MagicTailor, an innovative framework that leverages Dynamic Masked Degradation (DM-Deg) to dynamically perturb undesired visual semantics.
arXiv Detail & Related papers (2024-10-17T09:22:53Z)
CusConcept: Customized Visual Concept Decomposition with Diffusion Models [13.95568624067449]
We propose a two-stage framework, CusConcept, to extract customized visual concept embedding vectors. In the first stage, CusConcept employs a vocabularies-guided concept decomposition mechanism. In the second stage, joint concept refinement is performed to enhance the fidelity and quality of generated images.
arXiv Detail & Related papers (2024-10-01T04:41:44Z)
Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [14.21719970175159]
Concept Conductor is designed to ensure visual fidelity and correct layout in multi-concept customization. We present a concept injection technique that employs shape-aware masks to specify the generation area for each concept. Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts.
arXiv Detail & Related papers (2024-08-07T08:43:58Z)
Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models [58.74606272936636]
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. The models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts. concept removal methods have been proposed to modify diffusion models to prevent the generation of malicious and unwanted concepts.
arXiv Detail & Related papers (2024-06-21T03:58:44Z)
AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization [4.544788024283586]
AttenCraft is an attention-guided method for multiple concept disentanglement. We introduce Uniform sampling and Reweighted sampling schemes to alleviate the non-synchronicity of feature acquisition from different concepts. Our method outperforms baseline models in terms of image-alignment, and behaves comparably on text-alignment.
arXiv Detail & Related papers (2024-05-28T08:50:14Z)
Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient [20.698305103879232]
We propose a novel concept domain correction framework named textbfDoCo (textbfDomaintextbfCorrection) By aligning the output domains of sensitive and anchor concepts through adversarial training, our approach ensures comprehensive unlearning of target concepts. We also introduce a concept-preserving gradient surgery technique that mitigates conflicting gradient components, thereby preserving the model's utility while unlearning specific concepts.
arXiv Detail & Related papers (2024-05-24T07:47:36Z)
Non-confusing Generation of Customized Concepts in Diffusion Models [135.4385383284657]
We tackle the common challenge of inter-concept visual confusion in compositional concept generation using text-guided diffusion models (TGDMs) Existing customized generation methods only focus on fine-tuning the second stage while overlooking the first one. We propose a simple yet effective solution called CLIF: contrastive image-language fine-tuning.
arXiv Detail & Related papers (2024-05-11T05:01:53Z)
Implicit Concept Removal of Diffusion Models [92.55152501707995]
Text-to-image (T2I) diffusion models often inadvertently generate unwanted concepts such as watermarks and unsafe images. We present the Geom-Erasing, a novel concept removal method based on the geometric-driven control.
arXiv Detail & Related papers (2023-10-09T17:13:10Z)
Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization. We employ two components: First, an encoder that takes as an input a single image of a target concept from a given domain. Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts.
arXiv Detail & Related papers (2023-02-23T18:46:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.