Related papers: Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation

URL: http://arxiv.org/abs/2312.08195v2
Date: Thu, 20 Mar 2025 04:15:45 GMT
Title: Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
Authors: Pu Cao, Feng Zhou, Lu Yang, Tianrui Huang, Qing Song,
Abstract summary: In-domain generation aims to perform a variety of tasks within a specific domain, such as unconditional generation, text-to-image, image editing, 3D generation, and more.<n>Early research typically required training specialized generators for each unique task and domain, often relying on fully-labeled data.<n>Motivated by the powerful generative capabilities and broad applications of diffusion models, we are driven to explore leveraging label-free data to empower these models for in-domain generation.
Score: 7.1629002695210024
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In-domain generation aims to perform a variety of tasks within a specific domain, such as unconditional generation, text-to-image, image editing, 3D generation, and more. Early research typically required training specialized generators for each unique task and domain, often relying on fully-labeled data. Motivated by the powerful generative capabilities and broad applications of diffusion models, we are driven to explore leveraging label-free data to empower these models for in-domain generation. Fine-tuning a pre-trained generative model on domain data is an intuitive but challenging way and often requires complex manual hyper-parameter adjustments since the limited diversity of the training data can easily disrupt the model's original generative capabilities. To address this challenge, we propose a guidance-decoupled prior preservation mechanism to achieve high generative quality and controllability by image-only data, inspired by preserving the pre-trained model from a denoising guidance perspective. We decouple domain-related guidance from the conditional guidance used in classifier-free guidance mechanisms to preserve open-world control guidance and unconditional guidance from the pre-trained model. We further propose an efficient domain knowledge learning technique to train an additional text-free UNet copy to predict domain guidance. Besides, we theoretically illustrate a multi-guidance in-domain generation pipeline for a variety of generative tasks, leveraging multiple guidances from distinct diffusion models and conditions. Extensive experiments demonstrate the superiority of our method in domain-specific synthesis and its compatibility with various diffusion-based control methods and applications.

Related papers

Domain Guidance: A Simple Transfer Approach for a Pre-trained Diffusion Model [62.11981915549919]
Domain Guidance is a transfer approach that leverages pre-trained knowledge to guide the sampling process toward the target domain. We demonstrate its substantial effectiveness across various transfer benchmarks, achieving over a 19.6% improvement in FID and a 23.4% improvement in FD$_textDINOv2$ compared to standard fine-tuning.
arXiv Detail & Related papers (2025-04-02T09:07:55Z)
Object Style Diffusion for Generalized Object Detection in Urban Scene [69.04189353993907]
We introduce a novel single-domain object detection generalization method, named GoDiff. By integrating pseudo-target domain data with source domain data, we diversify the training dataset. Experimental results demonstrate that our method not only enhances the generalization ability of existing detectors but also functions as a plug-and-play enhancement for other single-domain generalization methods.
arXiv Detail & Related papers (2024-12-18T13:03:00Z)
How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? [91.49559116493414]
We propose a novel Concept-Incremental text-to-image Diffusion Model (CIDM) It can resolve catastrophic forgetting and concept neglect to learn new customization tasks in a concept-incremental manner. Experiments validate that our CIDM surpasses existing custom diffusion models.
arXiv Detail & Related papers (2024-10-23T06:47:29Z)
Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [14.21719970175159]
Concept Conductor is designed to ensure visual fidelity and correct layout in multi-concept customization. We present a concept injection technique that employs shape-aware masks to specify the generation area for each concept. Our method supports the combination of any number of concepts and maintains high fidelity even when dealing with visually similar concepts.
arXiv Detail & Related papers (2024-08-07T08:43:58Z)
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction [20.43411883845885]
We introduce a novel task named Unsupervised Concept Extraction (UCE) that considers an unsupervised setting without any human knowledge of the concepts. Given an image that contains multiple concepts, the task aims to extract and recreate individual concepts solely relying on the existing knowledge from pretrained diffusion models. We present ConceptExpress that tackles UCE by unleashing the inherent capabilities of pretrained diffusion models in two aspects.
arXiv Detail & Related papers (2024-07-09T17:50:28Z)
Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models [36.59260354292177]
Recent advancements in text-to-image generation have inspired researchers to generate datasets tailored for perception models using generative models. We aim to fine-tune vision-language models to a specific classification model without access to any real images. Despite the high fidelity of generated images, we observed a significant performance degradation when fine-tuning the model using the generated datasets.
arXiv Detail & Related papers (2024-06-08T10:43:49Z)
Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models [57.86303579812877]
Concept Bottleneck Models (CBMs) ground image classification on human-understandable concepts to allow for interpretable model decisions. Existing approaches often require numerous human interventions per image to achieve strong performances. We introduce a trainable concept realignment intervention module, which leverages concept relations to realign concept assignments post-intervention.
arXiv Detail & Related papers (2024-05-02T17:59:01Z)
Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting [51.606819347636076]
We analyze concept-agnostic overfitting, which undermines non-customized concept knowledge, and concept-specific overfitting, which is confined to customize on limited modalities. We propose Infusion, a T2I customization method that enables the learning of target concepts to avoid being constrained by limited training modalities.
arXiv Detail & Related papers (2024-04-22T09:16:25Z)
MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation [49.935634230341904]
We introduce the Multi-concept guidance for Multi-concept customization, termed MC$2$, for improved flexibility and fidelity. MC$2$ decouples the requirements for model architecture via inference time optimization. It adaptively refines the attention weights between visual and textual tokens, directing image regions to focus on their associated words.
arXiv Detail & Related papers (2024-04-08T07:59:04Z)
Multi-BERT: Leveraging Adapters and Prompt Tuning for Low-Resource Multi-Domain Adaptation [14.211024633768986]
The rapid expansion of texts' volume and diversity presents formidable challenges in multi-domain settings. Traditional approaches, either employing a unified model for multiple domains or individual models for each domain, frequently pose significant limitations. This paper introduces a novel approach composed of one core model with multiple sets of domain-specific parameters.
arXiv Detail & Related papers (2024-04-02T22:15:48Z)
LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [33.379758040084894]
Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. LoRA-Composer is a training-free framework designed for seamlessly integrating multiple LoRAs.
arXiv Detail & Related papers (2024-03-18T09:58:52Z)
HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization [69.33162366130887]
Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. We introduce a novel method designed to supplement the model with domain-level and task-specific characteristics. This approach aims to guide the model in more effectively separating invariant features from specific characteristics, thereby boosting the generalization.
arXiv Detail & Related papers (2024-01-18T04:23:21Z)
DomainStudio: Fine-Tuning Diffusion Models for Domain-Driven Image Generation using Limited Data [20.998032566820907]
This paper proposes a novel DomainStudio approach to adapt DDPMs pre-trained on large-scale source datasets to target domains using limited data. It is designed to keep the diversity of subjects provided by source domains and get high-quality and diverse adapted samples in target domains.
arXiv Detail & Related papers (2023-06-25T07:40:39Z)
Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models [72.67967883658957]
Public large-scale text-to-image diffusion models can be easily customized for new concepts using low-rank adaptations (LoRAs) The utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We propose a new framework called Mix-of-Show that addresses the challenges of decentralized multi-concept customization.
arXiv Detail & Related papers (2023-05-29T17:58:16Z)
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning [54.67880602409801]
In this paper, we study the problem of pre-training world models with abundant in-the-wild videos for efficient learning of visual control tasks. We introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling. Our experiments show that in-the-wild video pre-training equipped with ContextWM can significantly improve the sample efficiency of model-based reinforcement learning.
arXiv Detail & Related papers (2023-05-29T14:29:12Z)
On Balancing Bias and Variance in Unsupervised Multi-Source-Free Domain Adaptation [6.2200089460762085]
Methods for multi-source-free domain adaptation (MSFDA) typically train a target model using pseudo-labeled data produced by the source models. We develop an information-theoretic bound on the generalization error of the resulting target model. We then provide insights on how to balance this trade-off from three perspectives, including domain aggregation, selective pseudo-labeling, and joint feature alignment.
arXiv Detail & Related papers (2022-02-01T22:34:18Z)
A Novel Mix-normalization Method for Generalizable Multi-source Person Re-identification [49.548815417844786]
Person re-identification (Re-ID) has achieved great success in the supervised scenario. It is difficult to directly transfer the supervised model to arbitrary unseen domains due to the model overfitting to the seen source domains. We propose MixNorm, which consists of domain-aware mix-normalization (DMN) and domain-ware center regularization (DCR)
arXiv Detail & Related papers (2022-01-24T18:09:38Z)
Unsupervised Learning of Compositional Energy Concepts [70.11673173291426]
We propose COMET, which discovers and represents concepts as separate energy functions. Comet represents both global concepts as well as objects under a unified framework.
arXiv Detail & Related papers (2021-11-04T17:46:12Z)
Learning to Generate Novel Domains for Domain Generalization [115.21519842245752]
This paper focuses on the task of learning from multiple source domains a model that generalizes well to unseen domains. We employ a data generator to synthesize data from pseudo-novel domains to augment the source domains. Our method, L2A-OT, outperforms current state-of-the-art DG methods on four benchmark datasets.
arXiv Detail & Related papers (2020-07-07T09:34:17Z)

This list is automatically generated from the titles and abstracts of the papers in this site.