Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2508.03481v1
- Date: Tue, 05 Aug 2025 14:14:55 GMT
- Title: Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
- Authors: Hyungjin Kim, Seokho Ahn, Young-Duk Seo
- Abstract summary: We propose DrUM, a novel method that integrates user profiling with a transformer-based adapter to enable personalized generation. DrUM demonstrates strong performance on large-scale datasets and seamlessly integrates with open-source text encoders.
- Score: 5.282669911393826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized generation in T2I diffusion models aims to naturally incorporate individual user preferences into the generation process with minimal user intervention. However, existing studies primarily rely on prompt-level modeling with large-scale models, often leading to inaccurate personalization due to the limited input token capacity of T2I diffusion models. To address these limitations, we propose DrUM, a novel method that integrates user profiling with a transformer-based adapter to enable personalized generation through condition-level modeling in the latent space. DrUM demonstrates strong performance on large-scale datasets and seamlessly integrates with open-source text encoders, making it compatible with widely used foundation T2I models without requiring additional fine-tuning.
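The abstract describes condition-level modeling: instead of spending scarce prompt tokens on preference text, a user-profile embedding is injected directly into the text encoder's condition in latent space. The sketch below is purely illustrative (not the authors' DrUM implementation); the blending function, dimensions, and `weight` parameter are all hypothetical stand-ins for the paper's transformer-based adapter.

```python
import numpy as np

def personalize_condition(prompt_cond, user_profile, weight=0.3):
    """Blend a user-profile vector into each token of the prompt condition.

    prompt_cond  : (seq_len, dim) text-encoder output for the prompt
    user_profile : (dim,) aggregated embedding of the user's preferences
    weight       : interpolation strength; 0 disables personalization
    """
    return (1.0 - weight) * prompt_cond + weight * user_profile[None, :]

# Toy usage: a 77-token condition from a CLIP-like encoder with dim 768.
cond = np.random.randn(77, 768)
profile = np.random.randn(768)
personalized = personalize_condition(cond, profile, weight=0.25)
assert personalized.shape == cond.shape
```

Because the edit happens in the condition space rather than the prompt, it sidesteps the input-token limit the abstract identifies, and the same blended condition can be fed to any T2I model that accepts that encoder's output.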
Related papers
- Masked Conditioning for Deep Generative Models [0.0]
We introduce a novel masked-conditioning approach that enables generative models to work with sparse, mixed-type data. We show that small models trained on limited data can be coupled with large pretrained foundation models to improve generation quality.
arXiv Detail & Related papers (2025-05-22T14:33:03Z) - DMM: Building a Versatile Image Generation Model via Distillation-Based Model Merging [32.97010533998294]
We introduce a style-promptable image generation pipeline which can accurately generate arbitrary-style images under the control of style vectors. Based on this design, we propose the score distillation based model merging paradigm (DMM), compressing multiple models into a single versatile T2I model. Our experiments demonstrate that DMM can compactly reorganize the knowledge from multiple teacher models and achieve controllable arbitrary-style generation.
arXiv Detail & Related papers (2025-04-16T15:09:45Z) - Differentially Private Adaptation of Diffusion Models via Noisy Aggregated Embeddings [23.687702204151872]
Textual Inversion (TI) learns an embedding vector for an image or set of images to enable adaptation under differential privacy constraints. We show that DPAgg-TI outperforms DP-SGD fine-tuning in both utility and robustness under the same privacy budget.
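The title's "noisy aggregated embeddings" suggests the standard Gaussian-mechanism recipe: clip each learned embedding, average, and add calibrated noise. The function below is a minimal sketch of that generic mechanism, not the paper's DPAgg-TI code; the parameter names and the simple mean aggregator are assumptions.

```python
import numpy as np

def dp_aggregate(embeddings, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip each embedding to clip_norm, average, and add Gaussian noise.

    With n embeddings, the L2 sensitivity of the mean is clip_norm / n,
    so noise with std = noise_multiplier * clip_norm / n instantiates
    the Gaussian mechanism for the chosen multiplier.
    """
    rng = rng or np.random.default_rng()
    emb = np.asarray(embeddings, dtype=float)
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    emb = emb * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    mean = emb.mean(axis=0)
    sigma = noise_multiplier * clip_norm / len(emb)
    return mean + rng.normal(0.0, sigma, size=mean.shape)
```

Clipping bounds each image's influence on the aggregate, which is what lets the noise scale shrink as more embeddings are averaged.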
arXiv Detail & Related papers (2024-11-22T00:09:49Z) - Structured Pattern Expansion with Diffusion Models [6.726377308248659]
Recent advances in diffusion models have significantly improved the synthesis of materials, textures, and 3D shapes.
In this paper, we address the synthesis of structured, stationary patterns, where diffusion models are generally less reliable and, more importantly, less controllable.
Our method enables users to exercise direct control over the synthesis by expanding a partially hand-drawn pattern into a larger design while preserving the structure and details of the input.
arXiv Detail & Related papers (2024-11-12T18:39:23Z) - Minority-Focused Text-to-Image Generation via Prompt Optimization [57.319845580050924]
We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models. We develop an online prompt optimization framework that encourages emergence of desired properties during inference. We then tailor this generic prompt distribution into a specialized solver that promotes generation of minority features.
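Online prompt optimization at inference time amounts to hill-climbing a soft-prompt embedding against a scalar objective. The sketch below illustrates that loop with a finite-difference gradient; the `score_fn` objective and all parameter names are hypothetical placeholders, not the paper's actual minority-score or optimizer.

```python
import numpy as np

def optimize_prompt(score_fn, prompt, steps=50, lr=0.1, eps=1e-4):
    """Hill-climb a soft-prompt vector to maximize a scalar score.

    score_fn : maps a prompt embedding to a score (a stand-in for the
               paper's inference-time minority objective)
    prompt   : (d,) initial prompt embedding
    """
    p = prompt.astype(float).copy()
    for _ in range(steps):
        grad = np.zeros_like(p)
        for i in range(p.size):  # finite-difference gradient estimate
            dp = np.zeros_like(p)
            dp[i] = eps
            grad[i] = (score_fn(p + dp) - score_fn(p - dp)) / (2 * eps)
        p += lr * grad
    return p
```

In practice one would backpropagate through the model rather than use finite differences; the point is only that the prompt, not the model weights, is the optimization variable.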
arXiv Detail & Related papers (2024-10-10T11:56:09Z) - Diffusion-Based Neural Network Weights Generation [80.89706112736353]
D2NWG is a diffusion-based neural network weights generation technique that efficiently produces high-performing weights for transfer learning.
Our method extends generative hyper-representation learning to recast the latent diffusion paradigm for neural network weights generation.
Our approach is scalable to large architectures such as large language models (LLMs), overcoming the limitations of current parameter generation techniques.
arXiv Detail & Related papers (2024-02-28T08:34:23Z) - Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation [56.79064699832383]
We establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation.
In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud.
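The abstract mentions criteria for excluding unnecessary samples from upload; entropy-based filtering is the natural instantiation for an entropy-distillation method. The thresholds and function names below are illustrative assumptions, not the paper's values.

```python
import math

def prediction_entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_upload(probs, low=0.2, high=2.0):
    """Upload only moderately uncertain samples for cloud adaptation.

    Low-entropy samples carry little new information for adaptation;
    very high-entropy samples are likely unreliable. Excluding both
    reduces edge-to-cloud communication. Thresholds are illustrative.
    """
    h = prediction_entropy(probs)
    return low < h < high
```

The edge model only runs forward passes to produce `probs`, consistent with the CEMA setting where adaptation itself happens in the cloud.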
arXiv Detail & Related papers (2024-02-27T08:47:19Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - $λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space [61.091910046492345]
$λ$-ECLIPSE works in the latent space of a pre-trained CLIP model without relying on the diffusion UNet models.
$λ$-ECLIPSE performs multi-subject-driven P-T2I with just 34M parameters and is trained on a mere 74 GPU hours.
arXiv Detail & Related papers (2024-02-07T19:07:10Z) - Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints [53.66698106829144]
We propose a unified model to handle a broad range of layout generation tasks.
The model is based on continuous diffusion models.
Experimental results show that LACE produces high-quality layouts.
arXiv Detail & Related papers (2024-02-07T11:12:41Z) - I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models [80.32562822058924]
Text-guided image-to-video (I2V) generation aims to generate a coherent video that preserves the identity of the input image.
I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism.
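The cross-frame attention described here lets every noised frame attend to tokens of the unnoised input image. The function below is a bare single-head sketch of that pattern (no projections, masks, or multi-head splitting), purely to illustrate the data flow; it is not the I2V-Adapter implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_frame_attention(frames, first_frame):
    """Each noised frame attends to the unnoised first frame's tokens.

    frames      : (T, N, d) query tokens from the T subsequent frames
    first_frame : (M, d) tokens of the input image (keys and values)
    """
    d = frames.shape[-1]
    scores = frames @ first_frame.T / np.sqrt(d)   # (T, N, M)
    return softmax(scores, axis=-1) @ first_frame  # (T, N, d)
```

Because the keys and values come only from the input image, identity information propagates to every frame without modifying the base model's self-attention over the video tokens.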
Our experimental results demonstrate that I2V-Adapter is capable of producing high-quality videos.
arXiv Detail & Related papers (2023-12-27T19:11:50Z) - When Parameter-efficient Tuning Meets General-purpose Vision-language Models [65.19127815275307]
PETAL revolutionizes the training process by requiring only 0.5% of the total parameters, achieved through a unique mode approximation technique.
Our experiments reveal that PETAL not only outperforms current state-of-the-art methods in most scenarios but also surpasses full fine-tuning models in effectiveness.
arXiv Detail & Related papers (2023-12-16T17:13:08Z) - Jointly Training Large Autoregressive Multimodal Models [37.32912103934043]
We present the Joint Autoregressive Mixture (JAM) framework, a modular approach that systematically fuses existing text and image generation models.
We also introduce a specialized, data-efficient instruction-tuning strategy, tailored for mixed-modal generation tasks.
Our final instruct-tuned model demonstrates unparalleled performance in generating high-quality multimodal outputs.
arXiv Detail & Related papers (2023-09-27T10:40:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.