DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
- URL: http://arxiv.org/abs/2411.17786v1
- Date: Tue, 26 Nov 2024 15:03:14 GMT
- Title: DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching
- Authors: Emanuele Aiello, Umberto Michieli, Diego Valsesia, Mete Ozay, Enrico Magli
- Abstract summary: We introduce DreamCache, a scalable approach for efficient and high-quality personalized image generation.
DreamCache achieves state-of-the-art image and text alignment, utilizing an order of magnitude fewer extra parameters.
- Score: 38.46235896192237
- Abstract: Personalized image generation requires text-to-image generative models that capture the core features of a reference subject to allow for controlled generation across different contexts. Existing methods face challenges due to complex training requirements, high inference costs, limited flexibility, or a combination of these issues. In this paper, we introduce DreamCache, a scalable approach for efficient and high-quality personalized image generation. By caching a small number of reference image features from a subset of layers and a single timestep of the pretrained diffusion denoiser, DreamCache enables dynamic modulation of the generated image features through lightweight, trained conditioning adapters. DreamCache achieves state-of-the-art image and text alignment, utilizing an order of magnitude fewer extra parameters, and is both more computationally efficient and versatile than existing models.
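As a concrete illustration of the mechanism described above, the sketch below caches features from a few denoiser layers in a single forward pass at one timestep, then lets a small trained adapter modulate the generated features against that cache. This is a minimal sketch under assumptions, not the authors' implementation: the `denoiser.blocks` attribute, the call signature, and the gated cross-attention adapter are all hypothetical.

```python
import torch
import torch.nn as nn

class CacheAdapter(nn.Module):
    """Hypothetical lightweight conditioning adapter: generated features
    attend to cached reference features via gated cross-attention,
    zero-initialized so training starts from an identity mapping."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.to_kv = nn.Linear(dim, 2 * dim, bias=False)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # no-op at initialization

    def forward(self, x: torch.Tensor, cached: torch.Tensor) -> torch.Tensor:
        k, v = self.to_kv(cached).chunk(2, dim=-1)
        out, _ = self.attn(x, k, v)               # x: (B, N, C) queries
        return x + self.gate * out                # residual modulation

@torch.no_grad()
def build_cache(denoiser, ref_latents, t_cache, layer_ids):
    """One forward pass of the frozen denoiser on the reference latents at a
    single timestep; features from a subset of layers are stored."""
    cache = {}
    hooks = [denoiser.blocks[i].register_forward_hook(  # .blocks is assumed
                 lambda mod, inp, out, i=i: cache.__setitem__(i, out))
             for i in layer_ids]
    denoiser(ref_latents, t_cache)                      # assumed signature
    for h in hooks:
        h.remove()
    return cache
```

At generation time, one `CacheAdapter` per cached layer would inject the stored features into the frozen denoiser, consistent with the abstract's claim that only the lightweight adapters are trained.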
Related papers
- Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models [51.3915762595891]
This paper presents an efficient LoRA-based personalization approach for on-device subject-driven generation.
Our method, termed Hollowed Net, enhances memory efficiency during fine-tuning by modifying the architecture of a diffusion U-Net.
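The summary identifies the method as LoRA-based; as background, here is the standard LoRA building block in PyTorch. It shows only the generic low-rank finetuning pattern, not Hollowed Net's memory-saving architectural changes, which the summary does not detail.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Standard LoRA layer: the pretrained weight is frozen and only the
    low-rank update B @ A is trained (a generic sketch, not Hollowed Net)."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                  # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero-init
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * ((x @ self.A.T) @ self.B.T)
```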
arXiv Detail & Related papers (2024-11-02T08:42:48Z)
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- Ada-Adapter: Fast Few-shot Style Personalization of Diffusion Model with Pre-trained Image Encoder [57.574544285878794]
Ada-Adapter is a novel framework for few-shot style personalization of diffusion models.
Our method enables efficient zero-shot style transfer utilizing a single reference image.
We demonstrate the effectiveness of our approach on various artistic styles, including flat art, 3D rendering, and logo design.
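The entry describes conditioning on a single style reference through a pre-trained image encoder. A common way to wire this up, assumed here rather than taken from the paper, is to project frozen encoder features into a few extra conditioning tokens for the denoiser's cross-attention:

```python
import torch
import torch.nn as nn

class StyleProjector(nn.Module):
    """Hypothetical adapter: map a frozen image-encoder embedding of one
    style reference to extra cross-attention tokens. Dimensions follow
    common CLIP/Stable Diffusion sizes and are assumptions."""
    def __init__(self, enc_dim: int = 768, ctx_dim: int = 1024, n_tokens: int = 4):
        super().__init__()
        self.proj = nn.Linear(enc_dim, n_tokens * ctx_dim)
        self.norm = nn.LayerNorm(ctx_dim)
        self.n_tokens, self.ctx_dim = n_tokens, ctx_dim

    def forward(self, image_embed: torch.Tensor) -> torch.Tensor:
        tokens = self.proj(image_embed).view(-1, self.n_tokens, self.ctx_dim)
        return self.norm(tokens)  # concatenated with text tokens downstream
```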
arXiv Detail & Related papers (2024-07-08T02:00:17Z)
- MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models [34.611309081801345]
Large diffusion-based Text-to-Image (T2I) models have shown impressive generative capabilities.
In this paper, we propose a novel strategy to scale a generative model across new tasks with minimal compute.
arXiv Detail & Related papers (2024-04-15T17:55:56Z)
- Pixel-Aware Stable Diffusion for Realistic Image Super-resolution and Personalized Stylization [23.723573179119228]
We propose a pixel-aware stable diffusion (PASD) network to achieve robust Real-ISR and personalized image stylization.
A pixel-aware cross attention module is introduced to enable diffusion models to perceive local image structure at the pixel level.
An adjustable noise schedule is introduced to further improve the image restoration results.
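A minimal sketch of what a pixel-aware cross-attention module could look like: denoiser features act as queries over per-pixel features of the degraded input, so local structure can steer restoration. The residual wiring and shapes are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PixelAwareCrossAttention(nn.Module):
    """Diffusion features (queries) attend to per-pixel features of the
    input image (keys/values); a sketch of the idea, names assumed."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)

    def forward(self, diff_feats: torch.Tensor, pixel_feats: torch.Tensor):
        # diff_feats:  (B, H*W, C) denoiser features at some resolution
        # pixel_feats: (B, H*W, C) features extracted from the input image
        out, _ = self.attn(self.norm_q(diff_feats),
                           self.norm_kv(pixel_feats),
                           self.norm_kv(pixel_feats))
        return diff_feats + out  # residual injection of local structure
```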
arXiv Detail & Related papers (2023-08-28T10:15:57Z)
- Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
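One reading of "using the learned internal representations of the denoiser" is to attach a small head to an intermediate activation and use it for conditioning or guidance. The sketch below is a hypothetical illustration; `mid_block` is an assumed attribute name and the guidance use is in the spirit of classifier guidance, not the paper's verified procedure.

```python
import torch
import torch.nn as nn

def denoiser_features(denoiser, x_t, t):
    """Capture one intermediate denoiser activation via a forward hook
    ('mid_block' and the call signature are assumptions)."""
    grabbed = {}
    hook = denoiser.mid_block.register_forward_hook(
        lambda mod, inp, out: grabbed.setdefault("f", out))
    denoiser(x_t, t)
    hook.remove()
    return grabbed["f"]

class FeatureConditionHead(nn.Module):
    """Small classifier over denoiser features; its gradient w.r.t. the
    noisy latent can steer sampling toward a target class."""
    def __init__(self, channels: int, n_classes: int):
        super().__init__()
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(channels, n_classes))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.head(feats)  # feats: (B, C, H, W)
```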
arXiv Detail & Related papers (2023-06-02T20:09:57Z)
- FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention [37.58569261714206]
Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images.
FastComposer enables efficient, personalized, multi-subject text-to-image generation without fine-tuning.
arXiv Detail & Related papers (2023-05-17T17:59:55Z)
- SVDiff: Compact Parameter Space for Diffusion Fine-Tuning [19.978410014103435]
We propose a novel approach to address limitations in existing text-to-image diffusion models for personalization.
Our method involves fine-tuning the singular values of the weight matrices, leading to a compact and efficient parameter space.
We also propose a Cut-Mix-Unmix data-augmentation technique to enhance the quality of multi-subject image generation and a simple text-based image editing framework.
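The core idea, finetuning only the singular values of pretrained weights, is compact enough to show directly. The sketch below factors a linear weight once and trains a per-singular-value shift, with a ReLU keeping the spectrum non-negative as in the paper; convolution handling and Cut-Mix-Unmix are omitted.

```python
import torch
import torch.nn as nn

class SVDLinear(nn.Module):
    """SVDiff-style parameterization: W = U diag(relu(s + delta)) Vh, where
    U, s, Vh come from a one-time SVD of the pretrained weight and only
    the spectral shift `delta` is trained."""
    def __init__(self, base: nn.Linear):
        super().__init__()
        U, s, Vh = torch.linalg.svd(base.weight.data, full_matrices=False)
        self.register_buffer("U", U)    # frozen
        self.register_buffer("s", s)    # frozen original spectrum
        self.register_buffer("Vh", Vh)  # frozen
        self.register_buffer("bias",
                             base.bias.data if base.bias is not None else None)
        self.delta = nn.Parameter(torch.zeros_like(s))  # the only trained tensor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.U @ torch.diag(torch.relu(self.s + self.delta)) @ self.Vh
        return nn.functional.linear(x, w, self.bias)
```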
arXiv Detail & Related papers (2023-03-20T17:45:02Z)
- Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes.
We propose a simple, lightweight image editing algorithm in which the mixing weights of two text embeddings are optimized for style matching and content preservation.
Experiments show that the proposed method can modify a wide range of attributes, outperforming diffusion-model-based image-editing algorithms.
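The described editing algorithm reduces to optimizing interpolation weights between two prompt embeddings (e.g. a content prompt and a style-modified prompt). A self-contained sketch, with the paper's style-matching and content-preservation objectives abstracted as loss callables:

```python
import torch

def optimize_mixing_weights(e_content, e_style, style_loss, content_loss,
                            steps=100, lr=1e-2):
    """Learn one mixing weight per token between two prompt embeddings of
    shape (tokens, dim). The loss callables stand in for the paper's
    style-matching and content-preservation terms (assumptions here)."""
    lam = torch.zeros(e_content.shape[0], requires_grad=True)
    opt = torch.optim.Adam([lam], lr=lr)
    for _ in range(steps):
        w = torch.sigmoid(lam).unsqueeze(-1)       # squash to [0, 1]
        e_mix = (1 - w) * e_content + w * e_style  # token-wise interpolation
        loss = style_loss(e_mix) + content_loss(e_mix)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(lam).detach()             # final mixing weights
```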
arXiv Detail & Related papers (2022-12-16T19:58:52Z)