HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
- URL: http://arxiv.org/abs/2307.06949v2
- Date: Wed, 16 Oct 2024 23:55:27 GMT
- Title: HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
- Authors: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Wei Wei, Tingbo Hou, Yael Pritch, Neal Wadhwa, Michael Rubinstein, Kfir Aberman
- Abstract summary: HyperDreamBooth is a hypernetwork capable of efficiently generating a small set of personalized weights from a single image.
Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion.
- Score: 58.39439948383928
- Abstract: Personalization has emerged as a prominent aspect of generative AI, enabling the synthesis of individuals in diverse contexts and styles while retaining high fidelity to their identities. However, personalization presents inherent challenges in terms of time and memory: fine-tuning each personalized model requires a considerable investment of GPU time, and storing one personalized model per subject can be demanding in terms of storage capacity. To overcome these challenges, we propose HyperDreamBooth - a hypernetwork capable of efficiently generating a small set of personalized weights from a single image of a person. By composing these weights into the diffusion model, coupled with fast finetuning, HyperDreamBooth can generate a person's face in various contexts and styles with high subject detail, while also preserving the model's crucial knowledge of diverse styles and semantic modifications. Our method achieves personalization on faces in roughly 20 seconds, 25x faster than DreamBooth and 125x faster than Textual Inversion, using as few as one reference image, with the same quality and style diversity as DreamBooth. Our method also yields a model that is 10,000x smaller than a normal DreamBooth model. Project page: https://hyperdreambooth.github.io
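To make the compose-then-finetune idea concrete, here is a minimal PyTorch sketch under our own assumptions: a hypernetwork maps a face embedding (e.g. from a frozen image encoder) to low-rank weight residuals for selected diffusion-model attention layers, and the residuals are added onto the frozen base weights before a brief subject-specific finetune. The layer names, feature dimension, and rank below are illustrative, not the paper's exact parameterization (which decomposes the residuals further for compactness).

```python
# Sketch: hypernetwork predicts low-rank weight residuals, which are then
# composed into a frozen diffusion UNet. Layer names/dims are assumptions.
import torch
import torch.nn as nn

class LoRAHyperNetwork(nn.Module):
    def __init__(self, feat_dim=768, rank=1, layer_dims=None):
        super().__init__()
        # layer_dims: {layer_name: (out_features, in_features)} of the
        # targeted cross-attention projections in the UNet (illustrative).
        self.layer_dims = layer_dims or {"attn.to_k": (320, 768)}
        self.rank = rank
        self.heads = nn.ModuleDict()
        for name, (out_f, in_f) in self.layer_dims.items():
            key = name.replace(".", "_")  # module names may not contain "."
            self.heads[key] = nn.ModuleDict({
                "A": nn.Linear(feat_dim, out_f * rank),
                "B": nn.Linear(feat_dim, rank * in_f),
            })

    def forward(self, face_features):
        # face_features: (feat_dim,) embedding of the single reference
        # image, e.g. from a frozen ViT face encoder (an assumption here).
        deltas = {}
        for name, (out_f, in_f) in self.layer_dims.items():
            key = name.replace(".", "_")
            A = self.heads[key]["A"](face_features).view(out_f, self.rank)
            B = self.heads[key]["B"](face_features).view(self.rank, in_f)
            deltas[name] = A @ B  # low-rank weight residual
        return deltas

def compose(unet, deltas, scale=1.0):
    # Add the predicted residuals onto the frozen base weights; the paper
    # then runs a short subject-specific finetune from this initialization.
    with torch.no_grad():
        for name, module in unet.named_modules():
            if name in deltas and hasattr(module, "weight"):
                module.weight += scale * deltas[name]
```

Under this scheme, per-subject cost is one hypernetwork forward pass plus a short finetune from the predicted initialization rather than full per-subject optimization, which is what yields the reported speed and storage gains.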
Related papers
- DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching [38.46235896192237]
We introduce DreamCache, a scalable approach for efficient and high-quality personalized image generation.
DreamCache achieves state-of-the-art image and text alignment, utilizing an order of magnitude fewer extra parameters.
arXiv Detail & Related papers (2024-11-26T15:03:14Z)
- Imagine yourself: Tuning-Free Personalized Image Generation [39.63411174712078]
We introduce Imagine yourself, a state-of-the-art model designed for personalized image generation.
It operates as a tuning-free model, enabling all users to leverage a shared framework without individualized adjustments.
Our study demonstrates that Imagine yourself surpasses state-of-the-art personalization models, exhibiting superior identity preservation, visual quality, and text alignment.
arXiv Detail & Related papers (2024-09-20T09:21:49Z)
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack [75.00066365801993]
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.
However, these pre-trained models often struggle to generate highly aesthetic images.
We propose quality-tuning to guide a pre-trained model to exclusively generate highly visually appealing images.
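Read literally, quality-tuning is ordinary finetuning restricted to a small, hand-curated set of exceptionally aesthetic image-text pairs. A minimal sketch, in which the dataset, the diffusion_loss method, and all hyperparameters are placeholder assumptions rather than Emu's actual configuration:

```python
# Sketch: continue training a pre-trained text-to-image model only on
# curated, highly aesthetic pairs ("photogenic needles in a haystack").
import torch
from torch.utils.data import DataLoader

def quality_tune(model, curated_pairs, epochs=4, lr=1e-5):
    # curated_pairs: a small set of (image, caption) pairs selected by
    # human annotators for visual appeal.
    loader = DataLoader(curated_pairs, batch_size=8, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, captions in loader:
            loss = model.diffusion_loss(images, captions)  # hypothetical API
            optim.zero_grad()
            loss.backward()
            optim.step()
    return model
```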
arXiv Detail & Related papers (2023-09-27T17:30:19Z)
- Generate Anything Anywhere in Any Scene [25.75076439397536]
We propose a controllable text-to-image diffusion model for personalized object generation.
Our approach demonstrates significant potential for various applications, such as those in art, entertainment, and advertising design.
arXiv Detail & Related papers (2023-06-29T17:55:14Z)
- Inserting Anybody in Diffusion Models via Celeb Basis [29.51292196851589]
We propose a new personalization method that allows for the seamless integration of a unique individual into the pre-trained diffusion model.
To achieve this, we first analyze and build a well-defined celeb basis from the embedding space of the pre-trained large text encoder.
Empowered by the proposed celeb basis, the new identity in our customized model showcases a better concept combination ability than previous personalization methods.
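One way to picture a celeb basis, as a sketch under our own assumptions (the paper's exact construction may differ): embed many celebrity names with the frozen text encoder, extract a compact basis of that embedding set, and constrain a new identity's token embedding to the span of the basis, optimizing only a small coefficient vector against the reference photo:

```python
# Sketch: PCA-style basis over celebrity-name embeddings; a new identity
# is represented by coefficients within that subspace.
import torch

def build_celeb_basis(name_embeddings, k=64):
    # name_embeddings: (N, d) text-encoder embeddings of celebrity names,
    # with N >> k (encoder choice and k are illustrative assumptions).
    mean = name_embeddings.mean(0)
    centered = name_embeddings - mean
    _, _, Vh = torch.linalg.svd(centered, full_matrices=False)
    return mean, Vh[:k]  # (d,), (k, d)

def embed_identity(coeffs, mean, basis):
    # The new person's token embedding is constrained to the celeb
    # subspace; only `coeffs` (k values) would be optimized.
    return mean + coeffs @ basis  # (d,)
```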
arXiv Detail & Related papers (2023-06-01T17:30:24Z)
- Subject-driven Text-to-Image Generation via Apprenticeship Learning [83.88256453081607]
We present SuTI, a subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning.
SuTI is powered by apprenticeship learning, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models.
We show that SuTI significantly outperforms existing models like InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, Re-Imagen and DreamBooth.
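A rough sketch of the apprenticeship setup as summarized above, with every function a hypothetical placeholder: per-subject expert models generate training triplets, and a single apprentice is trained to reproduce the experts' outputs given only the raw reference images and the prompt:

```python
# Sketch: build a distillation dataset from many subject-specific experts
# (e.g. DreamBooth finetunes); all callables here are placeholders.
def build_apprentice_dataset(subjects, prompts, finetune_expert, sample):
    dataset = []
    for subject_images in subjects:
        expert = finetune_expert(subject_images)  # per-subject expert model
        for prompt in prompts:
            target = sample(expert, prompt)       # expert's generated image
            # The apprentice conditions on the raw reference images plus
            # the prompt and learns to reproduce the expert's output,
            # so no per-subject finetuning is needed at inference time.
            dataset.append((subject_images, prompt, target))
    return dataset
```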
arXiv Detail & Related papers (2023-04-01T00:47:35Z)
- Designing an Encoder for Fast Personalization of Text-to-Image Models [57.62449900121022]
We propose an encoder-based domain-tuning approach for text-to-image personalization.
We employ two components: First, an encoder that takes as input a single image of a target concept from a given domain.
Second, a set of regularized weight-offsets for the text-to-image model that learn how to effectively ingest additional concepts.
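A minimal sketch of the second component, regularized weight-offsets, under our own assumptions (zero initialization, an L2 regularizer, and a placeholder weight lam):

```python
# Sketch: learnable per-layer weight offsets with an L2 penalty that keeps
# the personalized model close to the pre-trained prior.
import torch
import torch.nn as nn

class WeightOffsets(nn.Module):
    def __init__(self, base_layers):
        super().__init__()
        # base_layers: {layer_name: weight tensor} for the targeted layers;
        # keys are sanitized because parameter names may not contain ".".
        self.offsets = nn.ParameterDict({
            name.replace(".", "_"): nn.Parameter(torch.zeros_like(w))
            for name, w in base_layers.items()
        })

    def apply_to(self, base_layers):
        # Compose base weights with the learned offsets.
        return {name: w + self.offsets[name.replace(".", "_")]
                for name, w in base_layers.items()}

    def l2_penalty(self):
        # Discourages drifting far from the base model.
        return sum((o ** 2).sum() for o in self.offsets.values())

def total_loss(denoise_loss, offsets, lam=1e-2):
    # denoise_loss: the usual diffusion objective on the concept image;
    # lam is a placeholder regularization weight.
    return denoise_loss + lam * offsets.l2_penalty()
```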
arXiv Detail & Related papers (2023-02-23T18:46:41Z)
- MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation [57.060828009199646]
We propose an ID-preserving talking head generation framework.
We claim that dense landmarks are crucial to achieving accurate geometry-aware flow fields.
We adaptively fuse the source identity during synthesis, so that the network better preserves the key characteristics of the image portrait.
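A speculative sketch of the two ideas named above, warping source features with a landmark-driven flow field and adaptively fusing source-identity features; the landmark-to-flow network is omitted and everything here is our own illustrative assumption:

```python
# Sketch: flow-based feature warping plus a learned per-pixel gate that
# injects source-identity features during synthesis.
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(features, flow):
    # features: (B, C, H, W); flow: (B, H, W, 2), offsets in [-1, 1]
    # coordinates predicted from dense landmarks (predictor omitted).
    B, _, H, W = features.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, H, device=features.device),
                            torch.linspace(-1, 1, W, device=features.device),
                            indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    return F.grid_sample(features, grid + flow, align_corners=True)

class AdaptiveIdentityFusion(nn.Module):
    # Learns how much source identity to keep at each spatial location.
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Conv2d(2 * channels, 1, kernel_size=1)

    def forward(self, identity_feat, warped_feat):
        g = torch.sigmoid(self.gate(torch.cat([identity_feat, warped_feat], 1)))
        return g * identity_feat + (1 - g) * warped_feat
```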
arXiv Detail & Related papers (2022-12-15T18:59:33Z)