Personalized Image Generation with Deep Generative Models: A Decade Survey
- URL: http://arxiv.org/abs/2502.13081v1
- Date: Tue, 18 Feb 2025 17:34:04 GMT
- Title: Personalized Image Generation with Deep Generative Models: A Decade Survey
- Authors: Yuxiang Wei, Yiheng Zheng, Yabo Zhang, Ming Liu, Zhilong Ji, Lei Zhang, Wangmeng Zuo
- Abstract summary: We present a review of generalized personalized image generation across various generative models.
We first define a unified framework that standardizes the personalization process across different generative models.
We then provide an in-depth analysis of personalization techniques within each generative model, highlighting their unique contributions and innovations.
- Score: 51.26287478042516
- Abstract: Recent advancements in generative models have significantly facilitated the development of personalized content creation. Given a small set of images of a user-specific concept, personalized image generation makes it possible to create images that incorporate the specified concept and adhere to provided text descriptions. Due to its wide applications in content creation, significant effort has been devoted to this field in recent years. Nonetheless, the techniques used for personalization have evolved alongside the development of generative models, each with distinct yet interrelated components. In this survey, we present a comprehensive review of generalized personalized image generation across various generative models, including traditional GANs, contemporary text-to-image diffusion models, and emerging multimodal autoregressive models. We first define a unified framework that standardizes the personalization process across different generative models, encompassing three key components, i.e., inversion spaces, inversion methods, and personalization schemes. This unified framework offers a structured approach to dissecting and comparing personalization techniques across different generative architectures. Building upon this unified framework, we further provide an in-depth analysis of personalization techniques within each generative model, highlighting their unique contributions and innovations. Through comparative analysis, this survey elucidates the current landscape of personalized image generation, identifying commonalities and distinguishing features among existing methods. Finally, we discuss the open challenges in the field and propose potential directions for future research. We keep track of related works at https://github.com/csyxwei/Awesome-Personalized-Image-Generation.
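As a reading aid, the survey's three-component framing (inversion space, inversion method, personalization scheme) can be expressed as a minimal interface. This is an illustrative sketch under our own naming assumptions (`PersonalizationPipeline`, `invert`, `generate`); it is not code or an API from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Image = Any        # placeholder for an image (tensor, PIL image, ...)
ConceptRepr = Any  # placeholder: a text embedding, latent code, or weight delta

@dataclass
class PersonalizationPipeline:
    """Hypothetical container mirroring the survey's three components."""
    inversion_space: str                           # where the concept is represented
    invert: Callable[[List[Image]], ConceptRepr]   # inversion method: images -> concept
    generate: Callable[[ConceptRepr, str], Image]  # personalization scheme: condition generation

def personalize(p: PersonalizationPipeline, refs: List[Image], prompt: str) -> Image:
    concept = p.invert(refs)            # map reference images into the inversion space
    return p.generate(concept, prompt)  # generate an image combining concept and prompt
```

Under this framing, e.g., textual inversion uses the text-embedding space as its inversion space, per-concept optimization as its inversion method, and prompt conditioning of a frozen model as its personalization scheme.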
Related papers
- Imagine yourself: Tuning-Free Personalized Image Generation [39.63411174712078]
We introduce Imagine yourself, a state-of-the-art model designed for personalized image generation.
It operates as a tuning-free model, enabling all users to leverage a shared framework without individualized adjustments.
Our study demonstrates that Imagine yourself surpasses the state-of-the-art personalization model, exhibiting superior capabilities in identity preservation, visual quality, and text alignment.
arXiv Detail & Related papers (2024-09-20T09:21:49Z) - JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z) - An Improved Method for Personalizing Diffusion Models [23.20529652769131]
Diffusion models have demonstrated impressive image generation capabilities.
Personalized approaches, such as textual inversion and DreamBooth, enhance model individualization using specific images; a minimal sketch of the textual-inversion idea appears after this list.
Our proposed approach aims to retain the model's original knowledge while integrating new information.
arXiv Detail & Related papers (2024-07-07T09:52:04Z) - Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects.
We train a reward model, VP-Score, over VisionPrefer to guide the training of text-to-image generative models; the preference prediction accuracy of VP-Score is comparable to that of human annotators.
arXiv Detail & Related papers (2024-04-23T14:53:15Z) - Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion model and requires as little as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z) - Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [50.59261592343479]
We present Kandinsky, a novel exploration of latent diffusion architecture.
Its image prior model is trained separately to map text embeddings to the image embeddings of CLIP.
We also deployed a user-friendly demo system that supports diverse generative modes such as text-to-image generation, image fusion, text and image fusion, image variations generation, and text-guided inpainting/outpainting.
arXiv Detail & Related papers (2023-10-05T12:29:41Z) - Limitations of Face Image Generation [12.11955119100926]
We study the efficacy and shortcomings of generative models in the context of face generation.
We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts.
We present an analytical model that provides insights into how training data selection contributes to the performance of generative models.
arXiv Detail & Related papers (2023-09-13T19:33:26Z) - Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning [6.288699905490906]
We propose Subject-Diffusion, a novel open-domain personalized image generation model.
Our method outperforms other SOTA frameworks in single-subject, multi-subject, and human-subject customized image generation.
arXiv Detail & Related papers (2023-07-21T08:09:47Z) - Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [55.04969603431266]
This paper proposes a method for generating images of customized objects specified by users.
The method is based on a general framework that bypasses the lengthy optimization required by previous approaches.
We demonstrate through experiments that our proposed method is able to synthesize images with compelling output quality, appearance diversity, and object fidelity.
arXiv Detail & Related papers (2023-04-05T17:59:32Z)
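The textual-inversion idea referenced in the list above can be sketched in a few lines: a single new token embedding is optimized against a frozen diffusion model so that prompts containing the pseudo-token reproduce the user's concept. The helpers `encode_text_with_token` and `diffusion_loss`, the `<concept>` pseudo-token, and the embedding size 768 are assumptions for illustration only, not an API from any of the papers listed.

```python
import torch

def textual_inversion_step(token_embedding, reference_batch,
                           encode_text_with_token, diffusion_loss, optimizer):
    """One optimization step of textual-inversion-style personalization:
    only the new token embedding receives gradients; the generative
    model itself stays frozen."""
    # Encode a templated prompt with the learnable pseudo-token spliced in
    # (encode_text_with_token is a hypothetical placeholder for a frozen
    # text encoder that accepts an extra embedding).
    cond = encode_text_with_token("a photo of <concept>", token_embedding)
    # diffusion_loss stands in for the standard denoising objective on the
    # user's reference images, conditioned on the prompt encoding.
    loss = diffusion_loss(reference_batch, cond)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage sketch: the concept lives in the text-embedding inversion space,
# and the inversion method is plain gradient descent on a single vector.
token_embedding = torch.nn.Parameter(torch.randn(768) * 0.01)
optimizer = torch.optim.AdamW([token_embedding], lr=5e-4)
```

Tuning-free methods in the list above (Imagine yourself, JeDi, Subject-Diffusion, Taming Encoder) replace this per-concept optimization loop with a learned encoder that maps reference images directly to the concept representation.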
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.