PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
- URL: http://arxiv.org/abs/2412.03177v1
- Date: Wed, 04 Dec 2024 09:59:43 GMT
- Title: PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
- Authors: Qihan Huang, Long Chan, Jinlong Liu, Wanggui He, Hao Jiang, Mingli Song, Jie Song
- Abstract summary: Finetuning-free personalized image generation can synthesize customized images without test-time finetuning. This work proposes PatchDPO, which estimates the quality of image patches within each generated image and trains the model accordingly. Experiment results demonstrate that PatchDPO significantly improves the performance of multiple pre-trained personalized generation models.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Finetuning-free personalized image generation can synthesize customized images without test-time finetuning, attracting wide research interest owing to its high efficiency. Current finetuning-free methods simply adopt a single training stage with a simple image reconstruction task, and they typically generate low-quality images inconsistent with the reference images during test-time. To mitigate this problem, inspired by the recent DPO (i.e., direct preference optimization) technique, this work proposes an additional training stage to improve the pre-trained personalized generation models. However, traditional DPO only determines the overall superiority or inferiority of two samples, which is not suitable for personalized image generation because the generated images are commonly inconsistent with the reference images only in some local image patches. To tackle this problem, this work proposes PatchDPO that estimates the quality of image patches within each generated image and accordingly trains the model. To this end, PatchDPO first leverages the pre-trained vision model with a proposed self-supervised training method to estimate the patch quality. Next, PatchDPO adopts a weighted training approach to train the model with the estimated patch quality, which rewards the image patches with high quality while penalizing the image patches with low quality. Experiment results demonstrate that PatchDPO significantly improves the performance of multiple pre-trained personalized generation models, and achieves state-of-the-art performance on both single-object and multi-object personalized image generation. Our code is available at https://github.com/hqhQAQ/PatchDPO.
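The weighted training approach described in the abstract can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: the function name, the `2q - 1` weight mapping, and the non-overlapping patch layout are all assumptions; the point is only that per-patch error is rewarded or penalized according to an estimated per-patch quality map.

```python
import numpy as np

def patch_weighted_loss(pred_noise, true_noise, patch_quality, patch_size=4):
    """Illustrative patch-weighted objective: per-patch reconstruction error
    is weighted by estimated patch quality. Mapping quality q in [0, 1] to a
    weight 2q - 1 in [-1, 1] means minimizing this loss pulls the model toward
    high-quality patches and pushes it away from low-quality ones."""
    H, W = pred_noise.shape
    ph, pw = H // patch_size, W // patch_size
    err = (pred_noise - true_noise) ** 2
    # average the squared error inside each non-overlapping patch
    patch_err = err.reshape(ph, patch_size, pw, patch_size).mean(axis=(1, 3))
    weights = 2.0 * patch_quality - 1.0  # +1 for perfect patches, -1 for bad ones
    return float((weights * patch_err).mean())
```

With quality 1 everywhere this reduces to plain reconstruction; with quality 0 the sign flips and the patch is penalized, which is the patch-level analogue of the losing sample in ordinary DPO.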
Related papers
- Single Image Iterative Subject-driven Generation and Editing [40.285860652338506]
We present SISO, a training-free approach that personalizes image generation and editing from a single subject image.
SISO iteratively generates images and optimizes the model based on a similarity loss with the given subject image.
We demonstrate significant improvements over existing methods in image quality, subject fidelity, and background preservation.
arXiv Detail & Related papers (2025-03-20T10:45:04Z)
- Next Patch Prediction for Autoregressive Visual Generation [58.73461205369825]
We extend the Next Token Prediction (NTP) paradigm to a novel Next Patch Prediction (NPP) paradigm.
Our key idea is to group and aggregate image tokens into patch tokens with higher information density.
We show that NPP could reduce the training cost to around 0.6 times while improving image generation quality by up to 1.0 FID score on the ImageNet 256x256 generation benchmark.
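The token-grouping idea above can be sketched in a few lines; this is an illustrative average-pooling aggregation under assumed names, not the NPP paper's actual architecture:

```python
import numpy as np

def group_tokens_to_patches(tokens, group=4):
    """Illustrative Next-Patch-Prediction-style grouping: aggregate every
    `group` consecutive token embeddings into one patch token with higher
    information density, here by simple averaging."""
    n, d = tokens.shape
    assert n % group == 0, "token count must divide evenly into groups"
    return tokens.reshape(n // group, group, d).mean(axis=1)
```

Predicting one patch token per step instead of one image token per step shortens the autoregressive sequence by the grouping factor, which is where the claimed training-cost reduction comes from.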
arXiv Detail & Related papers (2024-12-19T18:59:36Z)
- JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation [49.997839600988875]
Existing personalization methods rely on finetuning a text-to-image foundation model on a user's custom dataset.
We propose Joint-Image Diffusion (JeDi), an effective technique for learning a finetuning-free personalization model.
Our model achieves state-of-the-art generation quality, both quantitatively and qualitatively, significantly outperforming both the prior finetuning-based and finetuning-free personalization baselines.
arXiv Detail & Related papers (2024-07-08T17:59:02Z)
- Boost Your Human Image Generation Model via Direct Preference Optimization [5.9726297901501475]
Human image generation is a key focus in image synthesis due to its broad applications, but even slight inaccuracies in anatomy, pose, or details can compromise realism.
We explore Direct Preference Optimization (DPO), which trains models to generate preferred (winning) images while diverging from non-preferred (losing) ones.
We propose an enhanced DPO approach that incorporates high-quality real images as winning images, encouraging outputs to resemble real images rather than generated ones.
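The standard DPO objective that this line of work builds on can be written compactly. The sketch below is the generic pairwise DPO loss (winning vs. losing sample relative to a frozen reference model); plugging a real image in as the winner, as the summary describes, changes only where `logp_w` comes from. All names are illustrative.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Generic DPO objective: -log sigmoid(beta * margin), where the margin
    compares how much the model prefers the winning sample over the losing
    one, relative to a frozen reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
```

At zero margin the loss is log 2, and it decreases monotonically as the model assigns relatively higher likelihood to the winning (here, real) image.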
arXiv Detail & Related papers (2024-05-30T16:18:05Z)
- Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack [75.00066365801993]
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text.
These pre-trained models often face challenges when it comes to generating highly aesthetic images.
We propose quality-tuning to guide a pre-trained model to exclusively generate highly visually appealing images.
arXiv Detail & Related papers (2023-09-27T17:30:19Z)
- Patch Gradient Descent: Training Neural Networks on Very Large Images [13.969180905165533]
We propose Patch Gradient Descent (PatchGD) to train existing CNN architectures on large-scale images.
PatchGD is based on the hypothesis that instead of performing gradient-based updates on an entire image at once, it should be possible to achieve a good solution by performing model updates on only small parts of the image.
Our evaluation shows that PatchGD is much more stable and efficient than the standard gradient-descent method in handling large images.
arXiv Detail & Related papers (2023-01-31T18:04:35Z)
- FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-07-18T07:11:28Z)
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection [93.38607559281601]
We devise a hierarchical generative model that captures the multi-scale patch distribution of each training image.
The anomaly score is obtained by aggregating the patch-based votes of the correct transformation across scales and image regions.
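The vote-aggregation step can be sketched minimally; the function below is an illustrative stand-in (uniform averaging of per-scale vote maps, with anomaly defined as one minus mean agreement), not the paper's scoring rule:

```python
import numpy as np

def anomaly_score(vote_maps):
    """Illustrative aggregation: average the correct-transformation vote maps
    over scales and image regions; low agreement yields a high anomaly score."""
    stacked = np.stack(vote_maps)       # (scales, H, W) vote probabilities
    return float(1.0 - stacked.mean())  # anomaly = 1 - mean vote
```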
arXiv Detail & Related papers (2021-04-29T17:49:48Z)
- Drop the GAN: In Defense of Patches Nearest Neighbors as Single Image Generative Models [17.823089978609843]
We show that a range of single-image generation and manipulation tasks can be performed without any training, within several seconds, in a unified, surprisingly simple framework.
We start with an initial coarse guess, and then simply refine the details coarse-to-fine using patch-nearest-neighbor search.
This allows generating random novel images better and much faster than GANs.
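One refinement step of the coarse-to-fine procedure above can be sketched as follows. This is an illustrative brute-force version under assumed names: each patch of the coarse guess is replaced by its nearest-neighbor patch from the source image (the actual method works across scales and with overlapping patches):

```python
import numpy as np

def patch_nn_refine(coarse, source, patch=2):
    """Illustrative single refinement step: replace each non-overlapping patch
    of the coarse guess with its nearest-neighbor patch (L2 distance) from the
    source image, so all output content comes from real source patches."""
    def patches(img):
        H, W = img.shape
        return [img[y:y + patch, x:x + patch]
                for y in range(0, H - patch + 1, patch)
                for x in range(0, W - patch + 1, patch)]
    src = patches(source)
    out = coarse.copy()
    H, W = coarse.shape
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            q = coarse[y:y + patch, x:x + patch]
            best = min(src, key=lambda p: float(((p - q) ** 2).sum()))
            out[y:y + patch, x:x + patch] = best
    return out
```

Since every output patch is copied from the source, fidelity to the source's patch statistics is guaranteed by construction, which is the core of the "no training needed" argument.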
arXiv Detail & Related papers (2021-03-29T12:20:46Z)
- Perceptual Image Restoration with High-Quality Priori and Degradation Learning [28.93489249639681]
We show that our model performs well in measuring the similarity between restored and degraded images.
Our simultaneous restoration and enhancement framework generalizes well to real-world complicated degradation types.
arXiv Detail & Related papers (2021-03-04T13:19:50Z)
- The Power of Triply Complementary Priors for Image Compressive Sensing [89.14144796591685]
We propose a joint low-rank deep (LRD) image model, which couples a pair of triply complementary priors.
We then propose a novel hybrid plug-and-play framework based on the LRD model for image CS.
To make the optimization tractable, a simple yet effective algorithm is proposed to solve the image CS problem under the proposed hybrid framework.
arXiv Detail & Related papers (2020-05-16T08:17:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.