Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias
- URL: http://arxiv.org/abs/2503.06632v1
- Date: Sun, 09 Mar 2025 14:14:02 GMT
- Title: Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias
- Authors: Mingxiao Li, Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens
- Abstract summary: The aim of image personalization is to create images based on a user-provided subject. Current methods face challenges in ensuring fidelity to the text prompt. We introduce a novel training pipeline that incorporates an attractor to filter out distractions in training images.
- Score: 52.590072198551944
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Personalized image generation via text prompts has great potential to improve daily life and professional work by facilitating the creation of customized visual content. The aim of image personalization is to create images based on a user-provided subject while maintaining both consistency of the subject and flexibility to accommodate various textual descriptions of that subject. However, current methods face challenges in ensuring fidelity to the text prompt while not overfitting to the training data. In this work, we introduce a novel training pipeline that incorporates an attractor to filter out distractions in training images, allowing the model to focus on learning an effective representation of the personalized subject. Moreover, current evaluation methods struggle due to the lack of a dedicated test set. The evaluation set-up typically relies on the training data of the personalization task to compute text-image and image-image similarity scores, which, while useful, tend to overestimate performance. Although human evaluations are commonly used as an alternative, they often suffer from bias and inconsistency. To address these issues, we curate a diverse and high-quality test set with well-designed prompts. With this new benchmark, automatic evaluation metrics can reliably assess model performance.
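For context, the text-image and image-image similarity scores discussed in the abstract are typically computed from CLIP embeddings (often reported as CLIP-T and CLIP-I). The sketch below illustrates one common way to compute them; the checkpoint choice and function names are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of the standard CLIP-based personalization metrics.
# Checkpoint and helper names are assumptions for illustration only.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_t(image: Image.Image, prompt: str) -> float:
    """Text-image similarity: cosine between CLIP embeddings of an image and its prompt."""
    inputs = processor(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    return torch.nn.functional.cosine_similarity(img, txt).item()

def clip_i(generated: Image.Image, reference: Image.Image) -> float:
    """Image-image similarity: cosine between CLIP embeddings of two images."""
    inputs = processor(images=[generated, reference], return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(pixel_values=inputs["pixel_values"])
    return torch.nn.functional.cosine_similarity(feats[0:1], feats[1:2]).item()
```

Note that when the reference images for CLIP-I are the same images used to train the personalized model, an overfitted model scores artificially high; this is the evaluation bias the paper's dedicated test set is designed to avoid.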
Related papers
- Flux Already Knows -- Activating Subject-Driven Image Generation without Training [25.496237241889048]
We propose a zero-shot framework for subject-driven image generation using a vanilla Flux model.
We activate strong identity-preserving capabilities without any additional data, training, or inference-time fine-tuning.
arXiv Detail & Related papers (2025-04-12T20:41:53Z)
- Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent [9.748808189341526]
An effective Text-to-Image (T2I) evaluation metric should detect instances where the generated images do not align with the textual prompts.
We propose a method based on large language models (LLMs) that performs question-answering over an extracted scene graph, and we create a dataset of human-rated scores for generated images.
arXiv Detail & Related papers (2024-12-07T18:44:38Z)
- TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models [39.06617653124486]
We introduce a new evaluation framework called TypeScore to assess a model's ability to generate images with high-fidelity embedded text.
Our proposed metric demonstrates greater resolution than CLIPScore in differentiating popular image generation models.
arXiv Detail & Related papers (2024-11-02T07:56:54Z)
- FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction [66.98008357232428]
We propose FineMatch, a new aspect-based fine-grained text and image matching benchmark.
FineMatch focuses on text and image mismatch detection and correction.
We show that models trained on FineMatch demonstrate enhanced proficiency in detecting fine-grained text and image mismatches.
arXiv Detail & Related papers (2024-04-23T03:42:14Z)
- Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models struggle to consistently portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z)
- Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods [52.806258774051216]
We focus on text-to-image systems that take a single image of an individual as input and ground the generation process on that image along with text describing the desired visual context.
We introduce a standardized dataset (Stellar) of personalized prompts coupled with images of individuals; it is an order of magnitude larger than existing relevant datasets and comes with rich semantic ground-truth annotations.
We derive a simple yet efficient personalized text-to-image baseline that does not require test-time fine-tuning for each subject and sets a new SoTA both quantitatively and in human trials.
arXiv Detail & Related papers (2023-12-11T04:47:39Z)
- Learning Transferable Pedestrian Representation from Multimodal Information Supervision [174.5150760804929]
VAL-PAT is a novel framework that learns transferable representations to enhance various pedestrian analysis tasks with multimodal information.
We first perform pre-training on LUPerson-TA dataset, where each image contains text and attribute annotations.
We then transfer the learned representations to various downstream tasks, including person reID, person attribute recognition and text-based person search.
arXiv Detail & Related papers (2023-04-12T01:20:58Z)
- HIVE: Harnessing Human Feedback for Instructional Visual Editing [127.29436858998064]
We present a novel framework to harness human feedback for instructional visual editing (HIVE).
Specifically, we collect human feedback on the edited images and learn a reward function to capture the underlying user preferences.
We then introduce scalable diffusion model fine-tuning methods that can incorporate human preferences based on the estimated reward (a minimal sketch of the reward-weighting idea appears after this list).
arXiv Detail & Related papers (2023-03-16T19:47:41Z)
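As flagged in the HIVE entry above, one plausible way to incorporate an estimated human-preference reward into diffusion fine-tuning is to weight the per-sample denoising loss by that reward. The following is a minimal, assumption-laden sketch of that general idea, not HIVE's published recipe; `unet`, `reward_model`, and `alphas_cumprod` are hypothetical placeholders.

```python
# Hedged sketch: reward-weighted denoising loss for diffusion fine-tuning.
# All names are placeholders, not HIVE's actual implementation.
import torch
import torch.nn.functional as F

def reward_weighted_step(unet, reward_model, x0, cond, t, alphas_cumprod):
    """One training step that scales the standard noise-prediction loss by an
    estimated human-preference reward for each (image, instruction) pair."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)             # cumulative alpha at step t
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise  # forward-diffused sample
    pred = unet(x_t, t, cond)                               # model's noise prediction
    with torch.no_grad():
        w = torch.clamp(reward_model(x0, cond), min=0.0)    # non-negative per-sample weights
    per_sample = F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))
    return (w * per_sample).mean()                          # reward-weighted objective
```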
This list is automatically generated from the titles and abstracts of the papers on this site.