Deciphering Personalization: Towards Fine-Grained Explainability in Natural Language for Personalized Image Generation Models
- URL: http://arxiv.org/abs/2511.01932v1
- Date: Sun, 02 Nov 2025 16:08:24 GMT
- Title: Deciphering Personalization: Towards Fine-Grained Explainability in Natural Language for Personalized Image Generation Models
- Authors: Haoming Wang, Wei Gao
- Abstract summary: This paper presents a new technique, namely FineXL, towards Fine-grained eXplainability in natural Language for personalized image generation models. FineXL can improve the accuracy of explainability by 56% when different personalization scenarios are applied to multiple types of image generation models.
- Score: 9.722829662835233
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image generation models are usually personalized in practical use to better meet individual users' heterogeneous needs, but most personalized models lack explainability about how they are being personalized. Such explainability can be provided via visual features in generated images, but these are difficult for human users to understand. Explainability in natural language is a better choice, but existing approaches to natural language explainability are coarse-grained: they cannot precisely identify the multiple aspects of personalization, nor the varying level of personalization in each aspect. To address this limitation, in this paper we present a new technique, namely \textbf{FineXL}, towards \textbf{Fine}-grained e\textbf{X}plainability in natural \textbf{L}anguage for personalized image generation models. FineXL provides natural language descriptions of each distinct aspect of personalization, along with quantitative scores indicating the level of each aspect. Experiment results show that FineXL improves the accuracy of explainability by 56\% when different personalization scenarios are applied to multiple types of image generation models.
Related papers
- Reverse Personalization [48.09783075634403]
We analyze the identity generation process and introduce a reverse personalization framework for face anonymization. Unlike prior anonymization methods, which lack control over facial attributes, our framework supports attribute-controllable anonymization.
arXiv Detail & Related papers (2025-12-28T16:06:55Z) - Personalized Image Descriptions from Attention Sequences [55.65023709100682]
People can view the same image differently: they focus on different regions, objects, and details in varying orders and describe them in distinct linguistic styles. Existing models for personalized image description focus on linguistic style alone, with no prior work leveraging individual viewing patterns. We address this gap by explicitly modeling personalized viewing behavior as a core factor in description generation. Our method, DEPER, learns a subject embedding that captures both linguistic style and viewing behavior, guided by an auxiliary attention-prediction task. A lightweight adapter aligns these embeddings with a frozen vision-language model, enabling few-shot personalization without retraining.
arXiv Detail & Related papers (2025-12-07T05:23:18Z) - Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning [40.06403155373455]
We propose a novel reinforcement learning framework for personalized text-to-image generation.
Our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text-alignment.
arXiv Detail & Related papers (2024-07-09T08:11:53Z) - Training-Free Consistent Text-to-Image Generation [80.4814768762066]
Text-to-image models can portray the same subject across diverse prompts.
Existing approaches fine-tune the model to teach it new words that describe specific user-provided subjects.
We present ConsiStory, a training-free approach that enables consistent subject generation by sharing the internal activations of the pretrained model.
arXiv Detail & Related papers (2024-02-05T18:42:34Z) - Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [56.12990759116612]
Pick-and-Draw is a training-free semantic guidance approach to boost identity consistency and generative diversity for personalization methods.
The proposed approach can be applied to any personalized diffusion models and requires as few as a single reference image.
arXiv Detail & Related papers (2024-01-30T05:56:12Z) - PALP: Prompt Aligned Personalization of Text-to-Image Models [68.91005384187348]
Existing personalization methods compromise personalization ability or the alignment to complex prompts.
We propose a new approach focusing on personalization methods for a \emph{single} prompt to address this issue.
Our method excels in improving text alignment, enabling the creation of images with complex and intricate prompts.
arXiv Detail & Related papers (2024-01-11T18:35:33Z) - InstructBooth: Instruction-following Personalized Text-to-Image Generation [30.89054609185801]
InstructBooth is a novel method designed to enhance image-text alignment in personalized text-to-image models.
Our approach first personalizes text-to-image models with a small number of subject-specific images using a unique identifier.
After personalization, we fine-tune personalized text-to-image models using reinforcement learning to maximize a reward that quantifies image-text alignment.
arXiv Detail & Related papers (2023-12-04T20:34:46Z) - When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images.
We present a novel use of the extended StyleGAN embedding space $\mathcal{W}_+$ to achieve enhanced identity preservation and disentanglement for diffusion models.
Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z) - DreamIdentity: Improved Editability for Efficient Face-identity
Preserved Image Generation [69.16517915592063]
We propose a novel face-identity encoder to learn an accurate representation of human faces.
We also propose self-augmented editability learning to enhance the editability of models.
Our methods can generate identity-preserved images under different scenes at a much faster speed.
arXiv Detail & Related papers (2023-07-01T11:01:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.