Related papers: PSTF-AttControl: Per-Subject-Tuning-Free Personalized Image Generation with Controllable Face Attributes

PSTF-AttControl: Per-Subject-Tuning-Free Personalized Image Generation with Controllable Face Attributes

URL: http://arxiv.org/abs/2510.25084v1
Date: Wed, 29 Oct 2025 01:42:23 GMT
Title: PSTF-AttControl: Per-Subject-Tuning-Free Personalized Image Generation with Controllable Face Attributes
Authors: Xiang liu, Zhaoxiang Liu, Huan Hu, Zipeng Wang, Ping Chen, Zezhou Chen, Kai Wang, Shiguo Lian,
Abstract summary: Per-subject-tuning-free (PSTF) methods provide fine-grained control over facial attributes.<n>PSTF approaches lack precise control over facial attributes and high-fidelity preservation of facial identity.<n>We introduce a novel PSTF method that enables both precise control over facial attributes and high-fidelity preservation of facial identity.
Score: 14.665474831032979
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in personalized image generation have significantly improved facial identity preservation, particularly in fields such as entertainment and social media. However, existing methods still struggle to achieve precise control over facial attributes in a per-subject-tuning-free (PSTF) way. Tuning-based techniques like PreciseControl have shown promise by providing fine-grained control over facial features, but they often require extensive technical expertise and additional training data, limiting their accessibility. In contrast, PSTF approaches simplify the process by enabling image generation from a single facial input, but they lack precise control over facial attributes. In this paper, we introduce a novel, PSTF method that enables both precise control over facial attributes and high-fidelity preservation of facial identity. Our approach utilizes a face recognition model to extract facial identity features, which are then mapped into the $W^+$ latent space of StyleGAN2 using the e4e encoder. We further enhance the model with a Triplet-Decoupled Cross-Attention module, which integrates facial identity, attribute features, and text embeddings into the UNet architecture, ensuring clean separation of identity and attribute information. Trained on the FFHQ dataset, our method allows for the generation of personalized images with fine-grained control over facial attributes, while without requiring additional fine-tuning or training data for individual identities. We demonstrate that our approach successfully balances personalization with precise facial attribute control, offering a more efficient and user-friendly solution for high-quality, adaptable facial image synthesis. The code is publicly available at https://github.com/UnicomAI/PSTF-AttControl.

Related papers

Reverse Personalization [48.09783075634403]
We analyze the identity generation process and introduce a reverse personalization framework for face anonymization.<n>Unlike prior anonymization methods, which lack control over facial attributes, our framework supports attribute-controllable anonymization.
arXiv Detail & Related papers (2025-12-28T16:06:55Z)
From Large Angles to Consistent Faces: Identity-Preserving Video Generation via Mixture of Facial Experts [69.44297222099175]
We introduce a Mixture of Facial Experts (MoFE) that captures distinct but mutually reinforcing aspects of facial attributes.<n>To mitigate dataset limitations, we have tailored a data processing pipeline centered on two key aspects: Face Constraints and Identity Consistency.<n>We have curated and refined a Large Face Angles (LFA) dataset from existing open-source human video datasets.
arXiv Detail & Related papers (2025-08-13T04:10:16Z)
Towards Consistent and Controllable Image Synthesis for Face Editing [18.646961062736207]
RigFace is a novel approach to control the lighting, facial expression and head pose of a portrait photo.<n>Our model achieves comparable or even superior performance in both identity preservation and photorealism compared to existing face editing models.
arXiv Detail & Related papers (2025-02-04T16:36:07Z)
ControlFace: Harnessing Facial Parametric Control for Face Rigging [31.765503860508378]
We introduce ControlFace, a novel face rigging method conditioned on 3DMM renderings that enables flexible, high-fidelity control.<n>We employ a dual-branch U-Nets: one, referred to as FaceNet, captures identity and fine details, while the other focuses on generation.<n>By training on a facial video dataset, we fully utilize FaceNet's rich representations while ensuring control adherence.
arXiv Detail & Related papers (2024-12-02T06:00:27Z)
G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors [71.69161292330504]
Reversible face anonymization seeks to replace sensitive identity information in facial images with synthesized alternatives. This paper introduces Gtextsuperscript2Face, which leverages both generative and geometric priors to enhance identity manipulation. Our method outperforms existing state-of-the-art techniques in face anonymization and recovery, while preserving high data utility.
arXiv Detail & Related papers (2024-08-18T12:36:47Z)
StableIdentity: Inserting Anybody into Anywhere at First Sight [57.99693188913382]
We propose StableIdentity, which allows identity-consistent recontextualization with just one face image. We are the first to directly inject the identity learned from a single image into video/3D generation without finetuning.
arXiv Detail & Related papers (2024-01-29T09:06:15Z)
DisControlFace: Adding Disentangled Control to Diffusion Autoencoder for One-shot Explicit Facial Image Editing [14.537856326925178]
We focus on exploring explicit fine-grained control of generative facial image editing. We propose a novel diffusion-based editing framework, named DisControlFace. Our model can be trained using 2D in-the-wild portrait images without requiring 3D or video data.
arXiv Detail & Related papers (2023-12-11T08:16:55Z)
When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation [60.305112612629465]
Text-to-image diffusion models have excelled in producing diverse, high-quality, and photo-realistic images. We present a novel use of the extended StyleGAN embedding space $mathcalW_+$ to achieve enhanced identity preservation and disentanglement for diffusion models. Our method adeptly generates personalized text-to-image outputs that are not only compatible with prompt descriptions but also amenable to common StyleGAN editing directions.
arXiv Detail & Related papers (2023-11-29T09:05:14Z)
Attribute-preserving Face Dataset Anonymization via Latent Code Optimization [64.4569739006591]
We present a task-agnostic anonymization procedure that directly optimize the images' latent representation in the latent space of a pre-trained GAN. We demonstrate through a series of experiments that our method is capable of anonymizing the identity of the images whilst -- crucially -- better-preserving the facial attributes.
arXiv Detail & Related papers (2023-03-20T17:34:05Z)
FacialGAN: Style Transfer and Attribute Manipulation on Synthetic Faces [9.664892091493586]
FacialGAN is a novel framework enabling simultaneous rich style transfers and interactive facial attributes manipulation. We show our model's capacity in producing visually compelling results in style transfer, attribute manipulation, diversity and face verification.
arXiv Detail & Related papers (2021-10-18T15:53:38Z)
FaceController: Controllable Attribute Editing for Face in the Wild [74.56117807309576]
We propose a simple feed-forward network to generate high-fidelity manipulated faces. By simply employing some existing and easy-obtainable prior information, our method can control, transfer, and edit diverse attributes of faces in the wild. In our method, we decouple identity, expression, pose, and illumination using 3D priors; separate texture and colors by using region-wise style codes.
arXiv Detail & Related papers (2021-02-23T02:47:28Z)
S2FGAN: Semantically Aware Interactive Sketch-to-Face Translation [11.724779328025589]
This paper proposes a sketch-to-image generation framework called S2FGAN. We employ two latent spaces to control the face appearance and adjust the desired attributes of the generated face. Our method successfully outperforms state-of-the-art methods on attribute manipulation by exploiting greater control of attribute intensity.
arXiv Detail & Related papers (2020-11-30T13:42:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.