ExpertGen: Training-Free Expert Guidance for Controllable Text-to-Face Generation
- URL: http://arxiv.org/abs/2505.17256v1
- Date: Thu, 22 May 2025 20:09:21 GMT
- Title: ExpertGen: Training-Free Expert Guidance for Controllable Text-to-Face Generation
- Authors: Liang Shi, Yun Fu
- Abstract summary: ExpertGen is a training-free framework that leverages pre-trained expert models to guide generation with fine control. We show qualitatively and quantitatively that expert models can guide the generation process with high precision.
- Score: 49.294779074232686
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in diffusion models have significantly improved text-to-face generation, but achieving fine-grained control over facial features remains a challenge. Existing methods often require training additional modules to handle specific controls such as identity, attributes, or age, making them inflexible and resource-intensive. We propose ExpertGen, a training-free framework that leverages pre-trained expert models such as face recognition, facial attribute recognition, and age estimation networks to guide generation with fine control. Our approach uses a latent consistency model to ensure realistic and in-distribution predictions at each diffusion step, enabling accurate guidance signals to effectively steer the diffusion process. We show qualitatively and quantitatively that expert models can guide the generation process with high precision, and multiple experts can collaborate to enable simultaneous control over diverse facial aspects. By allowing direct integration of off-the-shelf expert models, our method transforms any such model into a plug-and-play component for controllable face generation.
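The abstract describes the mechanism only at a high level, so the sketch below is a minimal, hypothetical reading of gradient-based expert guidance in that spirit, assuming a PyTorch-style latent diffusion setup. The names `lcm_predict_x0`, `decode`, `experts`, and `guidance_scale` are illustrative placeholders, not the authors' API.

```python
import torch

@torch.enable_grad()
def expert_guided_step(latent, t, lcm_predict_x0, decode, experts, guidance_scale=1.0):
    """One hypothetical guidance update at diffusion step t.

    experts: list of (model, target, loss_fn, weight) tuples, e.g. a frozen
    face recognizer with an identity-embedding target, an attribute
    classifier with attribute labels, or an age estimator with a target age.
    """
    latent = latent.detach().requires_grad_(True)
    # The latent consistency model gives a one-step prediction of the clean
    # sample, keeping the experts' inputs realistic and in-distribution.
    x0_latent = lcm_predict_x0(latent, t)
    image = decode(x0_latent)  # decode to pixel space for the expert networks
    total_loss = 0.0
    for model, target, loss_fn, weight in experts:
        total_loss = total_loss + weight * loss_fn(model(image), target)
    # Steer the latent down the gradient of the combined expert loss.
    (grad,) = torch.autograd.grad(total_loss, latent)
    return (latent - guidance_scale * grad).detach()
```

Under this reading, the claim that "multiple experts can collaborate" reduces to summing weighted expert losses before taking a single gradient step, which is what would make any off-the-shelf face model a plug-and-play guidance term.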
Related papers
- AttriCtrl: Fine-Grained Control of Aesthetic Attribute Intensity in Diffusion Models [32.46570968627392]
AttriCtrl is a plug-and-play framework for precise and continuous control of aesthetic attributes. We quantify abstract aesthetics by leveraging semantic similarity from pre-trained vision-language models. It is fully compatible with popular open-source controllable generation frameworks.
arXiv Detail & Related papers (2025-08-04T07:49:40Z) - A Practical Investigation of Spatially-Controlled Image Generation with Transformers [16.682348277650817]
We aim to provide clear takeaways across generation paradigms for practitioners wishing to develop systems for spatially-controlled generation. We perform controlled experiments on ImageNet across diffusion-based/flow-based and autoregressive (AR) models.
arXiv Detail & Related papers (2025-07-21T15:33:49Z) - Bringing Diversity from Diffusion Models to Semantic-Guided Face Asset Generation [10.402456492958457]
This work aims to demonstrate that a semantically controllable generative network can provide enhanced control over the digital face modeling process. We introduce a novel data generation pipeline that creates a high-quality 3D face database using a pre-trained diffusion model. We also introduce a comprehensive system designed for creating and editing high-quality face assets.
arXiv Detail & Related papers (2025-04-21T17:38:50Z) - SpectR: Dynamically Composing LM Experts with Spectral Routing [37.969478059005574]
This paper introduces SPECTR, an approach for dynamically composing expert models at each time step during inference. We show that SPECTR improves routing accuracy over alternative training-free methods, increasing task performance across expert domains.
arXiv Detail & Related papers (2025-04-04T13:58:44Z) - A Simple Approach to Unifying Diffusion-based Conditional Generation [63.389616350290595]
We introduce a simple, unified framework to handle diverse conditional generation tasks. Our approach enables versatile capabilities via different inference-time sampling schemes. Our model supports additional capabilities like non-spatially aligned and coarse conditioning.
arXiv Detail & Related papers (2024-10-15T09:41:43Z) - CAR: Controllable Autoregressive Modeling for Visual Generation [100.33455832783416]
Controllable AutoRegressive Modeling (CAR) is a novel, plug-and-play framework that integrates conditional control into multi-scale latent variable modeling.
CAR progressively refines and captures control representations, which are injected into each autoregressive step of the pre-trained model to guide the generation process.
Our approach demonstrates excellent controllability across various types of conditions and delivers higher image quality compared to previous methods.
arXiv Detail & Related papers (2024-10-07T00:55:42Z) - Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control [59.954322727683746]
Face-Adapter is designed for high-precision, high-fidelity face editing on top of pre-trained diffusion models.
Face-Adapter achieves comparable or even superior performance in terms of motion control precision, ID retention capability, and generation quality.
arXiv Detail & Related papers (2024-05-21T17:50:12Z) - TCIG: Two-Stage Controlled Image Generation with Quality Enhancement through Diffusion [0.0]
A two-stage method that combines controllability and high quality in image generation is proposed.
By separating controllability from quality enhancement, the method achieves outstanding results.
arXiv Detail & Related papers (2024-03-02T13:59:02Z) - Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation [7.1629002695210024]
In-domain generation aims to perform a variety of tasks within a specific domain, such as unconditional generation, text-to-image, image editing, 3D generation, and more. Early research typically required training specialized generators for each unique task and domain, often relying on fully-labeled data. Motivated by the powerful generative capabilities and broad applications of diffusion models, we explore leveraging label-free data to empower these models for in-domain generation.
arXiv Detail & Related papers (2023-12-13T14:59:49Z) - Training and Tuning Generative Neural Radiance Fields for Attribute-Conditional 3D-Aware Face Generation [66.21121745446345]
We propose a conditional GNeRF model that integrates specific attribute labels as input, thereby enhancing the controllability and disentanglement of 3D-aware generative models.
Our approach builds upon a pre-trained 3D-aware face model, and we introduce a Training as Init and Optimizing for Tuning (TRIOT) method to train a conditional normalizing flow module.
Our experiments substantiate the efficacy of our model, showcasing its ability to generate high-quality edits with enhanced view consistency.
arXiv Detail & Related papers (2022-08-26T10:05:39Z) - Unsupervised Controllable Generation with Self-Training [90.04287577605723]
Controllable generation with GANs remains a challenging research problem.
We propose an unsupervised framework to learn a distribution of latent codes that control the generator through self-training.
Our framework exhibits better disentanglement compared to other variants such as the variational autoencoder.
arXiv Detail & Related papers (2020-07-17T21:50:35Z)