DisCo: Reinforcement with Diversity Constraints for Multi-Human Generation
- URL: http://arxiv.org/abs/2510.01399v1
- Date: Wed, 01 Oct 2025 19:28:51 GMT
- Title: DisCo: Reinforcement with Diversity Constraints for Multi-Human Generation
- Authors: Shubhankar Borse, Farzad Farhadzadeh, Munawar Hayat, Fatih Porikli
- Abstract summary: DisCo is the first RL-based framework to directly optimize identity diversity in multi-human generation. DisCo fine-tunes flow-matching models via Group-Relative Policy Optimization. On the DiverseHumans Testset, DisCo achieves 98.6 Unique Face Accuracy and near-perfect Global Identity Spread.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art text-to-image models excel at realism but collapse on multi-human prompts - duplicating faces, merging identities, and miscounting individuals. We introduce DisCo (Reinforcement with Diversity Constraints), the first RL-based framework to directly optimize identity diversity in multi-human generation. DisCo fine-tunes flow-matching models via Group-Relative Policy Optimization (GRPO) with a compositional reward that (i) penalizes intra-image facial similarity, (ii) discourages cross-sample identity repetition, (iii) enforces accurate person counts, and (iv) preserves visual fidelity through human preference scores. A single-stage curriculum stabilizes training as complexity scales, requiring no extra annotations. On the DiverseHumans Testset, DisCo achieves 98.6 Unique Face Accuracy and near-perfect Global Identity Spread - surpassing both open-source and proprietary methods (e.g., Gemini, GPT-Image) while maintaining competitive perceptual quality. Our results establish DisCo as a scalable, annotation-free solution that resolves the long-standing identity crisis in generative models and sets a new benchmark for compositional multi-human generation.
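The four-term compositional reward described above can be sketched as follows. The abstract does not give the exact functional forms or weights, so the function name, the choice of cosine similarity over face embeddings, and the weighting scheme below are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def disco_reward(face_embeds, group_embeds, n_expected, hps_score,
                 w_intra=1.0, w_cross=1.0, w_count=1.0, w_hps=1.0):
    """Hypothetical compositional reward for one generated image.

    face_embeds:  (k, d) L2-normalized embeddings of faces detected in the image
    group_embeds: (m, d) embeddings pooled from the other samples in the group
    n_expected:   person count requested by the prompt
    hps_score:    human-preference score of the image, assumed in [0, 1]
    """
    k = len(face_embeds)

    # (i) penalize intra-image facial similarity: mean off-diagonal cosine sim
    intra = 0.0
    if k > 1:
        sims = face_embeds @ face_embeds.T          # rows are unit vectors
        intra = (sims.sum() - k) / (k * (k - 1))    # exclude the diagonal

    # (ii) discourage cross-sample identity repetition: max similarity to
    # faces generated in the group's other samples
    cross = 0.0
    if k > 0 and len(group_embeds) > 0:
        cross = float((face_embeds @ group_embeds.T).max())

    # (iii) enforce an accurate person count
    count_ok = 1.0 if k == n_expected else 0.0

    # (iv) preserve visual fidelity through the human-preference score
    return (-w_intra * intra - w_cross * cross
            + w_count * count_ok + w_hps * hps_score)
```

Under these assumptions, an image with two distinct identities and the correct count scores strictly higher than one that duplicates a face, which is the gradient signal GRPO would amplify.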
Related papers
- Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement [54.199726425201895]
Large multimodal editing models have demonstrated powerful editing capabilities across diverse tasks. Current facial ID preservation methods struggle to achieve consistent restoration of both the facial identity and the IP of edited elements. We propose EditedID, an Alignment-Disentanglement-Entanglement framework for robust identity-specific facial restoration.
arXiv Detail & Related papers (2026-02-21T08:24:42Z) - Beyond the Dirac Delta: Mitigating Diversity Collapse in Reinforcement Fine-Tuning for Versatile Image Generation [51.305316234962554]
We propose DRIFT (DiveRsity-Incentivized Reinforcement Fine-Tuning for Versatile Image Generation), an innovative framework that systematically incentivizes output diversity throughout the on-policy fine-tuning process. DRIFT achieves superior performance in both task alignment and generation diversity, yielding a 9.08% to 43.46% increase in diversity at equivalent alignment levels and a 59.65
arXiv Detail & Related papers (2026-01-18T13:25:43Z) - SUGAR: A Sweeter Spot for Generative Unlearning of Many Identities [7.695475724838533]
Recent advances in 3D-aware generative models have enabled high-fidelity image synthesis of human identities. We introduce SUGAR, a framework for scalable generative unlearning that enables the removal of many identities without retraining the entire model.
arXiv Detail & Related papers (2025-12-06T20:42:38Z) - WithAnyone: Towards Controllable and ID Consistent Image Generation [83.55786496542062]
Identity-consistent generation has become an important focus in text-to-image research. We develop a large-scale paired dataset tailored for multi-person scenarios. We propose a novel training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity.
arXiv Detail & Related papers (2025-10-16T17:59:54Z) - From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization [9.614305363044737]
Person re-identification (ReID) aims to extract accurate identity representation features. We propose a Training-Free Feature Centralization ReID framework (Pose2ID) to reduce individual noise. Our method sets new state-of-the-art results across standard, cross-modality, and occluded ReID tasks.
arXiv Detail & Related papers (2025-03-02T15:31:48Z) - Foundation Cures Personalization: Improving Personalized Models' Prompt Consistency via Hidden Foundation Knowledge [33.35678923549471]
FreeCure is a framework that improves the prompt consistency of personalization models. We introduce a novel foundation-aware self-attention module, coupled with an inversion-based process, to bring well-aligned attribute information into the personalization process. FreeCure has demonstrated significant improvements in prompt consistency across a diverse set of state-of-the-art facial personalization models.
arXiv Detail & Related papers (2024-11-22T15:21:38Z) - ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition [60.15830516741776]
Synthetic face recognition (SFR) aims to generate datasets that mimic the distribution of real face data.
We introduce a diffusion-fueled SFR model termed ID$^3$.
ID$^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances.
arXiv Detail & Related papers (2024-09-26T06:46:40Z) - Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification [60.20318058777603]
Generalizable vehicle re-identification (ReID) seeks to develop models that can adapt to unknown target domains without the need for fine-tuning or retraining. Previous works have mainly focused on extracting domain-invariant features by aligning data distributions between source domains. We propose a two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method to solve this unique problem.
arXiv Detail & Related papers (2024-07-10T04:06:39Z) - DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition [85.94331736287765]
We formulate HFR as a dual generation problem, and tackle it via a novel Dual Variational Generation (DVG-Face) framework.
We integrate abundant identity information of large-scale visible data into the joint distribution.
Massive new diverse paired heterogeneous images with the same identity can be generated from noises.
arXiv Detail & Related papers (2020-09-20T09:48:24Z) - Intra-Camera Supervised Person Re-Identification [87.88852321309433]
We propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation.
This eliminates the most time-consuming and tedious inter-camera identity labelling process.
We formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method for Intra-Camera Supervised (ICS) person re-id.
arXiv Detail & Related papers (2020-02-12T15:26:33Z)