DisCo: Reinforcement with Diversity Constraints for Multi-Human Generation
- URL: http://arxiv.org/abs/2510.01399v1
- Date: Wed, 01 Oct 2025 19:28:51 GMT
- Title: DisCo: Reinforcement with Diversity Constraints for Multi-Human Generation
- Authors: Shubhankar Borse, Farzad Farhadzadeh, Munawar Hayat, Fatih Porikli
- Abstract summary: DisCo is the first RL-based framework to directly optimize identity diversity in multi-human generation. DisCo fine-tunes flow-matching models via Group-Relative Policy Optimization. On the DiverseHumans Testset, DisCo achieves 98.6 Unique Face Accuracy and near-perfect Global Identity Spread.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State-of-the-art text-to-image models excel at realism but collapse on multi-human prompts - duplicating faces, merging identities, and miscounting individuals. We introduce DisCo (Reinforcement with Diversity Constraints), the first RL-based framework to directly optimize identity diversity in multi-human generation. DisCo fine-tunes flow-matching models via Group-Relative Policy Optimization (GRPO) with a compositional reward that (i) penalizes intra-image facial similarity, (ii) discourages cross-sample identity repetition, (iii) enforces accurate person counts, and (iv) preserves visual fidelity through human preference scores. A single-stage curriculum stabilizes training as complexity scales, requiring no extra annotations. On the DiverseHumans Testset, DisCo achieves 98.6 Unique Face Accuracy and near-perfect Global Identity Spread - surpassing both open-source and proprietary methods (e.g., Gemini, GPT-Image) while maintaining competitive perceptual quality. Our results establish DisCo as a scalable, annotation-free solution that resolves the long-standing identity crisis in generative models and sets a new benchmark for compositional multi-human generation.
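The four-term compositional reward described above can be sketched as follows. The abstract does not give the exact functional forms or weights, so the function name, the choice of cosine similarity over face embeddings, and the weighting scheme below are all illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def disco_reward(face_embeds, group_embeds, n_expected, hps_score,
                 w_intra=1.0, w_cross=1.0, w_count=1.0, w_hps=1.0):
    """Hypothetical compositional reward for one generated image.

    face_embeds:  (k, d) L2-normalized embeddings of faces detected in the image
    group_embeds: (m, d) embeddings pooled from the other samples in the group
    n_expected:   person count requested by the prompt
    hps_score:    human-preference score of the image, assumed in [0, 1]
    """
    k = len(face_embeds)

    # (i) penalize intra-image facial similarity: mean off-diagonal cosine sim
    intra = 0.0
    if k > 1:
        sims = face_embeds @ face_embeds.T          # rows are unit vectors
        intra = (sims.sum() - k) / (k * (k - 1))    # exclude the diagonal

    # (ii) discourage cross-sample identity repetition: max similarity to
    # faces generated in the group's other samples
    cross = 0.0
    if k > 0 and len(group_embeds) > 0:
        cross = float((face_embeds @ group_embeds.T).max())

    # (iii) enforce an accurate person count
    count_ok = 1.0 if k == n_expected else 0.0

    # (iv) preserve visual fidelity through the human-preference score
    return (-w_intra * intra - w_cross * cross
            + w_count * count_ok + w_hps * hps_score)
```

Under these assumptions, an image with two distinct identities and the correct count scores strictly higher than one that duplicates a face, which is the gradient signal GRPO would amplify.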
Related papers
- Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement [54.199726425201895]
Large multimodal editing models have demonstrated powerful editing capabilities across diverse tasks. Current facial ID preservation methods struggle to achieve consistent restoration of both the facial identity and the IP of edited elements. We propose EditedID, an Alignment-Disentanglement-Entanglement framework for robust identity-specific facial restoration.
arXiv Detail & Related papers (2026-02-21T08:24:42Z) - Beyond the Dirac Delta: Mitigating Diversity Collapse in Reinforcement Fine-Tuning for Versatile Image Generation [51.305316234962554]
We propose DRIFT (DiveRsity-Incentivized Reinforcement Fine-Tuning for Versatile Image Generation), an innovative framework that systematically incentivizes output diversity throughout the on-policy fine-tuning process. DRIFT achieves superior performance in both task alignment and generation diversity, yielding a 9.08% to 43.46% increase in diversity at equivalent alignment levels and a 59.65
arXiv Detail & Related papers (2026-01-18T13:25:43Z) - SUGAR: A Sweeter Spot for Generative Unlearning of Many Identities [7.695475724838533]
Recent advances in 3D-aware generative models have enabled high-fidelity image synthesis of human identities. We introduce SUGAR, a framework for scalable generative unlearning that enables the removal of many identities without retraining the entire model.
arXiv Detail & Related papers (2025-12-06T20:42:38Z) - WithAnyone: Towards Controllable and ID Consistent Image Generation [83.55786496542062]
Identity-consistent generation has become an important focus in text-to-image research. We develop a large-scale paired dataset tailored for multi-person scenarios. We propose a novel training paradigm with a contrastive identity loss that leverages paired data to balance fidelity with diversity.
arXiv Detail & Related papers (2025-10-16T17:59:54Z) - From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization [9.614305363044737]
Person re-identification (ReID) aims to extract accurate identity representation features. We propose a Training-Free Feature Centralization ReID framework (Pose2ID) to reduce individual noise. Our method sets new state-of-the-art results across standard, cross-modality, and occluded ReID tasks.
arXiv Detail & Related papers (2025-03-02T15:31:48Z) - Foundation Cures Personalization: Improving Personalized Models' Prompt Consistency via Hidden Foundation Knowledge [33.35678923549471]
FreeCure is a framework that improves the prompt consistency of personalization models. We introduce a novel foundation-aware self-attention module, coupled with an inversion-based process, to bring well-aligned attribute information into the personalization process. FreeCure has demonstrated significant improvements in prompt consistency across a diverse set of state-of-the-art facial personalization models.
arXiv Detail & Related papers (2024-11-22T15:21:38Z) - ID$^3$: Identity-Preserving-yet-Diversified Diffusion Models for Synthetic Face Recognition [60.15830516741776]
Synthetic face recognition (SFR) aims to generate datasets that mimic the distribution of real face data.
We introduce a diffusion-fueled SFR model termed ID$^3$.
ID$^3$ employs an ID-preserving loss to generate diverse yet identity-consistent facial appearances.
arXiv Detail & Related papers (2024-09-26T06:46:40Z) - Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification [60.20318058777603]
Generalizable vehicle re-identification (ReID) seeks to develop models that can adapt to unknown target domains without the need for fine-tuning or retraining. Previous works have mainly focused on extracting domain-invariant features by aligning data distributions between source domains. We propose a two-stage Multi-expert Knowledge Confrontation and Collaboration (MiKeCoCo) method to solve this unique problem.
arXiv Detail & Related papers (2024-07-10T04:06:39Z) - DVG-Face: Dual Variational Generation for Heterogeneous Face Recognition [85.94331736287765]
We formulate HFR as a dual generation problem, and tackle it via a novel Dual Variational Generation (DVG-Face) framework.
We integrate abundant identity information of large-scale visible data into the joint distribution.
Massive new diverse paired heterogeneous images with the same identity can be generated from noises.
arXiv Detail & Related papers (2020-09-20T09:48:24Z) - Intra-Camera Supervised Person Re-Identification [87.88852321309433]
We propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation.
This eliminates the most time-consuming and tedious inter-camera identity labelling process.
We formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method for Intra-Camera Supervised (ICS) person re-id.
arXiv Detail & Related papers (2020-02-12T15:26:33Z)