DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO
- URL: http://arxiv.org/abs/2512.21514v1
- Date: Thu, 25 Dec 2025 05:37:37 GMT
- Title: DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO
- Authors: Henglin Liu, Huijuan Huang, Jing Wang, Chang Liu, Xiu Li, Xiangyang Ji
- Abstract summary: Reinforcement learning (RL) improves image generation quality significantly by comparing the relative performance of images generated within the same group. In the later stages of training, the model tends to produce homogenized outputs, lacking creativity and visual diversity. This issue can be analyzed from both reward modeling and generation dynamics perspectives.
- Score: 50.89703227426486
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL), particularly GRPO, improves image generation quality significantly by comparing the relative performance of images generated within the same group. However, in the later stages of training, the model tends to produce homogenized outputs, lacking creativity and visual diversity, which restricts its application scenarios. This issue can be analyzed from both reward modeling and generation dynamics perspectives. First, traditional GRPO relies on single-sample quality as the reward signal, driving the model to converge toward a few high-reward generation modes while neglecting distribution-level diversity. Second, conventional GRPO regularization neglects the dominant role of early-stage denoising in preserving diversity, causing a misaligned regularization budget that limits the achievable quality-diversity trade-off. Motivated by these insights, we revisit the diversity degradation problem from both reward modeling and generation dynamics. At the reward level, we propose a distributional creativity bonus based on semantic grouping. Specifically, we construct a distribution-level representation via spectral clustering over samples generated from the same caption, and adaptively allocate exploratory rewards according to group sizes to encourage the discovery of novel visual modes. At the generation level, we introduce a structure-aware regularization, which enforces stronger early-stage constraints to preserve diversity without compromising reward optimization efficiency. Experiments demonstrate that our method achieves a 13%-18% improvement in semantic diversity under matched quality scores, establishing a new Pareto frontier between image quality and diversity for GRPO-based image generation.
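The reward-level idea in the abstract (cluster the samples drawn for one caption, then give more exploratory reward to samples in smaller groups) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the two-way split via the Fiedler vector as a stand-in for full spectral clustering, the cosine affinity, the linear bonus rule, and the `scale` parameter. Only the mean/std normalization within a group follows the usual GRPO formulation.

```python
import numpy as np


def spectral_bipartition(embeddings: np.ndarray) -> np.ndarray:
    """Two-way spectral split using the sign of the Fiedler vector.

    Simplified stand-in for spectral clustering (assumption, not the paper's setup).
    """
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = np.clip(unit @ unit.T, 0.0, None)  # non-negative cosine similarity
    np.fill_diagonal(affinity, 0.0)
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    _, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues come back in ascending order
    fiedler = eigvecs[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)


def creativity_bonus(labels: np.ndarray, scale: float = 0.1) -> np.ndarray:
    """Hypothetical bonus rule: samples in smaller clusters receive a larger bonus."""
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    cluster_share = counts[inverse] / labels.size
    return scale * (1.0 - cluster_share)


def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantage: normalize rewards within one prompt's sample group."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)


# Toy group for one caption: three near-duplicate samples and one distinct sample.
embeddings = np.array([[1.0, 0.10], [1.0, 0.20], [1.0, 0.15], [0.1, 1.0]])
quality = np.array([0.9, 0.9, 0.9, 0.8])  # made-up single-sample quality rewards

labels = spectral_bipartition(embeddings)
shaped = quality + creativity_bonus(labels, scale=0.4)

print("advantages, quality only:", group_relative_advantages(quality))
print("advantages, with bonus:  ", group_relative_advantages(shaped))
```

In this toy group the distinct sample has the lowest quality reward, so a quality-only advantage pushes the policy away from it; with the size-based bonus its advantage becomes positive, which is the kind of pressure toward rarer modes the abstract describes.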
Related papers
- Beyond the Dirac Delta: Mitigating Diversity Collapse in Reinforcement Fine-Tuning for Versatile Image Generation [51.305316234962554]
We propose DRIFT (DiveRsity-Incentivized Reinforcement Fine-Tuning for Versatile Image Generation), an innovative framework that systematically incentivizes output throughout the on-policy fine-tuning process. DRIFT achieves superior dominance regarding task alignment and generation diversity, yielding a 9.08%~43.46% increase in diversity at equivalent alignment levels and a $ 59.65
arXiv Detail & Related papers (2026-01-18T13:25:43Z)
- DiverseAR: Boosting Diversity in Bitwise Autoregressive Image Generation [22.400053095939402]
We introduce DiverseAR, a principled and effective method that enhances image diversity without sacrificing visual quality. Specifically, we introduce an adaptive logits distribution scaling mechanism that dynamically adjusts the sharpness of the binary output distribution during sampling. To mitigate potential fidelity loss caused by distribution smoothing, we develop an energy-based generation path search algorithm that avoids sampling low-confidence tokens.
arXiv Detail & Related papers (2025-12-02T16:54:36Z)
- DiverseVAR: Balancing Diversity and Quality of Next-Scale Visual Autoregressive Models [23.12099227251494]
We introduce DiverseVAR, a framework that enhances the diversity of text-conditioned visual autoregressive models (VAR) at test time. VAR models have emerged as strong competitors to diffusion and flow models for image generation. VAR models suffer from a critical limitation in diversity, often producing nearly identical images even for simple prompts.
arXiv Detail & Related papers (2025-11-26T14:06:52Z)
- Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization [50.5332987313297]
We propose Token-Prompt embedding Space Optimization (TPSO), a training-free and model-agnostic module. TPSO introduces learnable parameters to explore underrepresented regions of the token embedding space, reducing the tendency of the model to repeatedly generate samples from strong modes of the learned distribution. In experiments on MS-COCO and three diffusion backbones, TPSO significantly enhances generative diversity, improving baseline performance from 1.10 to 4.18 points, without sacrificing image quality.
arXiv Detail & Related papers (2025-11-25T00:42:09Z)
- Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO [68.44918104224818]
Autoregressive image generation presents unique challenges distinct from Chain-of-Thought (CoT) reasoning. This study provides the first comprehensive investigation of the GRPO and DPO algorithms in autoregressive image generation. Our findings reveal that GRPO and DPO exhibit distinct advantages, and crucially, that reward models possessing stronger intrinsic generalization capabilities potentially enhance the generalization potential of the applied RL algorithms.
arXiv Detail & Related papers (2025-05-22T17:59:49Z)
- Evaluating the Diversity and Quality of LLM Generated Content [72.84945252821908]
We introduce a framework for measuring effective semantic diversity (diversity among outputs that meet quality thresholds). Although preference-tuned models exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models. These findings have important implications for applications that require diverse yet high-quality outputs.
arXiv Detail & Related papers (2025-04-16T23:02:23Z)
- Diversity-Rewarded CFG Distillation [62.08448835625036]
We introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations.
Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt.
arXiv Detail & Related papers (2024-10-08T14:40:51Z)
- Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models [20.70550870149442]
We introduce Annealed Importance Guidance (AIG), an inference-time regularization inspired by Annealed Importance Sampling.
Our experiments demonstrate the benefits of AIG for Stable Diffusion models, striking the optimal balance between reward optimization and image diversity.
arXiv Detail & Related papers (2024-09-09T16:27:26Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.