From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation
- URL: http://arxiv.org/abs/2510.18263v1
- Date: Tue, 21 Oct 2025 03:32:26 GMT
- Title: From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation
- Authors: Ziwei Huang, Ying Shu, Hao Fang, Quanyu Long, Wenya Wang, Qiushi Guo, Tiezheng Ge, Leilei Gan
- Abstract summary: Subject-driven image generation models face a fundamental trade-off between identity preservation (fidelity) and prompt adherence (editability). We propose Customized-GRPO, a novel framework featuring two key innovations: Synergy-Aware Reward Shaping and Time-Aware Dynamic Weighting. Our model achieves a superior balance, generating images that both preserve key identity features and accurately adhere to complex textual prompts.
- Score: 37.43722287763904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Subject-driven image generation models face a fundamental trade-off between identity preservation (fidelity) and prompt adherence (editability). While online reinforcement learning (RL), specifically GRPO, offers a promising solution, we find that a naive application of GRPO leads to competitive degradation, as the simple linear aggregation of rewards with static weights causes conflicting gradient signals and a misalignment with the temporal dynamics of the diffusion process. To overcome these limitations, we propose Customized-GRPO, a novel framework featuring two key innovations: (i) Synergy-Aware Reward Shaping (SARS), a non-linear mechanism that explicitly penalizes conflicted reward signals and amplifies synergistic ones, providing a sharper and more decisive gradient; and (ii) Time-Aware Dynamic Weighting (TDW), which aligns the optimization pressure with the model's temporal dynamics by prioritizing prompt-following in the early denoising steps and identity preservation in the later ones. Extensive experiments demonstrate that our method significantly outperforms naive GRPO baselines, successfully mitigating competitive degradation. Our model achieves a superior balance, generating images that both preserve key identity features and accurately adhere to complex textual prompts.
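As a concrete illustration of how the two mechanisms could compose, here is a minimal sketch in Python. The abstract does not give the exact functional forms, so the linear TDW schedule, the synergy and conflict terms, and the constants `tau` and `lam` below are illustrative assumptions, not the paper's actual formulation.

```python
def time_aware_weights(t: int, T: int) -> tuple[float, float]:
    # Hypothetical linear TDW schedule: emphasize prompt-following at
    # early denoising steps (small t) and identity preservation later.
    alpha = t / max(T - 1, 1)          # progress through denoising, 0 -> 1
    return 1.0 - alpha, alpha          # (w_prompt, w_identity)

def customized_grpo_reward(r_prompt: float, r_identity: float,
                           t: int, T: int,
                           tau: float = 0.5, lam: float = 1.0) -> float:
    # Sketch of Synergy-Aware Reward Shaping (SARS) on top of TDW. SARS
    # is described only as a non-linear mechanism that penalizes
    # conflicted reward signals and amplifies synergistic ones; the
    # exact terms and constants here are guesses for illustration.
    w_p, w_i = time_aware_weights(t, T)
    base = w_p * r_prompt + w_i * r_identity             # weighted aggregation
    synergy = max(min(r_prompt, r_identity) - tau, 0.0)  # both rewards high
    conflict = abs(r_prompt - r_identity)                # rewards pulling apart
    return base + lam * synergy - lam * conflict
```

The naive baseline the abstract criticizes corresponds to dropping the synergy and conflict terms and fixing the two weights to static values.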
Related papers
- VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation [31.201343197395573]
Visual generation is dominated by three paradigms: AutoRegressive (AR), diffusion, and Visual AutoRegressive (VAR) models.
Unlike AR and diffusion, VARs operate on heterogeneous input structures across their generation steps, which creates severe asynchronous policy conflicts.
We propose a novel framework to enhance Group Relative Policy Optimization (GRPO) by explicitly managing these conflicts.
arXiv Detail & Related papers (2026-01-05T16:36:40Z)
- DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO [50.89703227426486]
Reinforcement learning (RL) improves image generation quality significantly by comparing the relative performance of images generated within the same group.
In the later stages of training, the model tends to produce homogenized outputs, lacking creativity and visual diversity.
This issue can be analyzed from both reward modeling and generation dynamics perspectives.
arXiv Detail & Related papers (2025-12-25T05:37:37Z)
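The group-relative comparison mentioned in this summary is the core of GRPO itself. A minimal sketch of that standard mechanism (not DiverseGRPO's diversity-aware extension, whose details the summary does not give):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Standard GRPO advantage: normalize each sample's reward against
    # the other samples generated for the same prompt, so no learned
    # value function (critic) is required.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four images sampled for one prompt and scored by a reward model.
advantages = group_relative_advantages(np.array([0.8, 0.6, 0.6, 0.2]))
# Images above the group mean get positive advantage; below it, negative.
```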
- PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models [54.18605375476406]
We introduce Proportionate Credit Policy Optimization (PCPO), a framework that enforces proportional credit assignment through a stable objective reformulation and a principled reweighting of timesteps.
PCPO substantially outperforms existing policy gradient baselines on all fronts, including the state-of-the-art DanceGRPO.
arXiv Detail & Related papers (2025-09-30T04:43:58Z)
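Since the summary centers on credit assignment across timesteps, a generic sketch of a timestep-reweighted policy-gradient loss for a diffusion sampler may help. PCPO's actual weights follow from its objective reformulation, which is not given here, so the uniform weights below are only a placeholder.

```python
import torch

def reweighted_policy_loss(logprobs: torch.Tensor,    # [batch, T] per-step log-probs
                           advantages: torch.Tensor,  # [batch] per-sample advantage
                           weights: torch.Tensor      # [T] per-timestep credit
                           ) -> torch.Tensor:
    # REINFORCE-style loss with explicit per-timestep weights, so each
    # denoising step receives a controlled share of the credit.
    per_step = -(advantages[:, None] * logprobs)      # [batch, T]
    return (per_step * weights[None, :]).sum(dim=1).mean()

T = 50
weights = torch.full((T,), 1.0 / T)   # placeholder: equal credit per step
```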
- STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation [16.40446848402754]
Reinforcement learning has recently been explored to improve text-to-image generation.
Applying existing GRPO algorithms to autoregressive (AR) image models remains challenging.
In this work, we revisit GRPO for AR image generation and identify two key issues: contradictory gradients from unnecessary tokens and unstable policy entropy dynamics.
arXiv Detail & Related papers (2025-09-29T16:50:21Z)
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce SMART (Sycophancy Mitigation through Adaptive Reasoning Trajectories).
We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z)
- Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation [110.03631978640298]
We present the first systematic investigation into the mechanisms of applying the next-token prediction paradigm to the visual domain.
We identify three key properties that hinder the learning of high-level visual semantics.
We show that these issues can be effectively addressed by introducing self-supervised objectives during training.
arXiv Detail & Related papers (2025-09-18T17:47:40Z)
- Enhancing Variational Autoencoders with Smooth Robust Latent Encoding [54.74721202894622]
Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models.
We introduce Smooth Robust Latent VAE (SRL-VAE), a novel adversarial training framework that boosts both generation quality and robustness.
Experiments show that SRL-VAE improves both generation quality, in image reconstruction and text-guided image editing, and robustness, against Nightshade attacks and image-editing attacks.
arXiv Detail & Related papers (2025-04-24T03:17:57Z)
- ROCM: RLHF on consistency models [8.905375742101707]
We propose a reward optimization framework for applying RLHF to consistency models.
We investigate various f-divergences as regularization strategies, striking a balance between reward and model consistency.
arXiv Detail & Related papers (2025-03-08T11:19:48Z)
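To make the f-divergence regularization concrete, here is a schematic objective: reward maximization minus a divergence penalty that keeps the fine-tuned model close to the reference consistency model. Which f's ROCM actually uses, the coefficient `beta`, and the sampling distribution assumed by the Monte-Carlo estimate are all assumptions for illustration.

```python
import torch

# Generator functions f (with f(1) = 0) for a few standard f-divergences,
# applied to the density ratio r = p_model / p_ref.
F_DIVS = {
    "kl":         lambda r: r * torch.log(r),
    "reverse_kl": lambda r: -torch.log(r),
    "hellinger":  lambda r: (torch.sqrt(r) - 1.0) ** 2,
}

def regularized_objective(rewards: torch.Tensor,     # [batch] reward-model scores
                          logp_model: torch.Tensor,  # [batch] log p_model(x)
                          logp_ref: torch.Tensor,    # [batch] log p_ref(x)
                          beta: float = 0.1, fdiv: str = "kl") -> torch.Tensor:
    # Monte-Carlo estimate of E[f(p_model / p_ref)], here assuming the
    # batch was drawn from the reference model; maximize this objective.
    ratio = torch.exp(logp_model - logp_ref)
    penalty = F_DIVS[fdiv](ratio).mean()
    return rewards.mean() - beta * penalty
```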
- Parallelly Tempered Generative Adversarial Nets: Toward Stabilized Gradients [7.94957965474334]
A generative adversarial network (GAN) has been a representative backbone model in generative artificial intelligence (AI).
This work analyzes the training instability and inefficiency in the presence of mode collapse by linking it to multimodality in the target distribution.
With our newly developed GAN objective function, the generator can learn all the tempered distributions simultaneously.
arXiv Detail & Related papers (2024-11-18T18:01:13Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)