From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation
- URL: http://arxiv.org/abs/2510.18263v1
- Date: Tue, 21 Oct 2025 03:32:26 GMT
- Title: From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation
- Authors: Ziwei Huang, Ying Shu, Hao Fang, Quanyu Long, Wenya Wang, Qiushi Guo, Tiezheng Ge, Leilei Gan
- Abstract summary: Subject-driven image generation models face a fundamental trade-off between identity preservation (fidelity) and prompt adherence (editability). We propose Customized-GRPO, a novel framework featuring two key innovations: Synergy-Aware Reward Shaping and Time-Aware Dynamic Weighting. Our model achieves a superior balance, generating images that both preserve key identity features and accurately adhere to complex textual prompts.
- Score: 37.43722287763904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Subject-driven image generation models face a fundamental trade-off between identity preservation (fidelity) and prompt adherence (editability). While online reinforcement learning (RL), specifically GRPO, offers a promising solution, we find that a naive application of GRPO leads to competitive degradation, as the simple linear aggregation of rewards with static weights causes conflicting gradient signals and a misalignment with the temporal dynamics of the diffusion process. To overcome these limitations, we propose Customized-GRPO, a novel framework featuring two key innovations: (i) Synergy-Aware Reward Shaping (SARS), a non-linear mechanism that explicitly penalizes conflicted reward signals and amplifies synergistic ones, providing a sharper and more decisive gradient; and (ii) Time-Aware Dynamic Weighting (TDW), which aligns the optimization pressure with the model's temporal dynamics by prioritizing prompt-following in the early denoising steps and identity preservation in the later ones. Extensive experiments demonstrate that our method significantly outperforms naive GRPO baselines, successfully mitigating competitive degradation. Our model achieves a superior balance, generating images that both preserve key identity features and accurately adhere to complex textual prompts.
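As a concrete illustration of how the two mechanisms could compose, here is a minimal sketch in Python. The abstract does not give the exact functional forms, so the linear TDW schedule, the synergy and conflict terms, and the constants `tau` and `lam` below are illustrative assumptions, not the paper's actual formulation.

```python
def time_aware_weights(t: int, T: int) -> tuple[float, float]:
    # Hypothetical linear TDW schedule: emphasize prompt-following at
    # early denoising steps (small t) and identity preservation later.
    alpha = t / max(T - 1, 1)          # progress through denoising, 0 -> 1
    return 1.0 - alpha, alpha          # (w_prompt, w_identity)

def customized_grpo_reward(r_prompt: float, r_identity: float,
                           t: int, T: int,
                           tau: float = 0.5, lam: float = 1.0) -> float:
    # Sketch of Synergy-Aware Reward Shaping (SARS) on top of TDW. SARS
    # is described only as a non-linear mechanism that penalizes
    # conflicted reward signals and amplifies synergistic ones; the
    # exact terms and constants here are guesses for illustration.
    w_p, w_i = time_aware_weights(t, T)
    base = w_p * r_prompt + w_i * r_identity             # weighted aggregation
    synergy = max(min(r_prompt, r_identity) - tau, 0.0)  # both rewards high
    conflict = abs(r_prompt - r_identity)                # rewards pulling apart
    return base + lam * synergy - lam * conflict
```

The naive baseline the abstract criticizes corresponds to dropping the synergy and conflict terms and fixing the two weights to static values.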
Related papers
- VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation [31.201343197395573]
Visual generation is dominated by three paradigms: AutoRegressive (AR), diffusion, and Visual AutoRegressive (VAR) models.
Unlike AR and diffusion, VARs operate on heterogeneous input structures across their generation steps, which creates severe asynchronous policy conflicts.
We propose a novel framework to enhance Group Relative Policy Optimization (GRPO) by explicitly managing these conflicts.
arXiv Detail & Related papers (2026-01-05T16:36:40Z)
- DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO [50.89703227426486]
Reinforcement learning (RL) improves image generation quality significantly by comparing the relative performance of images generated within the same group.
In the later stages of training, the model tends to produce homogenized outputs, lacking creativity and visual diversity.
This issue can be analyzed from both reward modeling and generation dynamics perspectives.
arXiv Detail & Related papers (2025-12-25T05:37:37Z)
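The group-relative comparison mentioned in this summary is the core of GRPO itself. A minimal sketch of that standard mechanism (not DiverseGRPO's diversity-aware extension, whose details the summary does not give):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # Standard GRPO advantage: normalize each sample's reward against
    # the other samples generated for the same prompt, so no learned
    # value function (critic) is required.
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# Example: four images sampled for one prompt and scored by a reward model.
advantages = group_relative_advantages(np.array([0.8, 0.6, 0.6, 0.2]))
# Images above the group mean get positive advantage; below it, negative.
```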
- PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models [54.18605375476406]
We introduce Proportionate Credit Policy Optimization (PCPO), a framework that enforces proportional credit assignment through a stable objective reformulation and a principled reweighting of timesteps.
PCPO substantially outperforms existing policy gradient baselines on all fronts, including the state-of-the-art DanceGRPO.
arXiv Detail & Related papers (2025-09-30T04:43:58Z)
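Since the summary centers on credit assignment across timesteps, a generic sketch of a timestep-reweighted policy-gradient loss for a diffusion sampler may help. PCPO's actual weights follow from its objective reformulation, which is not given here, so the uniform weights below are only a placeholder.

```python
import torch

def reweighted_policy_loss(logprobs: torch.Tensor,    # [batch, T] per-step log-probs
                           advantages: torch.Tensor,  # [batch] per-sample advantage
                           weights: torch.Tensor      # [T] per-timestep credit
                           ) -> torch.Tensor:
    # REINFORCE-style loss with explicit per-timestep weights, so each
    # denoising step receives a controlled share of the credit.
    per_step = -(advantages[:, None] * logprobs)      # [batch, T]
    return (per_step * weights[None, :]).sum(dim=1).mean()

T = 50
weights = torch.full((T,), 1.0 / T)   # placeholder: equal credit per step
```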
- STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation [16.40446848402754]
Reinforcement learning has recently been explored to improve text-to-image generation.
Applying existing GRPO algorithms to autoregressive (AR) image models remains challenging.
In this work, we revisit GRPO for AR image generation and identify two key issues: contradictory gradients from unnecessary tokens and unstable policy entropy dynamics.
arXiv Detail & Related papers (2025-09-29T16:50:21Z)
- Sycophancy Mitigation Through Reinforcement Learning with Uncertainty-Aware Adaptive Reasoning Trajectories [58.988535279557546]
We introduce SMART (Sycophancy Mitigation through Adaptive Reasoning Trajectories).
We show that SMART significantly reduces sycophantic behavior while preserving strong performance on out-of-distribution inputs.
arXiv Detail & Related papers (2025-09-20T17:09:14Z)
- Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation [110.03631978640298]
We present the first systematic investigation into the mechanisms of applying the next-token prediction paradigm to the visual domain.
We identify three key properties that hinder the learning of high-level visual semantics.
We show that these issues can be effectively addressed by introducing self-supervised objectives during training.
arXiv Detail & Related papers (2025-09-18T17:47:40Z)
- Enhancing Variational Autoencoders with Smooth Robust Latent Encoding [54.74721202894622]
Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models.
We introduce Smooth Robust Latent VAE (SRL-VAE), a novel adversarial training framework that boosts both generation quality and robustness.
Experiments show that SRL-VAE improves both generation quality, in image reconstruction and text-guided image editing, and robustness, against Nightshade attacks and image-editing attacks.
arXiv Detail & Related papers (2025-04-24T03:17:57Z)
- ROCM: RLHF on consistency models [8.905375742101707]
We propose a reward optimization framework for applying RLHF to consistency models.
We investigate various f-divergences as regularization strategies, striking a balance between reward and model consistency.
arXiv Detail & Related papers (2025-03-08T11:19:48Z)
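To make the f-divergence regularization concrete, here is a schematic objective: reward maximization minus a divergence penalty that keeps the fine-tuned model close to the reference consistency model. Which f's ROCM actually uses, the coefficient `beta`, and the sampling distribution assumed by the Monte-Carlo estimate are all assumptions for illustration.

```python
import torch

# Generator functions f (with f(1) = 0) for a few standard f-divergences,
# applied to the density ratio r = p_model / p_ref.
F_DIVS = {
    "kl":         lambda r: r * torch.log(r),
    "reverse_kl": lambda r: -torch.log(r),
    "hellinger":  lambda r: (torch.sqrt(r) - 1.0) ** 2,
}

def regularized_objective(rewards: torch.Tensor,     # [batch] reward-model scores
                          logp_model: torch.Tensor,  # [batch] log p_model(x)
                          logp_ref: torch.Tensor,    # [batch] log p_ref(x)
                          beta: float = 0.1, fdiv: str = "kl") -> torch.Tensor:
    # Monte-Carlo estimate of E[f(p_model / p_ref)], here assuming the
    # batch was drawn from the reference model; maximize this objective.
    ratio = torch.exp(logp_model - logp_ref)
    penalty = F_DIVS[fdiv](ratio).mean()
    return rewards.mean() - beta * penalty
```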
- Parallelly Tempered Generative Adversarial Nets: Toward Stabilized Gradients [7.94957965474334]
A generative adversarial network (GAN) has been a representative backbone model in generative artificial intelligence (AI).
This work analyzes the training instability and inefficiency in the presence of mode collapse by linking it to multimodality in the target distribution.
With our newly developed GAN objective function, the generator can learn all the tempered distributions simultaneously.
arXiv Detail & Related papers (2024-11-18T18:01:13Z)
- Auto-regressive Image Synthesis with Integrated Quantization [55.51231796778219]
This paper presents a versatile framework for conditional image generation.
It incorporates the inductive bias of CNNs and powerful sequence modeling of auto-regression.
Our method achieves superior diverse image generation performance as compared with the state-of-the-art.
arXiv Detail & Related papers (2022-07-21T22:19:17Z)