STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation
- URL: http://arxiv.org/abs/2509.25027v1
- Date: Mon, 29 Sep 2025 16:50:21 GMT
- Title: STAGE: Stable and Generalizable GRPO for Autoregressive Image Generation
- Authors: Xiaoxiao Ma, Haibo Qiu, Guohui Zhang, Zhixiong Zeng, Siqi Yang, Lin Ma, Feng Zhao,
- Abstract summary: Reinforcement learning has recently been explored to improve text-to-image generation. Applying existing GRPO algorithms to autoregressive (AR) image models remains challenging. In this work, we revisit GRPO for AR image generation and identify two key issues: contradictory gradients from unnecessary tokens and unstable policy entropy dynamics.
- Score: 16.40446848402754
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning has recently been explored to improve text-to-image generation, yet applying existing GRPO algorithms to autoregressive (AR) image models remains challenging. The instability of the training process easily disrupts the pretrained model's capability during long runs, resulting in marginal gains, degraded image quality, and poor generalization. In this work, we revisit GRPO for AR image generation and identify two key issues: contradictory gradients from unnecessary tokens and unstable policy entropy dynamics. To address these, we introduce STAGE, a stable and generalizable framework built on two targeted solutions: 1) advantage/KL reweighting, a similarity-aware reweighting that alleviates conflicting updates; and 2) an entropy reward, computed with respect to the reference model, that stabilizes learning. By alleviating conflicts between tokens and stabilizing training with the entropy reward, we reduce disruption of the pretrained distribution and mitigate reward hacking, which in turn improves generalization and transfer to other benchmarks. Experiments across multiple benchmarks show that STAGE consistently improves visual quality, stability, and cross-task generalization compared to baseline GRPO.
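The abstract's two mechanisms could be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the entropy-matching penalty form, the `beta` default, and the multiplicative use of the similarity weights are all assumptions.

```python
import math

def grpo_advantages_with_entropy_reward(rewards, similarity, ref_entropy,
                                        policy_entropy, beta=0.01):
    """Hypothetical sketch of STAGE-style GRPO shaping for one group of samples.

    rewards:        task rewards for a group of G sampled images
    similarity:     per-sample weights in [0, 1]; down-weighting samples whose
                    gradients would conflict (similarity-aware reweighting)
    ref_entropy:    entropy of the frozen reference model's token distribution
    policy_entropy: entropy of the current policy for each sample
    beta:           weight of the entropy reward (value is an assumption)
    """
    # Entropy reward: penalize the policy's entropy drifting away from the
    # reference model, which the abstract credits with stabilizing training.
    shaped = [r - beta * abs(h - ref_entropy)
              for r, h in zip(rewards, policy_entropy)]
    # Standard GRPO group-relative advantage (normalize within the group).
    mean = sum(shaped) / len(shaped)
    std = math.sqrt(sum((s - mean) ** 2 for s in shaped) / len(shaped))
    # Similarity-aware reweighting of the advantage before the policy update.
    return [w * (s - mean) / (std + 1e-8)
            for w, s in zip(similarity, shaped)]
```

Because the advantages are normalized within the group, the entropy reward acts as a relative penalty: samples whose entropy drifts furthest from the reference lose advantage relative to their group, rather than being penalized on an absolute scale.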
Related papers
- AlignVAR: Towards Globally Consistent Visual Autoregression for Image Super-Resolution [16.90182090355781]
Visual autoregressive models offer stable training, non-iterative inference, and high-fidelity synthesis through next-scale prediction. However, their application remains underexplored and faces two critical challenges: locality-biased attention and residual-only supervision. We propose a globally consistent visual autoregressive framework tailored for image super-resolution.
arXiv Detail & Related papers (2026-02-28T10:39:06Z) - DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO [50.89703227426486]
Reinforcement learning (RL) significantly improves image generation quality by comparing the relative performance of images generated within the same group. In the later stages of training, however, the model tends to produce homogenized outputs lacking creativity and visual diversity. This issue can be analyzed from both reward-modeling and generation-dynamics perspectives.
arXiv Detail & Related papers (2025-12-25T05:37:37Z) - Training-Free Generation of Diverse and High-Fidelity Images via Prompt Semantic Space Optimization [50.5332987313297]
We propose Token-Prompt embedding Space Optimization (TPSO), a training-free and model-agnostic module. TPSO introduces learnable parameters to explore underrepresented regions of the token embedding space, reducing the model's tendency to repeatedly generate samples from strong modes of the learned distribution. In experiments on MS-COCO with three diffusion backbones, TPSO significantly enhances generative diversity, improving baseline performance by 1.10 to 4.18 points without sacrificing image quality.
arXiv Detail & Related papers (2025-11-25T00:42:09Z) - From Competition to Synergy: Unlocking Reinforcement Learning for Subject-Driven Image Generation [37.43722287763904]
A subject-driven image generation model faces a fundamental trade-off between identity preservation (fidelity) and prompt adherence (editability). We propose a novel framework featuring two key innovations: Synergy-Aware Reward Shaping and Time-Aware Dynamic Weighting. Our model achieves a superior balance, generating images that both preserve key identity features and accurately adhere to complex textual prompts.
arXiv Detail & Related papers (2025-10-21T03:32:26Z) - Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy [23.573364375818553]
This work revisits the sampling issues in current autoregressive (AR) image generation models. We identify that image tokens, unlike text tokens, exhibit lower information density and non-uniform spatial distribution. We present an entropy-informed decoding strategy that achieves higher autoregressive generation quality with faster synthesis speed.
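One plausible reading of an entropy-informed decoding strategy is sketched below: decode confidently predicted (low-entropy) token positions greedily, and fall back to temperature sampling only where the distribution is genuinely uncertain. The function name, threshold, and temperature values are assumptions for illustration, not the paper's actual algorithm.

```python
import math
import random

def entropy_informed_decode(logits, threshold=1.0, temperature=0.9, rng=None):
    """Hypothetical sketch: choose one image token given its predicted logits.

    Low-entropy (confident) positions use fast, deterministic argmax;
    high-entropy positions sample with temperature to preserve diversity.
    """
    rng = rng or random.Random(0)
    # Softmax over the visual-token vocabulary (max-shifted for stability).
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]
    z = sum(probs)
    probs = [p / z for p in probs]
    # Shannon entropy of the predicted token distribution.
    entropy = -sum(p * math.log(p + 1e-12) for p in probs)
    if entropy < threshold:
        return probs.index(max(probs))  # confident: greedy, no sampling cost
    # Uncertain: temperature-scaled sampling.
    scaled = [math.exp((l - m) / temperature) for l in logits]
    zs = sum(scaled)
    return rng.choices(range(len(logits)), weights=[s / zs for s in scaled])[0]
```

Under this reading, the speedup comes from skipping the sampling machinery at the many low-entropy positions that image tokens exhibit, while quality is preserved by still sampling where the model is uncertain.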
arXiv Detail & Related papers (2025-10-10T05:26:11Z) - PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models [54.18605375476406]
We introduce Proportionate Credit Policy Optimization (PCPO), a framework that enforces proportional credit assignment through a stable objective reformulation and a principled reweighting of timesteps. PCPO substantially outperforms existing policy gradient baselines on all fronts, including the state-of-the-art DanceGRPO.
arXiv Detail & Related papers (2025-09-30T04:43:58Z) - Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation [110.03631978640298]
We present the first systematic investigation into the mechanisms of applying the next-token prediction paradigm to the visual domain. We identify three key properties that hinder the learning of high-level visual semantics. We show that these issues can be effectively addressed by introducing self-supervised objectives during training.
arXiv Detail & Related papers (2025-09-18T17:47:40Z) - Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning [34.75717081153747]
Current methods for scoring generated images are susceptible to reward hacking. We propose Pref-GRPO, which shifts the optimization objective from score maximization to preference fitting, ensuring more stable training. Existing T2I benchmarks are limited by coarse evaluation criteria, hindering comprehensive model assessment. We therefore introduce UniGenBench, a unified T2I benchmark comprising 600 prompts across 5 main themes and 20 subthemes.
arXiv Detail & Related papers (2025-08-28T13:11:24Z) - AR-GRPO: Training Autoregressive Image Generation Models via Reinforcement Learning [56.71089466532673]
We propose AR-GRPO, an approach to integrate online RL training into autoregressive (AR) image generation models. We conduct comprehensive experiments on both class-conditional (i.e., class-to-image) and text-conditional (i.e., text-to-image) image generation tasks. Our results show consistent improvements across various evaluation metrics.
arXiv Detail & Related papers (2025-08-09T10:37:26Z) - AR-RAG: Autoregressive Retrieval Augmentation for Image Generation [35.008697736838194]
We introduce Autoregressive Retrieval Augmentation (AR-RAG), a novel paradigm that enhances image generation by autoregressively incorporating k-nearest-neighbor retrievals at the patch level. We validate the effectiveness of AR-RAG on widely adopted benchmarks, including Midjourney-30K, GenEval, and DPG-Bench.
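The core retrieval step in a patch-level k-nearest-neighbor scheme could look roughly like this. The function name, the Euclidean metric, and the mean-pooling of retrieved patches are assumptions for illustration; the paper's actual retrieval and conditioning mechanism may differ.

```python
import math

def knn_patch_retrieval(query_patch, datastore, k=4):
    """Hypothetical sketch of patch-level k-NN retrieval for AR generation.

    query_patch: embedding (list of floats) of the patch being generated
    datastore:   list of reference-patch embeddings of the same dimension
    Returns the indices of the k closest patches and their mean embedding,
    which could then condition the next autoregressive decoding step.
    """
    # Euclidean distance from the query to every stored patch embedding.
    dists = [(math.dist(query_patch, p), i) for i, p in enumerate(datastore)]
    idx = [i for _, i in sorted(dists)[:k]]
    # Pool the retrieved neighbors into a single conditioning vector.
    dim = len(query_patch)
    mean = [sum(datastore[i][d] for i in idx) / k for d in range(dim)]
    return idx, mean
```

Retrieving at the patch level, rather than per whole image, is what makes the augmentation autoregressive: each generation step can query the datastore with the partial image produced so far.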
arXiv Detail & Related papers (2025-06-08T01:33:05Z) - Enhancing Variational Autoencoders with Smooth Robust Latent Encoding [54.74721202894622]
Variational Autoencoders (VAEs) have played a key role in scaling up diffusion-based generative models. We introduce Smooth Robust Latent VAE (SRL-VAE), a novel adversarial training framework that boosts both generation quality and robustness. Experiments show that SRL-VAE improves both generation quality, in image reconstruction and text-guided image editing, and robustness, against Nightshade attacks and image editing attacks.
arXiv Detail & Related papers (2025-04-24T03:17:57Z) - Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step [86.69947123512836]
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle complex understanding tasks. We provide the first comprehensive investigation of the potential of CoT reasoning to enhance autoregressive image generation. We propose the Potential Assessment Reward Model (PARM) and PARM++, specialized for autoregressive image generation.
arXiv Detail & Related papers (2025-01-23T18:59:43Z) - Robust Single Image Dehazing Based on Consistent and Contrast-Assisted Reconstruction [95.5735805072852]
We propose a novel density-variational learning framework to improve the robustness of the image dehazing model.
Specifically, the dehazing network is optimized under the consistency-regularized framework.
Our method significantly surpasses the state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T08:11:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.