Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances
- URL: http://arxiv.org/abs/2508.10316v2
- Date: Mon, 27 Oct 2025 10:55:25 GMT
- Title: Integrating Reinforcement Learning with Visual Generative Models: Foundations and Advances
- Authors: Yuanzhi Liang, Yijie Fang, Rui Li, Ziqi Ni, Ruijie Su, Chi Zhang,
- Abstract summary: Reinforcement learning offers a principled framework for optimizing non-differentiable, preference-driven, and temporally structured objectives. Recent advances demonstrate its effectiveness in enhancing controllability, consistency, and human alignment across generative tasks.
- Score: 8.56304683490938
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models have made significant progress in synthesizing visual content, including images, videos, and 3D/4D structures. However, they are typically trained with surrogate objectives such as likelihood or reconstruction loss, which often misalign with perceptual quality, semantic accuracy, or physical realism. Reinforcement learning (RL) offers a principled framework for optimizing non-differentiable, preference-driven, and temporally structured objectives. Recent advances demonstrate its effectiveness in enhancing controllability, consistency, and human alignment across generative tasks. This survey provides a systematic overview of RL-based methods for visual content generation. We review the evolution of RL from classical control to its role as a general-purpose optimization tool, and examine its integration into image, video, and 3D/4D generation. Across these domains, RL serves not only as a fine-tuning mechanism but also as a structural component for aligning generation with complex, high-level goals. We conclude with open challenges and future research directions at the intersection of RL and generative modeling.
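The abstract's central claim is that RL can optimize objectives that cannot be differentiated through. The core mechanism, sampling from the generator and applying a score-function (REINFORCE) gradient against a black-box reward, can be sketched on a toy categorical "generator". This is a hedged illustration of the general idea only, not the implementation of any method surveyed here; the reward function and all values are made up:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def reinforce_step(logits, reward_fn, lr=0.5, batch=64, rng=random):
    """One REINFORCE update on a tiny categorical 'generator'.

    The reward is treated as a black box: we never differentiate through
    reward_fn, only through log p(sample) -- which is what lets RL
    optimize non-differentiable objectives.
    """
    probs = softmax(logits)
    n = len(logits)
    samples = rng.choices(range(n), weights=probs, k=batch)
    rewards = [reward_fn(s) for s in samples]
    baseline = sum(rewards) / batch  # simple baseline for variance reduction
    grads = [0.0] * n
    for s, r in zip(samples, rewards):
        adv = r - baseline
        for k in range(n):  # d log p(s) / d logit_k = 1[k == s] - probs[k]
            grads[k] += adv * ((1.0 if k == s else 0.0) - probs[k])
    return [x + lr * g / batch for x, g in zip(logits, grads)]

# A non-differentiable "quality" score: only sample 2 is rewarded.
reward = lambda s: 1.0 if s == 2 else 0.0

rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(300):
    logits = reinforce_step(logits, reward, rng=rng)
print(round(softmax(logits)[2], 2))
```

After training, the probability mass concentrates on the rewarded mode even though the reward itself was never differentiated; the same principle underlies reward fine-tuning of image and video generators, just with far larger policies.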
Related papers
- Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation [110.03631978640298]
We present the first systematic investigation into the mechanisms of applying the next-token prediction paradigm to the visual domain. We identify three key properties that hinder the learning of high-level visual semantics. We show that these issues can be effectively addressed by introducing self-supervised objectives during training.
arXiv Detail & Related papers (2025-09-18T17:47:40Z)
- Reinforcement Learning in Vision: A Survey [36.820183535103695]
This survey offers a critical and up-to-date synthesis of the field. We first formalize visual RL problems and trace the evolution of policy-optimization strategies. We distill trends such as curriculum-driven training, preference-aligned diffusion, and unified reward modeling.
arXiv Detail & Related papers (2025-08-11T17:08:55Z)
- AR-GRPO: Training Autoregressive Image Generation Models via Reinforcement Learning [56.71089466532673]
We propose AR-GRPO, an approach that integrates online RL training into autoregressive (AR) image generation models. We conduct comprehensive experiments on both class-conditional (i.e., class-to-image) and text-conditional (i.e., text-to-image) image generation tasks. Our results show consistent improvements across various evaluation metrics.
arXiv Detail & Related papers (2025-08-09T10:37:26Z)
- Continual Learning for Generative AI: From LLMs to MLLMs and Beyond [56.29231194002407]
We present a comprehensive survey of continual learning methods for mainstream generative AI models. We categorize these approaches into three paradigms: architecture-based, regularization-based, and replay-based. We analyze continual learning setups for different generative models, including training objectives, benchmarks, and core backbones.
arXiv Detail & Related papers (2025-06-16T02:27:25Z)
- Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO [68.44918104224818]
Autoregressive image generation presents unique challenges distinct from Chain-of-Thought (CoT) reasoning. This study provides the first comprehensive investigation of the GRPO and DPO algorithms in autoregressive image generation. Our findings reveal that GRPO and DPO exhibit distinct advantages and, crucially, that reward models with stronger intrinsic generalization capabilities can enhance the generalization potential of the applied RL algorithms.
arXiv Detail & Related papers (2025-05-22T17:59:49Z)
- DanceGRPO: Unleashing GRPO on Visual Generation [36.36813831536346]
Reinforcement Learning (RL) has emerged as a promising approach for fine-tuning generative models. Existing methods like DDPO and DPOK face fundamental limitations when scaling to large and diverse prompt sets. This paper presents DanceGRPO, a framework that addresses these limitations through an innovative adaptation of Group Relative Policy Optimization.
arXiv Detail & Related papers (2025-05-12T17:59:34Z)
- RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning [54.07026389388881]
We present the first real-object-based retrieval-augmented generation framework (RealRAG). RealRAG augments fine-grained and unseen novel object generation by learning and retrieving real-world images to overcome the knowledge gaps of generative models. Our framework integrates fine-grained visual knowledge into the generative models, tackling the distortion problem and improving realism for fine-grained object generation.
arXiv Detail & Related papers (2025-02-02T16:41:54Z)
- From Noise to Nuance: Advances in Deep Generative Image Models [8.802499769896192]
Deep learning-based image generation has undergone a paradigm shift since 2021. Recent developments in Stable Diffusion, DALL-E, and consistency models have redefined the capabilities and performance boundaries of image synthesis. We investigate how enhanced multi-modal understanding and zero-shot generation capabilities are reshaping practical applications across industries.
arXiv Detail & Related papers (2024-12-12T02:09:04Z)
- INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z)
- Optimization-Inspired Learning with Architecture Augmentations and Control Mechanisms for Low-Level Vision [74.9260745577362]
This paper proposes a unified optimization-inspired learning framework to aggregate Generative, Discriminative, and Corrective (GDC) principles.
We construct three propagative modules to effectively solve the optimization models with flexible combinations.
Experiments across varied low-level vision tasks validate the efficacy and adaptability of GDC.
arXiv Detail & Related papers (2020-12-10T03:24:53Z)
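Several of the entries above (the DPO-vs-GRPO study, DanceGRPO, AR-GRPO) build on Group Relative Policy Optimization, whose defining step is a critic-free, group-relative advantage: multiple samples are generated per prompt and each is scored against its own group's statistics rather than a learned value model. A hedged sketch of just that step, with made-up reward values for illustration:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: normalize each reward by the mean and
    standard deviation of its group (the samples generated for the same
    prompt), avoiding a learned value/critic model entirely."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four candidate generations for one prompt, scored by some reward model.
advs = group_relative_advantages([0.8, 0.5, 0.9, 0.2])
print([round(a, 2) for a in advs])
```

Samples scored above their group's mean receive positive advantage and are reinforced; those below are suppressed. Because the baseline comes from the group itself, the scheme scales to large, diverse prompt sets without training a separate critic.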
This list is automatically generated from the titles and abstracts of the papers in this site.