RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
- URL: http://arxiv.org/abs/2512.00473v1
- Date: Sat, 29 Nov 2025 12:52:26 GMT
- Title: RealGen: Photorealistic Text-to-Image Generation via Detector-Guided Rewards
- Authors: Junyan Ye, Leiqi Zhu, Yuncheng Guo, Dongzhi Jiang, Zilong Huang, Yifan Zhang, Zhiyuan Yan, Haohuan Fu, Conghui He, Weijia Li,
- Abstract summary: We propose RealGen, a text-to-image framework for photorealistic image generation. Inspired by adversarial generation, RealGen introduces a "Detector Reward" mechanism, which quantifies artifacts and assesses realism. Experiments demonstrate that RealGen significantly outperforms general models like GPT-Image-1 and Qwen-Image, as well as specialized photorealistic models like FLUX-Krea.
- Score: 53.25632969696776
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the continuous advancement of image generation technology, advanced models such as GPT-Image-1 and Qwen-Image have achieved remarkable text-to-image consistency and world knowledge. However, these models still fall short in photorealistic image generation. Even on simple T2I tasks, they tend to produce "fake" images with distinct AI artifacts, often characterized by "overly smooth skin" and "oily facial sheens". To recapture the original goal of "indistinguishable-from-reality" generation, we propose RealGen, a photorealistic text-to-image framework. RealGen integrates an LLM component for prompt optimization and a diffusion model for realistic image generation. Inspired by adversarial generation, RealGen introduces a "Detector Reward" mechanism, which quantifies artifacts and assesses realism using both semantic-level and feature-level synthetic-image detectors. We leverage this reward signal with the GRPO algorithm to optimize the entire generation pipeline, significantly enhancing image realism and detail. Furthermore, we propose RealBench, an automated evaluation benchmark employing Detector-Scoring and Arena-Scoring. It enables human-free photorealism assessment, yielding results that are more accurate and aligned with real user experience. Experiments demonstrate that RealGen significantly outperforms general models like GPT-Image-1 and Qwen-Image, as well as specialized photorealistic models like FLUX-Krea, in terms of realism, detail, and aesthetics. The code is available at https://github.com/yejy53/RealGen.
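The detector-reward-plus-GRPO idea from the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the equal weighting of the two detectors, and the reward scale are all assumptions, and only GRPO's group-relative reward normalization is shown.

```python
def detector_reward(semantic_fake_prob: float,
                    feature_fake_prob: float,
                    w_semantic: float = 0.5,
                    w_feature: float = 0.5) -> float:
    """Combine semantic- and feature-level detector outputs (probability
    the image is fake) into one scalar; higher = judged more realistic."""
    realism_semantic = 1.0 - semantic_fake_prob
    realism_feature = 1.0 - feature_fake_prob
    return w_semantic * realism_semantic + w_feature * realism_feature


def grpo_advantages(rewards):
    """Group-relative advantages: normalize rewards within the group of
    images sampled for one prompt (the core normalization step of GRPO)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    if std == 0.0:
        std = 1.0  # avoid division by zero when all rewards are equal
    return [(r - mean) / std for r in rewards]
```

In this sketch, images that both detectors fail to flag earn higher rewards, and the group normalization turns those rewards into advantages for a policy-gradient update of the generation pipeline.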
Related papers
- Detecting AI-Generated Images via Distributional Deviations from Real Images [6.615773227400183]
We propose a Masking-based Pre-trained model Fine-Tuning (MPFT) strategy, which introduces a Texture-Aware Masking (TAM) mechanism to mask textured areas containing generative model-specific patterns during fine-tuning. Our method, fine-tuned with only a minimal number of images, significantly outperforms existing approaches, achieving up to 98.2% and 94.6% average accuracy on the two datasets, respectively.
arXiv Detail & Related papers (2026-01-07T05:00:13Z) - Computer vision training dataset generation for robotic environments using Gaussian splatting [0.0]
This paper introduces a novel pipeline for generating large-scale, highly realistic, and automatically labeled datasets for computer vision tasks in robotic environments. We leverage 3D Gaussian Splatting (3DGS) to create photorealistic representations of the operational environment and objects. A novel, two-pass rendering technique combines the realism of splats with a shadow map generated from proxy meshes. Pixel-perfect segmentation masks are generated automatically and formatted for direct use with object detection models like YOLO.
arXiv Detail & Related papers (2025-12-15T15:00:17Z) - ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies [25.96895266979283]
This paper presents ImmerseGen, a novel agent-guided framework for compact, world-conditioned VR scenes. It bypasses complex textures with semantic-centric modeling. Experiments demonstrate improved user efficiency and better VR rendering on mobile headsets.
arXiv Detail & Related papers (2025-06-17T08:50:05Z) - RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning [54.07026389388881]
We present the first real-object-based retrieval-augmented generation framework (RealRAG). RealRAG augments fine-grained and unseen novel object generation by learning and retrieving real-world images to overcome the knowledge gaps of generative models. Our framework integrates fine-grained visual knowledge for the generative models, tackling the distortion problem and improving the realism for fine-grained object generation.
arXiv Detail & Related papers (2025-02-02T16:41:54Z) - EnvGS: Modeling View-Dependent Appearance with Environment Gaussian [78.74634059559891]
EnvGS is a novel approach that employs a set of Gaussian primitives as an explicit 3D representation for capturing reflections of environments. To efficiently render these environment Gaussian primitives, we developed a ray-tracing-based reflection renderer that leverages the GPU's RT cores for fast rendering. Results from multiple real-world and synthetic datasets demonstrate that our method produces significantly more detailed reflections.
arXiv Detail & Related papers (2024-12-19T18:59:57Z) - Zero-Shot Detection of AI-Generated Images [54.01282123570917]
We propose a zero-shot entropy-based detector (ZED) to detect AI-generated images.
Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images.
ZED achieves an average improvement of more than 3% over the SoTA in terms of accuracy.
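The entropy-based intuition behind ZED can be illustrated with a toy sketch: score an image by how "surprising" it is under a predictive model of real images. The predictor interface, the bits-per-pixel scale, and the threshold below are illustrative assumptions, not ZED's actual implementation.

```python
import math

def surprise_bits(pixels, predict_prob):
    """Average negative log2-probability (bits per pixel) that a model of
    real images assigns to each pixel given the preceding ones; a higher
    value means the image is more surprising under the model."""
    total = 0.0
    for i, p in enumerate(pixels):
        prob = predict_prob(pixels[:i], p)
        total += -math.log2(max(prob, 1e-12))  # guard against zero probability
    return total / len(pixels)

def is_ai_generated(pixels, predict_prob, threshold=4.0):
    # The premise: synthetic images tend to be *less* surprising (overly
    # regular) under a real-image model, so low surprise flags them as fake.
    return surprise_bits(pixels, predict_prob) < threshold
```

Because the approach only needs a model of real images, no AI-generated training examples are required, which is what makes the detection zero-shot.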
arXiv Detail & Related papers (2024-09-24T08:46:13Z) - Solutions to Deepfakes: Can Camera Hardware, Cryptography, and Deep Learning Verify Real Images? [51.3344199560726]
It is imperative to establish methods that can separate real data from synthetic data with high confidence.
This document aims to: present known strategies in detection and cryptography that can be employed to verify which images are real.
arXiv Detail & Related papers (2024-07-04T22:01:21Z) - PatchCraft: Exploring Texture Patch for Efficient AI-generated Image Detection [39.820699370876916]
We propose a novel AI-generated image detector capable of identifying fake images created by a wide range of generative models.
A novel Smash&Reconstruction preprocessing is proposed to erase the global semantic information and enhance texture patches.
Our approach outperforms state-of-the-art baselines by a significant margin.
arXiv Detail & Related papers (2023-11-21T07:12:40Z) - Deep CG2Real: Synthetic-to-Real Translation via Image Disentanglement [78.58603635621591]
Training an unpaired synthetic-to-real translation network in image space is severely under-constrained.
We propose a semi-supervised approach that operates on the disentangled shading and albedo layers of the image.
Our two-stage pipeline first learns to predict accurate shading in a supervised fashion using physically-based renderings as targets.
arXiv Detail & Related papers (2020-03-27T21:45:41Z)
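The disentanglement behind the Deep CG2Real entry rests on the intrinsic-image model, where an image factors into per-pixel albedo (reflectance) times shading; translating the shading layer and recomposing is what the two-stage pipeline exploits. A minimal sketch of the recomposition step, with array layout and clamping as assumptions:

```python
def recompose(albedo, shading):
    """Recombine albedo and shading layers (2D lists of floats in [0, 1])
    into an image via the element-wise product, clamped to [0, 1]."""
    return [[min(max(a * s, 0.0), 1.0) for a, s in zip(row_a, row_s)]
            for row_a, row_s in zip(albedo, shading)]
```

Editing shading separately from albedo is what makes the translation better constrained than working directly in image space.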
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.