PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
- URL: http://arxiv.org/abs/2602.02493v1
- Date: Mon, 02 Feb 2026 18:59:42 GMT
- Title: PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
- Authors: Zehong Ma, Ruihan Xu, Shiliang Zhang
- Abstract summary: We propose PixelGen, a simple pixel diffusion framework with perceptual supervision. Instead of modeling the full image manifold, PixelGen introduces two complementary perceptual losses. An LPIPS loss facilitates learning better local patterns, while a DINO-based perceptual loss strengthens global semantics.
- Score: 47.868429337792314
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pixel diffusion generates images directly in pixel space in an end-to-end manner, avoiding the artifacts and bottlenecks introduced by VAEs in two-stage latent diffusion. However, it is challenging to optimize high-dimensional pixel manifolds that contain many perceptually irrelevant signals, leaving existing pixel diffusion methods lagging behind latent diffusion models. We propose PixelGen, a simple pixel diffusion framework with perceptual supervision. Instead of modeling the full image manifold, PixelGen introduces two complementary perceptual losses to guide the diffusion model toward learning a more meaningful perceptual manifold. An LPIPS loss facilitates learning better local patterns, while a DINO-based perceptual loss strengthens global semantics. With perceptual supervision, PixelGen surpasses strong latent diffusion baselines. It achieves an FID of 5.11 on ImageNet-256 without classifier-free guidance using only 80 training epochs, and demonstrates favorable scaling performance on large-scale text-to-image generation with a GenEval score of 0.79. PixelGen requires no VAEs, no latent representations, and no auxiliary stages, providing a simpler yet more powerful generative paradigm. Code is publicly available at https://github.com/Zehong-Ma/PixelGen.
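To make the training objective concrete, below is a minimal PyTorch sketch of a pixel-space flow-matching step with the two perceptual terms the abstract describes. The velocity parameterization, the one-step x0 estimate, the loss weights, and the LPIPS/DINO checkpoint choices are illustrative assumptions, not the authors' exact recipe.

```python
# Minimal sketch of a pixel-diffusion training step with perceptual
# supervision. Parameterization, weights, and checkpoints are assumptions.
import torch
import torch.nn.functional as F
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="vgg").eval()  # local perceptual term
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16").eval()
for p in list(lpips_fn.parameters()) + list(dino.parameters()):
    p.requires_grad_(False)

def training_step(model, x0, w_lpips=1.0, w_dino=1.0):
    """x0: clean images in [-1, 1], shape (B, 3, H, W)."""
    noise = torch.randn_like(x0)
    t = torch.rand(x0.size(0), device=x0.device).view(-1, 1, 1, 1)
    xt = (1 - t) * x0 + t * noise            # linear interpolation path
    v_pred = model(xt, t.flatten())          # predicted velocity
    diff_loss = F.mse_loss(v_pred, noise - x0)

    # One-step estimate of the clean image for the perceptual terms.
    x0_pred = xt - t * v_pred

    # LPIPS facilitates learning better local patterns.
    loss_lpips = lpips_fn(x0_pred, x0).mean()

    # DINO feature matching strengthens global semantics
    # (ImageNet normalization omitted for brevity).
    f_pred = dino(F.interpolate(x0_pred, size=224, mode="bilinear"))
    f_real = dino(F.interpolate(x0, size=224, mode="bilinear"))
    loss_dino = F.mse_loss(f_pred, f_real)

    return diff_loss + w_lpips * loss_lpips + w_dino * loss_dino
```

Both perceptual networks are frozen; only the diffusion model receives gradients, which flow through the one-step x0 estimate.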
Related papers
- Eliminating VAE for Fast and High-Resolution Generative Detail Restoration [19.313842956605356]
Diffusion models have attained remarkable breakthroughs in the real-world super-resolution (SR) task. Recent works like GenDR adopt step distillation to reduce the number of sampling steps to one. GenDR-Pix can restore a 4K image in only 1 second with 6GB of memory.
arXiv Detail & Related papers (2026-02-11T08:23:30Z)
- One-step Latent-free Image Generation with Pixel Mean Flows [22.294629970410508]
We propose "pixel MeanFlow" (pMF) to formulate the network output space and the loss space separately.<n>pMF achieves strong results for one-step latent-free generation on ImageNet at 256x256 resolution (2.22 FID) and 512x512 resolution (2.48 FID)
arXiv Detail & Related papers (2026-01-29T18:59:56Z)
- PixelDiT: Pixel Diffusion Transformers for Image Generation [48.456815413366535]
PixelDiT is a single-stage, end-to-end pixel-space Diffusion Transformer. It eliminates the need for an autoencoder and learns the diffusion process directly in pixel space. It achieves 1.61 FID on ImageNet 256x256, surpassing existing pixel generative models by a large margin.
arXiv Detail & Related papers (2025-11-25T18:59:25Z)
- DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation [93.6273078684831]
We propose a frequency-DeCoupled pixel diffusion framework to pursue a more efficient pixel diffusion paradigm. With the intuition of decoupling the generation of high- and low-frequency components, we leverage a lightweight pixel decoder to generate high-frequency details conditioned on semantic guidance (a toy sketch of this decoupling appears after this list). Experiments show that DeCo achieves superior performance among pixel diffusion models, attaining FIDs of 1.62 (256x256) and 2.22 (512x512) on ImageNet.
arXiv Detail & Related papers (2025-11-24T17:59:06Z)
- DiP: Taming Diffusion Models in Pixel Space [91.51011771517683]
A Diffusion Transformer (DiT) backbone operates on large patches for efficient global structure construction, while a co-trained lightweight Patch Detailer Head leverages contextual features to restore fine-grained local details.
arXiv Detail & Related papers (2025-11-24T06:55:49Z)
- One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models [45.92038137978053]
We present the Latent Upscaler Adapter (LUA), a lightweight module that performs super-resolution directly on the generator's latent code. LUA integrates as a drop-in component, requiring no modifications to the base model or additional diffusion stages. A shared Swin-style backbone with scale-specific pixel-shuffle heads supports 2x and 4x factors and remains compatible with image-space SR baselines (see the pixel-shuffle sketch after this list).
arXiv Detail & Related papers (2025-11-13T18:54:18Z)
- Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling [135.66138766927716]
This paper focuses on semi-supervised crowd counting, where only a small portion of the training data are labeled.
We formulate the pixel-wise density value to be regressed as a probability distribution, instead of as a single deterministic value.
Our method clearly outperforms the competitors by a large margin under various labeled ratio settings.
arXiv Detail & Related papers (2024-02-23T12:48:02Z)
- SDM: Spatial Diffusion Model for Large Hole Image Inpainting [106.90795513361498]
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image.
Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z)
- PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation [88.55256389703082]
Pixel synthesis is a promising research paradigm for image generation, which can well exploit pixel-wise prior knowledge for generation.
In this paper, we propose a progressive pixel synthesis network towards efficient image generation, named PixelFolder.
With much less expenditure, PixelFolder obtains new state-of-the-art (SOTA) performance on two benchmark datasets.
arXiv Detail & Related papers (2022-04-02T10:55:11Z)
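As referenced in the DeCo entry above, the following toy PyTorch sketch illustrates the frequency-decoupling idea: a cheap low-pass split separates coarse structure from detail, and a lightweight decoder predicts the high-frequency residual from semantic guidance. The downsample-based low-pass filter and all module shapes are assumptions for illustration, not DeCo's actual architecture.

```python
# Toy illustration of frequency-decoupled pixel generation: a low-pass
# split plus a lightweight high-frequency decoder. Shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def split_frequencies(x, factor=8):
    """Low-pass via down/upsampling; high-pass is the residual."""
    low = F.interpolate(
        F.avg_pool2d(x, factor), scale_factor=factor, mode="bilinear"
    )
    return low, x - low

class PixelDecoder(nn.Module):
    """Lightweight head: semantic features -> high-frequency residual."""
    def __init__(self, sem_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + sem_dim, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, low, sem):
        sem = F.interpolate(sem, size=low.shape[-2:], mode="bilinear")
        return self.net(torch.cat([low, sem], dim=1))

x = torch.randn(2, 3, 256, 256)      # stand-in image batch
low, high = split_frequencies(x)
decoder = PixelDecoder()
sem = torch.randn(2, 256, 16, 16)    # stand-in semantic guidance
recon = low + decoder(low, sem)      # low frequencies + predicted detail
```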
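Likewise, for the LUA entry, here is a hedged sketch of scale-specific pixel-shuffle heads over a shared backbone. The toy convolutional backbone (standing in for the Swin-style one), the 4-channel latent, and all shapes are assumptions for illustration.

```python
# Sketch of scale-specific pixel-shuffle heads for latent upscaling.
# Backbone, latent channels, and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class UpscaleHead(nn.Module):
    """Expand channels, then rearrange them into spatial resolution."""
    def __init__(self, dim, out_ch, scale):
        super().__init__()
        self.proj = nn.Conv2d(dim, out_ch * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # (C*s*s,H,W) -> (C,sH,sW)

    def forward(self, feats):
        return self.shuffle(self.proj(feats))

backbone = nn.Conv2d(4, 64, 3, padding=1)  # stand-in for the Swin backbone
heads = nn.ModuleDict({"x2": UpscaleHead(64, 4, 2),
                       "x4": UpscaleHead(64, 4, 4)})

z = torch.randn(1, 4, 32, 32)              # SD-style latent code
feats = backbone(z)
z_up2 = heads["x2"](feats)                 # (1, 4, 64, 64)
z_up4 = heads["x4"](feats)                 # (1, 4, 128, 128)
```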