PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
- URL: http://arxiv.org/abs/2602.02493v1
- Date: Mon, 02 Feb 2026 18:59:42 GMT
- Title: PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss
- Authors: Zehong Ma, Ruihan Xu, Shiliang Zhang
- Abstract summary: We propose PixelGen, a simple pixel diffusion framework with perceptual supervision. Instead of modeling the full image manifold, PixelGen introduces two complementary perceptual losses. An LPIPS loss facilitates learning better local patterns, while a DINO-based perceptual loss strengthens global semantics.
- Score: 47.868429337792314
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Pixel diffusion generates images directly in pixel space in an end-to-end manner, avoiding the artifacts and bottlenecks introduced by VAEs in two-stage latent diffusion. However, it is challenging to optimize high-dimensional pixel manifolds that contain many perceptually irrelevant signals, leaving existing pixel diffusion methods lagging behind latent diffusion models. We propose PixelGen, a simple pixel diffusion framework with perceptual supervision. Instead of modeling the full image manifold, PixelGen introduces two complementary perceptual losses to guide the diffusion model toward learning a more meaningful perceptual manifold. An LPIPS loss facilitates learning better local patterns, while a DINO-based perceptual loss strengthens global semantics. With perceptual supervision, PixelGen surpasses strong latent diffusion baselines. It achieves an FID of 5.11 on ImageNet-256 without classifier-free guidance using only 80 training epochs, and demonstrates favorable scaling performance on large-scale text-to-image generation with a GenEval score of 0.79. PixelGen requires no VAEs, no latent representations, and no auxiliary stages, providing a simpler yet more powerful generative paradigm. Code is publicly available at https://github.com/Zehong-Ma/PixelGen.
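To make the training objective concrete, below is a minimal PyTorch sketch of a pixel-space flow-matching step with the two perceptual terms the abstract describes. The velocity parameterization, the one-step x0 estimate, the loss weights, and the LPIPS/DINO checkpoint choices are illustrative assumptions, not the authors' exact recipe.

```python
# Minimal sketch of a pixel-diffusion training step with perceptual
# supervision. Parameterization, weights, and checkpoints are assumptions.
import torch
import torch.nn.functional as F
import lpips  # pip install lpips

lpips_fn = lpips.LPIPS(net="vgg").eval()  # local perceptual term
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16").eval()
for p in list(lpips_fn.parameters()) + list(dino.parameters()):
    p.requires_grad_(False)

def training_step(model, x0, w_lpips=1.0, w_dino=1.0):
    """x0: clean images in [-1, 1], shape (B, 3, H, W)."""
    noise = torch.randn_like(x0)
    t = torch.rand(x0.size(0), device=x0.device).view(-1, 1, 1, 1)
    xt = (1 - t) * x0 + t * noise            # linear interpolation path
    v_pred = model(xt, t.flatten())          # predicted velocity
    diff_loss = F.mse_loss(v_pred, noise - x0)

    # One-step estimate of the clean image for the perceptual terms.
    x0_pred = xt - t * v_pred

    # LPIPS facilitates learning better local patterns.
    loss_lpips = lpips_fn(x0_pred, x0).mean()

    # DINO feature matching strengthens global semantics
    # (ImageNet normalization omitted for brevity).
    f_pred = dino(F.interpolate(x0_pred, size=224, mode="bilinear"))
    f_real = dino(F.interpolate(x0, size=224, mode="bilinear"))
    loss_dino = F.mse_loss(f_pred, f_real)

    return diff_loss + w_lpips * loss_lpips + w_dino * loss_dino
```

Both perceptual networks are frozen; only the diffusion model receives gradients, which flow through the one-step x0 estimate.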
Related papers
- Eliminating VAE for Fast and High-Resolution Generative Detail Restoration [19.313842956605356]
Diffusion models have attained remarkable breakthroughs in the real-world super-resolution (SR) task. Recent works like GenDR adopt step distillation to reduce the number of sampling steps to one. GenDR-Pix can restore a 4K image in only 1 second with 6GB of memory.
arXiv Detail & Related papers (2026-02-11T08:23:30Z)
- One-step Latent-free Image Generation with Pixel Mean Flows [22.294629970410508]
We propose "pixel MeanFlow" (pMF) to formulate the network output space and the loss space separately.<n>pMF achieves strong results for one-step latent-free generation on ImageNet at 256x256 resolution (2.22 FID) and 512x512 resolution (2.48 FID)
arXiv Detail & Related papers (2026-01-29T18:59:56Z)
- PixelDiT: Pixel Diffusion Transformers for Image Generation [48.456815413366535]
PixelDiT is a single-stage, end-to-end pixel-space Diffusion Transformer. It eliminates the need for an autoencoder and learns the diffusion process directly in pixel space. It achieves 1.61 FID on ImageNet 256x256, surpassing existing pixel generative models by a large margin.
arXiv Detail & Related papers (2025-11-25T18:59:25Z)
- DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation [93.6273078684831]
We propose a frequency-DeCoupled pixel diffusion framework to pursue a more efficient pixel diffusion paradigm. With the intuition of decoupling the generation of high- and low-frequency components, we leverage a lightweight pixel decoder to generate high-frequency details conditioned on semantic guidance (a toy sketch of this decoupling appears after this list). Experiments show that DeCo achieves superior performance among pixel diffusion models, attaining FIDs of 1.62 (256x256) and 2.22 (512x512) on ImageNet.
arXiv Detail & Related papers (2025-11-24T17:59:06Z)
- DiP: Taming Diffusion Models in Pixel Space [91.51011771517683]
A Diffusion Transformer (DiT) backbone operates on large patches for efficient global structure construction, while a co-trained lightweight Patch Detailer Head leverages contextual features to restore fine-grained local details.
arXiv Detail & Related papers (2025-11-24T06:55:49Z)
- One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models [45.92038137978053]
We present the Latent Upscaler Adapter (LUA), a lightweight module that performs super-resolution directly on the generator's latent code. LUA integrates as a drop-in component, requiring no modifications to the base model or additional diffusion stages. A shared Swin-style backbone with scale-specific pixel-shuffle heads supports 2x and 4x factors and remains compatible with image-space SR baselines (see the pixel-shuffle sketch after this list).
arXiv Detail & Related papers (2025-11-13T18:54:18Z)
- Semi-supervised Counting via Pixel-by-pixel Density Distribution Modelling [135.66138766927716]
This paper focuses on semi-supervised crowd counting, where only a small portion of the training data are labeled.
We formulate the pixel-wise density value to be regressed as a probability distribution, instead of as a single deterministic value.
Our method clearly outperforms the competitors by a large margin under various labeled ratio settings.
arXiv Detail & Related papers (2024-02-23T12:48:02Z)
- SDM: Spatial Diffusion Model for Large Hole Image Inpainting [106.90795513361498]
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image.
Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z)
- PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation [88.55256389703082]
Pixel synthesis is a promising research paradigm for image generation, which can well exploit pixel-wise prior knowledge for generation.
In this paper, we propose a progressive pixel synthesis network towards efficient image generation, named PixelFolder.
With much less expenditure, PixelFolder obtains new state-of-the-art (SOTA) performance on two benchmark datasets.
arXiv Detail & Related papers (2022-04-02T10:55:11Z)
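As referenced in the DeCo entry above, the following toy PyTorch sketch illustrates the frequency-decoupling idea: a cheap low-pass split separates coarse structure from detail, and a lightweight decoder predicts the high-frequency residual from semantic guidance. The downsample-based low-pass filter and all module shapes are assumptions for illustration, not DeCo's actual architecture.

```python
# Toy illustration of frequency-decoupled pixel generation: a low-pass
# split plus a lightweight high-frequency decoder. Shapes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def split_frequencies(x, factor=8):
    """Low-pass via down/upsampling; high-pass is the residual."""
    low = F.interpolate(
        F.avg_pool2d(x, factor), scale_factor=factor, mode="bilinear"
    )
    return low, x - low

class PixelDecoder(nn.Module):
    """Lightweight head: semantic features -> high-frequency residual."""
    def __init__(self, sem_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + sem_dim, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, low, sem):
        sem = F.interpolate(sem, size=low.shape[-2:], mode="bilinear")
        return self.net(torch.cat([low, sem], dim=1))

x = torch.randn(2, 3, 256, 256)      # stand-in image batch
low, high = split_frequencies(x)
decoder = PixelDecoder()
sem = torch.randn(2, 256, 16, 16)    # stand-in semantic guidance
recon = low + decoder(low, sem)      # low frequencies + predicted detail
```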
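Likewise, for the LUA entry, here is a hedged sketch of scale-specific pixel-shuffle heads over a shared backbone. The toy convolutional backbone (standing in for the Swin-style one), the 4-channel latent, and all shapes are assumptions for illustration.

```python
# Sketch of scale-specific pixel-shuffle heads for latent upscaling.
# Backbone, latent channels, and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class UpscaleHead(nn.Module):
    """Expand channels, then rearrange them into spatial resolution."""
    def __init__(self, dim, out_ch, scale):
        super().__init__()
        self.proj = nn.Conv2d(dim, out_ch * scale * scale, 3, padding=1)
        self.shuffle = nn.PixelShuffle(scale)  # (C*s*s,H,W) -> (C,sH,sW)

    def forward(self, feats):
        return self.shuffle(self.proj(feats))

backbone = nn.Conv2d(4, 64, 3, padding=1)  # stand-in for the Swin backbone
heads = nn.ModuleDict({"x2": UpscaleHead(64, 4, 2),
                       "x4": UpscaleHead(64, 4, 4)})

z = torch.randn(1, 4, 32, 32)              # SD-style latent code
feats = backbone(z)
z_up2 = heads["x2"](feats)                 # (1, 4, 64, 64)
z_up4 = heads["x4"](feats)                 # (1, 4, 128, 128)
```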