Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
- URL: http://arxiv.org/abs/2503.18352v2
- Date: Fri, 28 Mar 2025 04:51:44 GMT
- Title: Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
- Authors: Jinjin Zhang, Qiuyu Huang, Junjie Liu, Xiefan Guo, Di Huang
- Abstract summary: Diffusion-4K is a novel framework for direct ultra-high-resolution image synthesis using text-to-image diffusion models. We construct Aesthetic-4K, a comprehensive benchmark for ultra-high-resolution image generation. We propose a wavelet-based fine-tuning approach for direct training with 4K images, applicable to various latent diffusion models.
- Score: 21.46605047406198
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present Diffusion-4K, a novel framework for direct ultra-high-resolution image synthesis using text-to-image diffusion models. The core advancements include: (1) Aesthetic-4K Benchmark: addressing the absence of a publicly available 4K image synthesis dataset, we construct Aesthetic-4K, a comprehensive benchmark for ultra-high-resolution image generation. We curated a high-quality 4K dataset with carefully selected images and captions generated by GPT-4o. Additionally, we introduce GLCM Score and Compression Ratio metrics to evaluate fine details, combined with holistic measures such as FID, Aesthetics and CLIPScore for a comprehensive assessment of ultra-high-resolution images. (2) Wavelet-based Fine-tuning: we propose a wavelet-based fine-tuning approach for direct training with photorealistic 4K images, applicable to various latent diffusion models, demonstrating its effectiveness in synthesizing highly detailed 4K images. Consequently, Diffusion-4K achieves impressive performance in high-quality image synthesis and text prompt adherence, especially when powered by modern large-scale diffusion models (e.g., SD3-2B and Flux-12B). Extensive experimental results from our benchmark demonstrate the superiority of Diffusion-4K in ultra-high-resolution image synthesis.
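To make the proposed fine-detail metrics concrete, here is a minimal sketch of how a GLCM-based texture score and a compression-ratio proxy could be computed with scikit-image and Pillow. The exact formulations behind the paper's GLCM Score and Compression Ratio may differ; the patch size, GLCM property, and PNG codec below are illustrative assumptions.

```python
# Illustrative fine-detail metrics; the paper's exact GLCM Score and
# Compression Ratio definitions may differ from this sketch.
import io

import numpy as np
from PIL import Image
from skimage.feature import graycomatrix, graycoprops

def glcm_score(gray_patch: np.ndarray) -> float:
    """GLCM contrast of an 8-bit grayscale patch; higher means richer texture."""
    glcm = graycomatrix(gray_patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return float(graycoprops(glcm, "contrast").mean())

def compression_ratio(img: Image.Image) -> float:
    """Raw byte count over losslessly compressed size; detail compresses poorly."""
    buf = io.BytesIO()
    img.save(buf, format="PNG")
    raw_bytes = img.width * img.height * len(img.getbands())
    return raw_bytes / buf.tell()

img = Image.open("sample_4k.png")  # hypothetical 4K sample image
gray = np.asarray(img.convert("L"))
print(glcm_score(gray[:512, :512]), compression_ratio(img))
```

The wavelet-based fine-tuning can likewise be sketched as a loss that separates predictions into low- and high-frequency sub-bands and up-weights the latter, assuming a single-level Haar transform; the paper's actual objective and wavelet choice may differ.

```python
# Sketch of a wavelet-weighted reconstruction loss; the single-level Haar
# transform and the weighting scheme are assumptions, not the paper's objective.
import torch
import torch.nn.functional as F

def haar_dwt(x: torch.Tensor):
    """Single-level orthonormal Haar DWT of (B, C, H, W) -> (LL, LH, HL, HH)."""
    a = x[..., 0::2, 0::2]  # even rows, even cols
    b = x[..., 0::2, 1::2]  # even rows, odd cols
    c = x[..., 1::2, 0::2]  # odd rows, even cols
    d = x[..., 1::2, 1::2]  # odd rows, odd cols
    return ((a + b + c + d) / 2,  # LL: low-frequency approximation
            (a - b + c - d) / 2,  # detail sub-band
            (a + b - c - d) / 2,  # detail sub-band
            (a - b - c + d) / 2)  # detail sub-band

def wavelet_loss(pred: torch.Tensor, target: torch.Tensor,
                 hf_weight: float = 2.0) -> torch.Tensor:
    """MSE on the low-frequency band plus up-weighted high-frequency bands."""
    p, t = haar_dwt(pred), haar_dwt(target)
    low = F.mse_loss(p[0], t[0])
    high = sum(F.mse_loss(pi, ti) for pi, ti in zip(p[1:], t[1:]))
    return low + hf_weight * high
```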
Related papers
- Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency [49.875459658889355]
Free4D is a tuning-free framework for 4D scene generation from a single image.
Our key insight is to distill pre-trained foundation models for consistent 4D scene representation.
The resulting 4D representation enables real-time, controllable rendering.
arXiv Detail & Related papers (2025-03-26T17:59:44Z)
- IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis [22.79121512759783]
IV-Mixed Sampler is a novel training-free algorithm for video diffusion models.
It uses image diffusion models (IDMs) to enhance the quality of each video frame and video diffusion models (VDMs) to ensure the temporal coherence of the video during the sampling process.
It achieves state-of-the-art performance on four benchmarks including UCF-101-FVD, MSR-VTT-FVD, Chronomagic-Bench-150, and Chronomagic-Bench-1649.
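The interleaving described above can be sketched as one reverse-diffusion loop that alternates per-frame updates from an image diffusion model with joint updates from a video diffusion model. The `idm_step` and `vdm_step` interfaces below are hypothetical placeholders, not the published sampler:

```python
# Hypothetical sketch of interleaved image/video denoising; the actual
# IV-Mixed Sampler schedule and model interfaces are more involved.
import torch

def iv_mixed_sample(idm_step, vdm_step, x: torch.Tensor, timesteps):
    """x: (frames, C, H, W) noisy video latents, denoised toward t = 0."""
    for t in timesteps:  # e.g., from high to low noise levels
        # Per-frame refinement with the image diffusion model (visual quality).
        x = torch.stack([idm_step(frame, t) for frame in x])
        # Joint update with the video diffusion model (temporal coherence).
        x = vdm_step(x, t)
    return x
```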
arXiv Detail & Related papers (2024-10-05T14:33:28Z)
- 4K4D: Real-Time 4D View Synthesis at 4K Resolution [86.6582179227016]
This paper targets high-fidelity, real-time view synthesis of dynamic 3D scenes at 4K resolution.
We propose a 4D point cloud representation that supports hardware rasterization and enables unprecedented rendering speed.
Our representation can be rendered at over 400 FPS on the DNA-Rendering dataset at 1080p resolution and 80 FPS on the ENeRF-Outdoor dataset at 4K resolution using an RTX 4090 GPU.
arXiv Detail & Related papers (2023-10-17T17:57:38Z)
- ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution [84.73658185158222]
We propose a diffusion model-based super-resolution method called ACDMSR.
Our method adapts the standard diffusion model to perform super-resolution through a deterministic iterative denoising process.
Our approach generates more visually realistic high-resolution counterparts of low-resolution images, demonstrating its effectiveness in practical scenarios.
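The deterministic iterative denoising can be sketched as a DDIM-style loop (eta = 0) conditioned on an upsampled low-resolution input; the `eps_model` signature and the x4 scale below are assumptions, not ACDMSR's exact procedure:

```python
# Illustrative deterministic (DDIM-style) conditional SR loop; not ACDMSR's
# exact method, and eps_model's signature is a hypothetical placeholder.
import torch
import torch.nn.functional as F

@torch.no_grad()
def conditional_sr_sample(eps_model, lr_img, alphas_cumprod, steps):
    """eps_model(x_t, cond, t) predicts the noise in x_t given the LR condition."""
    cond = F.interpolate(lr_img, scale_factor=4, mode="bicubic")
    x = torch.randn_like(cond)  # start from pure noise at the HR size
    for i in range(len(steps) - 1):
        t, t_prev = steps[i], steps[i + 1]
        a_t, a_prev = alphas_cumprod[t], alphas_cumprod[t_prev]
        eps = eps_model(x, cond, t)
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()      # predicted clean image
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # eta = 0: deterministic
    return x
```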
arXiv Detail & Related papers (2023-07-03T06:49:04Z)
- Probabilistic-based Feature Embedding of 4-D Light Fields for Compressive Imaging and Denoising [62.347491141163225]
The 4-D light field (LF) poses great challenges for efficient and effective feature embedding.
We propose a probabilistic-based feature embedding (PFE), which learns a feature embedding architecture by assembling various low-dimensional convolution patterns.
Our experiments demonstrate the significant superiority of our methods on both real-world and synthetic 4-D LF images.
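One common way to realize "low-dimensional convolution patterns" on 4-D LF data is to alternate 2-D convolutions over the spatial and angular axes instead of using a full 4-D kernel. The sketch below shows that generic building block as an assumption; it is not PFE's specific architecture:

```python
# Generic spatial-angular separable convolution for light fields; one example
# of a low-dimensional convolution pattern, not PFE's actual design.
import torch
import torch.nn as nn

class SpatialAngularConv(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)
        self.angular = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, U, V, H, W) with angular (U, V) and spatial (H, W) axes.
        b, c, u, v, h, w = x.shape
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(b * u * v, c, h, w)
        x = self.spatial(x)  # 2-D convolution over the spatial plane
        x = x.reshape(b, u, v, c, h, w).permute(0, 4, 5, 3, 1, 2)
        x = self.angular(x.reshape(b * h * w, c, u, v))  # over the angular plane
        return x.reshape(b, h, w, c, u, v).permute(0, 3, 4, 5, 1, 2)
```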
arXiv Detail & Related papers (2023-06-15T03:46:40Z)
- 4K-HAZE: A Dehazing Benchmark with 4K Resolution Hazy and Haze-Free Images [12.402054374952485]
We develop a novel method to simulate 4K hazy images from clear ones: it first estimates the scene depth, simulates the light rays and object reflectance, and then migrates the synthetic images to the real domain using a GAN.
We wrap these synthesized images into a benchmark called the 4K-HAZE dataset.
The most appealing aspect of our approach is its ability to process a 4K image on a single GPU with 24 GB of memory in real time (33 fps).
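Depth-based haze simulation of this kind is conventionally built on the atmospheric scattering model I = J*t + A*(1 - t) with transmission t = exp(-beta * d). The sketch below implements that standard model; the paper's full pipeline, including the GAN-based domain migration, is more elaborate:

```python
# Standard atmospheric scattering model for synthesizing haze; the 4K-HAZE
# pipeline additionally migrates such images to the real domain with a GAN.
import numpy as np

def add_haze(clear: np.ndarray, depth: np.ndarray,
             beta: float = 1.2, airlight: float = 0.9) -> np.ndarray:
    """clear: (H, W, 3) image in [0, 1]; depth: (H, W) scene depth map."""
    t = np.exp(-beta * depth)[..., None]     # transmission falls with depth
    hazy = clear * t + airlight * (1.0 - t)  # I = J*t + A*(1 - t)
    return np.clip(hazy, 0.0, 1.0)
```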
arXiv Detail & Related papers (2023-03-28T09:39:29Z)
- 4K-NeRF: High Fidelity Neural Radiance Fields at Ultra High Resolutions [19.380248980850727]
We present a novel and effective framework, named 4K-NeRF, to pursue high-fidelity view synthesis in the challenging scenario of ultra-high resolutions.
We address the issue by exploring ray correlation to enhance the recovery of high-frequency details.
Our method can significantly boost rendering quality of high-frequency details compared with modern NeRF methods, and achieves state-of-the-art visual quality in 4K ultra-high-resolution scenarios.
arXiv Detail & Related papers (2022-12-09T07:26:49Z)
- Towards Efficient and Scale-Robust Ultra-High-Definition Image Demoireing [71.62289021118983]
We present an efficient baseline model, ESDNet, for tackling 4K moiré images, wherein we build a semantic-aligned scale-aware module to address the scale variation of moiré patterns.
Our approach outperforms state-of-the-art methods by a large margin while being much more lightweight.
arXiv Detail & Related papers (2022-07-20T14:20:52Z)
- DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder [73.1010640692609]
We propose a VQ-VAE architecture model with a diffusion decoder (DiVAE) to work as the reconstructing component in image synthesis.
Our model achieves state-of-the-art results and, in particular, generates more photorealistic images.
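As summarized, the design replaces the VQ-VAE's feed-forward decoder with a diffusion model conditioned on the quantized codes. A schematic sketch of that wiring follows; every module and interface here is a placeholder assumption, not the paper's implementation:

```python
# Schematic VQ encoder + diffusion decoder; all modules are placeholder
# interfaces, not DiVAE's actual architecture.
import torch
import torch.nn as nn

class DiffusionDecoderVAE(nn.Module):
    def __init__(self, encoder: nn.Module, quantizer: nn.Module,
                 denoiser: nn.Module, sampler):
        super().__init__()
        self.encoder, self.quantizer = encoder, quantizer
        self.denoiser = denoiser  # noise-prediction network, code-conditioned
        self.sampler = sampler    # reverse-diffusion loop, e.g. DDPM/DDIM

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        codes = self.quantizer(self.encoder(img))  # discrete latent codes
        noise = torch.randn_like(img)
        # Decode by denoising pure noise, conditioned on the codes.
        return self.sampler(self.denoiser, noise, cond=codes)
```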
arXiv Detail & Related papers (2022-06-01T10:39:12Z)
- One-shot Ultra-high-Resolution Generative Adversarial Network That Synthesizes 16K Images On A Single GPU [1.9060575156739825]
OUR-GAN is a one-shot generative adversarial network framework that generates non-repetitive 16K images from a single training image.
OUR-GAN can synthesize high-quality 16K images with 12.5 GB of GPU memory and 4K images with only 4.29 GB.
OUR-GAN is the first one-shot image synthesizer that generates non-repetitive UHR images on a single consumer GPU.
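Fitting 16K synthesis into 12.5 GB implies the full image is never materialized on the GPU at once. One plausible scheme, assumed here rather than taken from OUR-GAN's paper, is tiled synthesis over overlapping subregions:

```python
# Generic tiled upsampling to bound GPU memory; illustrative only, not
# OUR-GAN's exact subregion scheme.
import torch

@torch.no_grad()
def tiled_upsample(model, lowres: torch.Tensor, tile: int = 512,
                   overlap: int = 32, scale: int = 4, device: str = "cuda"):
    """Run `model` (a hypothetical x`scale` upsampler) one subregion at a time."""
    _, c, h, w = lowres.shape
    out = torch.zeros(1, c, h * scale, w * scale)  # assembled on the CPU
    step = tile - overlap
    for y in range(0, h, step):
        for x in range(0, w, step):
            patch = lowres[:, :, y:y + tile, x:x + tile].to(device)
            up = model(patch).cpu()  # only one tile lives on the GPU at a time
            out[:, :, y * scale:y * scale + up.shape[2],
                x * scale:x * scale + up.shape[3]] = up
    return out
```

A seamless variant would blend the overlapping borders instead of overwriting them.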
arXiv Detail & Related papers (2022-02-28T13:48:41Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.