Related papers: Native-Resolution Image Synthesis

URL: http://arxiv.org/abs/2506.03131v1
Date: Tue, 03 Jun 2025 17:57:33 GMT
Title: Native-Resolution Image Synthesis
Authors: Zidong Wang, Lei Bai, Xiangyu Yue, Wanli Ouyang, Yiyuan Zhang
Abstract summary: We introduce native-resolution image synthesis, a novel generative modeling paradigm that enables the synthesis of images at arbitrary resolutions and aspect ratios. A single NiT model simultaneously achieves the state-of-the-art performance on both ImageNet-256x256 and 512x512 benchmarks. Surprisingly, akin to the robust zero-shot capabilities seen in advanced large language models, NiT, trained solely on ImageNet, demonstrates excellent zero-shot generalization performance.
Score: 79.73854557930089
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We introduce native-resolution image synthesis, a novel generative modeling paradigm that enables the synthesis of images at arbitrary resolutions and aspect ratios. This approach overcomes the limitations of conventional fixed-resolution, square-image methods by natively handling variable-length visual tokens, a core challenge for traditional techniques. To this end, we introduce the Native-resolution diffusion Transformer (NiT), an architecture designed to explicitly model varying resolutions and aspect ratios within its denoising process. Free from the constraints of fixed formats, NiT learns intrinsic visual distributions from images spanning a broad range of resolutions and aspect ratios. Notably, a single NiT model simultaneously achieves the state-of-the-art performance on both ImageNet-256x256 and 512x512 benchmarks. Surprisingly, akin to the robust zero-shot capabilities seen in advanced large language models, NiT, trained solely on ImageNet, demonstrates excellent zero-shot generalization performance. It successfully generates high-fidelity images at previously unseen high resolutions (e.g., 1536 x 1536) and diverse aspect ratios (e.g., 16:9, 3:1, 4:3), as shown in Figure 1. These findings indicate the significant potential of native-resolution modeling as a bridge between visual generative modeling and advanced LLM methodologies.

Related papers

VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models [58.464465016269614]
We propose a novel framework for solving high-definition video inverse problems using latent image diffusion models. Our approach delivers HD-resolution reconstructions in under 6 seconds per frame on a single NVIDIA 4090 GPU.
arXiv Detail & Related papers (2024-11-29T08:10:49Z)
Use of triplet loss for facial restoration in low-resolution images [5.448070998907116]
We propose a novel SR model FTLGAN, which focuses on generating high-resolution images that preserve individual identities. The results are compelling, demonstrating a mean value of d' 21% above the best current state-of-the-art models.
arXiv Detail & Related papers (2024-09-05T13:42:20Z)
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning [38.560064789022704]
MegaFusion extends existing diffusion-based text-to-image models towards efficient higher-resolution generation. We employ an innovative truncate and relay strategy to bridge the denoising processes across different resolutions. By integrating dilated convolutions and noise re-scheduling, we further adapt the model's priors for higher resolution.
arXiv Detail & Related papers (2024-08-20T16:53:34Z)
Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization [26.926712014346432]
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Our method's efficacy is demonstrated on the class-conditional ImageNet generation benchmark, setting new state-of-the-art FID scores of 1.70 on ImageNet 256 x 256 and 2.89 on ImageNet 512 x 512.
arXiv Detail & Related papers (2024-06-13T17:59:58Z)
Matryoshka Diffusion Models [38.26966802461602]
Diffusion models are the de facto approach for generating high-quality images and videos. We introduce Matryoshka Diffusion Models, an end-to-end framework for high-resolution image and video synthesis. We demonstrate the effectiveness of our approach on various benchmarks, including class-conditioned image generation, high-resolution text-to-image, and text-to-video applications.
arXiv Detail & Related papers (2023-10-23T17:20:01Z)
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models [126.35334860896373]
We investigate the capability of generating images from pre-trained diffusion models at much higher resolutions than the training image sizes. Existing works for higher-resolution generation, such as attention-based and joint-diffusion approaches, do not address these issues well. We propose a simple yet effective re-dilation that can dynamically adjust the convolutional perception field during inference.
arXiv Detail & Related papers (2023-10-11T17:52:39Z)
InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images. We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z)
Aggregated Contextual Transformations for High-Resolution Image Inpainting [57.241749273816374]
We propose an enhanced GAN-based model, named Aggregated COntextual-Transformation GAN (AOT-GAN) for high-resolution image inpainting. To enhance context reasoning, we construct the generator of AOT-GAN by stacking multiple layers of a proposed AOT block. For improving texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task.
arXiv Detail & Related papers (2021-04-03T15:50:17Z)
Improved Techniques for Training Score-Based Generative Models [104.20217659157701]
We provide a new theoretical analysis of learning and sampling from score models in high dimensional spaces. We can effortlessly scale score-based generative models to images with unprecedented resolutions. Our score-based models can generate high-fidelity samples that rival best-in-class GANs on various image datasets.
arXiv Detail & Related papers (2020-06-16T09:17:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.