Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis
- URL: http://arxiv.org/abs/2303.14157v3
- Date: Tue, 25 Apr 2023 08:49:08 GMT
- Title: Efficient Scale-Invariant Generator with Column-Row Entangled Pixel Synthesis
- Authors: Thuan Hoang Nguyen, Thanh Van Le, Anh Tran
- Abstract summary: We propose a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design.
Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images.
- Score: 3.222802562733787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Any-scale image synthesis offers an efficient and scalable solution to
synthesize photo-realistic images at any scale, even going beyond 2K
resolution. However, existing GAN-based solutions depend excessively on
convolutions and a hierarchical architecture, which introduce inconsistency and
the "texture sticking" issue when scaling the output resolution. From
another perspective, INR-based generators are scale-equivariant by design, but
their huge memory footprint and slow inference hinder these networks from being
adopted in large-scale or real-time systems. In this work, we propose
$\textbf{C}$olumn-$\textbf{R}$ow $\textbf{E}$ntangled $\textbf{P}$ixel
$\textbf{S}$ynthesis ($\textbf{CREPS}$), a new generative model that is both
efficient and scale-equivariant without using any spatial convolutions or
coarse-to-fine design. To save memory footprint and make the system scalable,
we employ a novel bi-line representation that decomposes layer-wise feature
maps into separate "thick" column and row encodings. Experiments on
various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery,
confirm CREPS' ability to synthesize scale-consistent and alias-free images at
arbitrary resolutions with reasonable training and inference speed. Code is
available at https://github.com/VinAIResearch/CREPS.
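The bi-line idea described in the abstract can be illustrated with a minimal NumPy sketch. All sizes, including the per-line rank `R`, are hypothetical and not taken from the paper; the point is only that a dense H x W x C feature map is never stored, but materialized on demand from separate "thick" column and row encodings.

```python
import numpy as np

# Hypothetical sizes -- not taken from the paper.
H, W, C, R = 256, 256, 64, 8   # height, width, channels, per-line rank

# "Thick" column and row encodings: O((H + W) * C * R) memory
col = np.random.randn(H, C, R)
row = np.random.randn(W, C, R)

# Entangle columns and rows into a dense H x W x C feature map on demand:
# feat[i, j, c] = sum_r col[i, c, r] * row[j, c, r]
feat = np.einsum('icr,jcr->ijc', col, row)

print(feat.shape)  # (256, 256, 64)

# Memory of the bi-line encodings vs. the dense feature map it replaces
dense_size = H * W * C
biline_size = (H + W) * C * R
print(f"memory ratio: {biline_size / dense_size:.4f}")
```

Because the encodings grow as H + W rather than H * W, scaling the output resolution is far cheaper than storing full feature maps, which is the scalability argument the abstract makes.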
Related papers
- Scan Clusters, Not Pixels: A Cluster-Centric Paradigm for Efficient Ultra-high-definition Image Restoration [28.17529244607509]
C$^2$SSM is a visual state space model that shifts from pixel-serial to cluster-serial scanning. Our core discovery is that the rich feature distribution of a UHD image can be distilled into a sparse set of semantic centroids. More than a solution, C$^2$SSM charts a new course for efficient large-scale vision: scan clusters, not pixels.
arXiv Detail & Related papers (2026-02-25T13:45:50Z) - GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation [19.94399008500357]
GPSToken is a novel $\textbf{G}$aussian $\textbf{P}$arameterized $\textbf{S}$patially-adaptive $\textbf{Token}$ization framework. GPSToken disentangles spatial layout (Gaussian parameters) from texture features to enable efficient two-stage generation.
arXiv Detail & Related papers (2025-09-01T04:01:37Z) - $\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions [58.42011190989414]
We introduce a novel conditional diffusion model in infinite dimensions, $\infty$-Brush, for controllable large image synthesis.
To the best of our knowledge, $\infty$-Brush is the first conditional diffusion model in function space that can controllably synthesize images at arbitrary resolutions of up to $4096\times4096$ pixels.
arXiv Detail & Related papers (2024-07-20T00:04:49Z) - Fully $1\times1$ Convolutional Network for Lightweight Image Super-Resolution [79.04007257606862]
Deep models have made significant progress on single image super-resolution (SISR) tasks, in particular large models with large kernels ($3\times3$ or larger).
$1\times1$ convolutions bring substantial computational efficiency, but struggle to aggregate local spatial representations.
We propose a simple yet effective fully $1\times1$ convolutional network, named Shift-Conv-based Network (SCNet).
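The shift-then-mix idea behind a fully $1\times1$ network can be sketched in a few lines of NumPy. The shift directions, channel count, and weight values below are illustrative assumptions, not SCNet's actual configuration: each channel is spatially shifted by one pixel, after which a plain $1\times1$ convolution (per-pixel channel mixing) can aggregate local context without any $k\times k$ kernel.

```python
import numpy as np

# Toy input feature map: (channels, height, width) -- illustrative sizes
x = np.random.randn(8, 16, 16)
C_in, C_out = x.shape[0], 8

# Shift each channel group one pixel in a different direction
# (identity, down, up, right, left); real shift-conv designs may differ.
shifts = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
shifted = np.empty_like(x)
for c in range(C_in):
    dy, dx = shifts[c % len(shifts)]
    shifted[c] = np.roll(x[c], (dy, dx), axis=(0, 1))

# A 1x1 convolution is just a C_out x C_in matrix applied at every pixel
W1 = np.random.randn(C_out, C_in) / np.sqrt(C_in)
y = np.einsum('oc,chw->ohw', W1, shifted)
print(y.shape)  # (8, 16, 16)
```

After the shift, each output pixel mixes features drawn from its displaced neighbors, which is how channel mixing alone recovers a local receptive field.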
arXiv Detail & Related papers (2023-07-30T06:24:03Z) - Urban Radiance Field Representation with Deformable Neural Mesh Primitives [41.104140341641006]
Deformable Neural Mesh Primitive (DNMP) is a flexible and compact neural variant of the classic mesh representation.
Our representation enables fast rendering (2.07 ms/1k pixels) and low peak memory usage (110 MB/1k pixels).
We present a lightweight version that runs 33$\times$ faster than vanilla NeRFs and is comparable to the highly optimized Instant-NGP (0.61 vs. 0.71 ms/1k pixels).
arXiv Detail & Related papers (2023-07-20T11:24:55Z) - CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying [52.91778151771145]
In this paper, we break these limitations for the first time, thanks to the recent development of continuous implicit representations.
Experiments show that the proposed method achieves real-time performance on $2048\times2048$ images using a single GTX 2080 Ti GPU.
arXiv Detail & Related papers (2023-03-15T11:13:51Z) - $\mu$Split: efficient image decomposition for microscopy data [50.794670705085835]
$\mu$Split is a dedicated approach for trained image decomposition in the context of fluorescence microscopy images.
We introduce lateral contextualization (LC), a novel meta-architecture that enables the memory efficient incorporation of large image-context.
We apply $\mu$Split to five decomposition tasks: one on a synthetic dataset and four derived from real microscopy data.
arXiv Detail & Related papers (2022-11-23T11:26:24Z) - Learning sparse auto-encoders for green AI image coding [5.967279020820772]
In this paper, we address the problem of lossy image compression using a CAE with a small memory footprint and low computational power usage.
We propose a constrained approach and a new structured sparse learning method.
Experimental results show that the $\ell_{1,1}$ constraint provides the best structured proximal sparsity, resulting in a large reduction of memory and computational cost.
arXiv Detail & Related papers (2022-09-09T06:31:46Z) - Adaptive Local Implicit Image Function for Arbitrary-scale Super-resolution [61.95533972380704]
Local implicit image function (LIIF) represents images as a continuous function where pixel values are predicted using the corresponding coordinates as inputs.
LIIF can be adopted for arbitrary-scale image super-resolution tasks, resulting in a single effective and efficient model for various up-scaling factors.
We propose a novel adaptive local implicit image function (A-LIIF) to alleviate the artifacts produced by LIIF.
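The coordinate-querying idea behind LIIF can be sketched with a toy NumPy example. The latent grid, the tiny random MLP standing in for the trained decoder, and all sizes are hypothetical assumptions for illustration only: the point is that any output resolution is obtained by simply querying a denser lattice of continuous coordinates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy latent grid from an "encoder": 8x8 cells with 16-dim codes (made-up sizes)
grid = rng.normal(size=(8, 8, 16))

# A tiny random MLP stands in for the trained decoder f_theta
W1 = rng.normal(size=(16 + 2, 32))
W2 = rng.normal(size=(32, 3))

def query_rgb(y, x):
    """Predict an RGB value at continuous coordinates (y, x) in [0, 1)."""
    gy, gx = int(y * 8), int(x * 8)            # nearest latent cell
    cy, cx = (gy + 0.5) / 8, (gx + 0.5) / 8    # that cell's center coordinates
    rel = np.array([y - cy, x - cx])           # relative offset fed to the MLP
    h = np.maximum(np.concatenate([grid[gy, gx], rel]) @ W1, 0)  # ReLU layer
    return h @ W2

# Querying a 32x32 coordinate lattice from an 8x8 grid == 4x up-scaling;
# any other factor just changes the lattice density.
out = np.stack([[query_rgb((i + 0.5) / 32, (j + 0.5) / 32) for j in range(32)]
                for i in range(32)])
print(out.shape)  # (32, 32, 3)
```

Because the decoder is shared across all coordinates, one model serves every up-scaling factor, which is the property the summary above highlights.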
arXiv Detail & Related papers (2022-08-07T11:23:23Z) - EpiGRAF: Rethinking training of 3D GANs [60.38818140637367]
We show that it is possible to obtain a high-resolution 3D generator with SotA image quality by following a completely different route of simply training the model patch-wise.
The resulting model, named EpiGRAF, is an efficient, high-resolution, pure 3D generator.
arXiv Detail & Related papers (2022-06-21T17:08:23Z) - $\ell_1$DecNet+: A new architecture framework by $\ell_1$ decomposition and iteration unfolding for sparse feature segmentation [4.150107303000611]
$\ell_1$DecNet is an unfolded network derived from a variational decomposition model incorporating $\ell_1$-related sparse regularization.
We develop $\ell_1$DecNet+, a learnable architecture framework consisting of our $\ell_1$DecNet and a segmentation module operating on the extracted sparse features.
We evaluate the effectiveness of $\ell_1$DecNet+ on two commonly encountered sparse segmentation tasks: retinal vessel segmentation in medical image processing and pavement crack detection in industrial abnormality identification.
arXiv Detail & Related papers (2022-03-05T09:17:32Z) - Near Perfect GAN Inversion [17.745342857726925]
We derive an algorithm that achieves near perfect reconstructions of photos.
We show that this approach can not only produce synthetic images that are indistinguishable from the real photos we wish to replicate, but that these images are readily editable.
arXiv Detail & Related papers (2022-02-23T23:58:13Z) - Spatial-Separated Curve Rendering Network for Efficient and High-Resolution Image Harmonization [59.19214040221055]
We propose a novel spatial-separated curve rendering network (S$^2$CRNet) for efficient and high-resolution image harmonization.
The proposed method reduces parameters by more than 90% compared with previous methods.
Our method works smoothly on higher-resolution images in real time and is more than 10$\times$ faster than existing methods.
arXiv Detail & Related papers (2021-09-13T07:20:16Z) - InfinityGAN: Towards Infinite-Resolution Image Synthesis [92.40782797030977]
We present InfinityGAN, a method to generate arbitrary-resolution images.
We show how it trains and infers patch-by-patch seamlessly with low computational resources.
arXiv Detail & Related papers (2021-04-08T17:59:30Z) - PNEN: Pyramid Non-Local Enhanced Networks [23.17149002568982]
We propose a novel non-local module, the Pyramid Non-local Block, to build connections between every pixel and all remaining pixels.
Based on the proposed module, we devise Pyramid Non-local Enhanced Networks (PNEN) for edge-preserving image smoothing.
We integrate it into two existing methods for image denoising and single-image super-resolution, achieving consistently improved performance.
arXiv Detail & Related papers (2020-08-22T03:10:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.