Implicit Grid Convolution for Multi-Scale Image Super-Resolution
- URL: http://arxiv.org/abs/2408.09674v1
- Date: Mon, 19 Aug 2024 03:30:15 GMT
- Title: Implicit Grid Convolution for Multi-Scale Image Super-Resolution
- Authors: Dongheon Lee, Seokju Yun, Youngmin Ro,
- Abstract summary: We propose a framework for training multiple integer scales simultaneously with a single model.
We use a single encoder to extract features and introduce a novel upsampler, Implicit Grid Convolution(IGConv)
Our experiments demonstrate that training multiple scales with a single model reduces the training budget and stored parameters by one-third.
- Score: 6.8410780175245165
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, Super-Resolution (SR) achieved significant performance improvement by employing neural networks. Most SR methods conventionally train a single model for each targeted scale, which increases redundancy in training and deployment in proportion to the number of scales targeted. This paper challenges this conventional fixed-scale approach. Our preliminary analysis reveals that, surprisingly, encoders trained at different scales extract similar features from images. Furthermore, the commonly used scale-specific upsampler, Sub-Pixel Convolution (SPConv), exhibits significant inter-scale correlations. Based on these observations, we propose a framework for training multiple integer scales simultaneously with a single model. We use a single encoder to extract features and introduce a novel upsampler, Implicit Grid Convolution~(IGConv), which integrates SPConv at all scales within a single module to predict multiple scales. Our extensive experiments demonstrate that training multiple scales with a single model reduces the training budget and stored parameters by one-third while achieving equivalent inference latency and comparable performance. Furthermore, we propose IGConv$^{+}$, which addresses spectral bias and input-independent upsampling and uses ensemble prediction to improve performance. As a result, SRFormer-IGConv$^{+}$ achieves a remarkable 0.25dB improvement in PSNR at Urban100$\times$4 while reducing the training budget, stored parameters, and inference cost compared to the existing SRFormer.
Related papers
- Scale Equalization for Multi-Level Feature Fusion [8.541075075344438]
We find that multi-level features from parallel branches are on different scales.
The scale disequilibrium is a universal and unwanted flaw that leads to detrimental gradient descent.
We propose injecting scale equalizers to achieve scale equilibrium across multi-level features after bilinear upsampling.
arXiv Detail & Related papers (2024-02-02T05:25:51Z) - Binarized Spectral Compressive Imaging [59.18636040850608]
Existing deep learning models for hyperspectral image (HSI) reconstruction achieve good performance but require powerful hardwares with enormous memory and computational resources.
We propose a novel method, Binarized Spectral-Redistribution Network (BiSRNet)
BiSRNet is derived by using the proposed techniques to binarize the base model.
arXiv Detail & Related papers (2023-05-17T15:36:08Z) - Vertical Layering of Quantized Neural Networks for Heterogeneous
Inference [57.42762335081385]
We study a new vertical-layered representation of neural network weights for encapsulating all quantized models into a single one.
We can theoretically achieve any precision network for on-demand service while only needing to train and maintain one model.
arXiv Detail & Related papers (2022-12-10T15:57:38Z) - Effective Invertible Arbitrary Image Rescaling [77.46732646918936]
Invertible Neural Networks (INN) are able to increase upscaling accuracy significantly by optimizing the downscaling and upscaling cycle jointly.
A simple and effective invertible arbitrary rescaling network (IARN) is proposed to achieve arbitrary image rescaling by training only one model in this work.
It is shown to achieve a state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in LR outputs.
arXiv Detail & Related papers (2022-09-26T22:22:30Z) - Scale Attention for Learning Deep Face Representation: A Study Against
Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to the scale-space theory.
We build a novel style named SCale AttentioN Conv Neural Network (textbfSCAN-CNN)
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z) - Conditional Variational Autoencoder with Balanced Pre-training for
Generative Adversarial Networks [11.46883762268061]
Class imbalance occurs in many real-world applications, including image classification, where the number of images in each class differs significantly.
With imbalanced data, the generative adversarial networks (GANs) leans to majority class samples.
We propose a novel Variational Autoencoder with Balanced Pre-training for Geneversarative Adrial Networks (CAPGAN) as an augmentation tool to generate realistic synthetic images.
arXiv Detail & Related papers (2022-01-13T06:52:58Z) - Scaling Structured Inference with Randomization [64.18063627155128]
We propose a family of dynamic programming (RDP) randomized for scaling structured models to tens of thousands of latent states.
Our method is widely applicable to classical DP-based inference.
It is also compatible with automatic differentiation so can be integrated with neural networks seamlessly.
arXiv Detail & Related papers (2021-12-07T11:26:41Z) - Insta-RS: Instance-wise Randomized Smoothing for Improved Robustness and
Accuracy [9.50143683501477]
Insta-RS is a multiple-start search algorithm that assigns customized Gaussian variances to test examples.
Insta-RS Train is a novel two-stage training algorithm that adaptively adjusts and customizes the noise level of each training example.
We show that our method significantly enhances the average certified radius (ACR) as well as the clean data accuracy.
arXiv Detail & Related papers (2021-03-07T19:46:07Z) - Highly Efficient Salient Object Detection with 100K Parameters [137.74898755102387]
We propose a flexible convolutional module, namely generalized OctConv (gOctConv), to efficiently utilize both in-stage and cross-stages multi-scale features.
We build an extremely light-weighted model, namely CSNet, which achieves comparable performance with about 0.2% (100k) of large models on popular object detection benchmarks.
arXiv Detail & Related papers (2020-03-12T07:00:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.