Spatial-Separated Curve Rendering Network for Efficient and
High-Resolution Image Harmonization
- URL: http://arxiv.org/abs/2109.05750v2
- Date: Tue, 14 Sep 2021 08:02:51 GMT
- Title: Spatial-Separated Curve Rendering Network for Efficient and
High-Resolution Image Harmonization
- Authors: Jingtang Liang, Xiaodong Cun and Chi-Man Pun
- Abstract summary: We propose a novel spatial-separated curve rendering network (S$^2$CRNet) for efficient and high-resolution image harmonization.
The proposed method reduces parameters by more than 90% compared with previous methods.
Our method runs smoothly on higher-resolution images in real time, more than 10$\times$ faster than existing methods.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image harmonization aims to modify the color of the composited region with
respect to the specific background. Previous works model this task as a
pixel-wise image-to-image translation using UNet-family structures. However,
the model size and computational cost limit the applicability of these models
on edge devices and to higher-resolution images. To this end, we propose a novel
spatial-separated curve rendering network (S$^2$CRNet) for efficient and
high-resolution image harmonization for the first time. In S$^2$CRNet, we
first extract spatial-separated embeddings from the thumbnails of the
masked foreground and background individually. Then, we design a curve
rendering module (CRM), which learns and combines the spatial-specific
knowledge using linear layers to generate the parameters of the pixel-wise
curve mapping in the foreground region. Finally, we directly render the
original high-resolution images using the learned color curve. Besides, we also
make two extensions of the proposed framework via the Cascaded-CRM and
Semantic-CRM for cascaded refinement and semantic guidance, respectively.
Experiments show that the proposed method reduces parameters by more than 90%
compared with previous methods while still achieving state-of-the-art
performance on both the synthesized iHarmony4 and the real-world DIH test sets.
Moreover, our method runs smoothly on higher-resolution images in real time,
more than 10$\times$ faster than existing methods. The code and
pre-trained models will be made available and released at
https://github.com/stefanLeong/S2CRNet.
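The curve-rendering step described above — predicting parameters of a pixel-wise color curve and applying it only to the composited foreground — can be sketched roughly as follows. This is an illustrative piecewise-linear curve lookup, not the authors' implementation; the function name, curve parameterization, and control-point count are assumptions for the sake of the example:

```python
import numpy as np

def apply_color_curve(image, mask, curve_params):
    """Apply a per-channel piecewise-linear color curve to the masked
    foreground of a high-resolution image.

    image:        float32 HxWx3 array with values in [0, 1]
    mask:         float32 HxW array, 1 inside the composited foreground
    curve_params: Kx3 array of curve control values (K points per channel),
                  e.g. predicted by a small network from thumbnail embeddings
    """
    k = curve_params.shape[0]
    # Control-point positions evenly spaced over the intensity range [0, 1].
    xs = np.linspace(0.0, 1.0, k)
    out = image.copy()
    for c in range(3):
        # np.interp performs a piecewise-linear lookup per pixel intensity.
        mapped = np.interp(image[..., c], xs, curve_params[:, c])
        # Blend the remapped values only inside the foreground mask.
        out[..., c] = mask * mapped + (1.0 - mask) * image[..., c]
    return out

# Toy usage: an identity curve leaves the image unchanged.
img = np.random.rand(4, 4, 3).astype(np.float32)
m = np.ones((4, 4), dtype=np.float32)
identity = np.tile(np.linspace(0.0, 1.0, 8)[:, None], (1, 3))
restored = apply_color_curve(img, m, identity)
```

Because the learned curve is resolution-independent, it can be predicted from a low-resolution thumbnail and then applied directly to the full-resolution image, which is the source of the efficiency the abstract claims.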
Related papers
- Multi-Head Attention Residual Unfolded Network for Model-Based Pansharpening [2.874893537471256]
Unfolding fusion methods integrate the powerful representation capabilities of deep learning with the robustness of model-based approaches.
In this paper, we propose a model-based deep unfolded method for satellite image fusion.
Experimental results on PRISMA, Quickbird, and WorldView2 datasets demonstrate the superior performance of our method.
arXiv Detail & Related papers (2024-09-04T13:05:00Z) - Realistic Extreme Image Rescaling via Generative Latent Space Learning [51.85790402171696]
We propose a novel framework called Latent Space Based Image Rescaling (LSBIR) for extreme image rescaling tasks.
LSBIR effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model to generate realistic HR images.
In the first stage, a pseudo-invertible encoder-decoder models the bidirectional mapping between the latent features of the HR image and the target-sized LR image.
In the second stage, the reconstructed features from the first stage are refined by a pre-trained diffusion model to generate more faithful and visually pleasing details.
arXiv Detail & Related papers (2024-08-17T09:51:42Z) - Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels.
However, they are not widely adopted by general users due to their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
arXiv Detail & Related papers (2023-06-21T06:59:07Z) - Multiscale Representation for Real-Time Anti-Aliasing Neural Rendering [84.37776381343662]
Mip-NeRF proposes a multiscale representation as a conical frustum to encode scale information.
We propose mip voxel grids (Mip-VoG), an explicit multiscale representation for real-time anti-aliasing rendering.
Our approach is the first to offer multiscale training and real-time anti-aliasing rendering simultaneously.
arXiv Detail & Related papers (2023-04-20T04:05:22Z) - CoordFill: Efficient High-Resolution Image Inpainting via Parameterized
Coordinate Querying [52.91778151771145]
In this paper, we try to break the limitations for the first time thanks to the recent development of continuous implicit representation.
Experiments show that the proposed method achieves real-time performance on 2048$\times$2048 images using a single GTX 2080 Ti GPU.
arXiv Detail & Related papers (2023-03-15T11:13:51Z) - Dense Pixel-to-Pixel Harmonization via Continuous Image Representation [22.984119094424056]
We propose a novel image Harmonization method based on Implicit neural Networks (HINet).
Inspired by the Retinex theory, we decouple the harmonizations into two parts to respectively capture the content and environment of composite images.
Extensive experiments have demonstrated the effectiveness of our method compared with state-of-the-art methods.
arXiv Detail & Related papers (2023-03-03T02:52:28Z) - FRIH: Fine-grained Region-aware Image Harmonization [49.420765789360836]
We propose a novel global-local two-stage framework for Fine-grained Region-aware Image Harmonization (FRIH).
Our algorithm achieves the best performance on iHarmony4 dataset (PSNR is 38.19 dB) with a lightweight model.
arXiv Detail & Related papers (2022-05-13T04:50:26Z) - SCSNet: An Efficient Paradigm for Learning Simultaneously Image
Colorization and Super-Resolution [39.77987463287673]
We present an efficient paradigm to perform Simultaneous Image Colorization and Super-resolution (SCS).
The proposed method consists of two parts: a colorization branch for learning color information, which employs the proposed plug-and-play Pyramid Valve Cross Attention (PVCAttn) module.
Our SCSNet supports both automatic and referential modes, which is more flexible for practical applications.
arXiv Detail & Related papers (2022-01-12T08:59:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.