Deep Image Harmonization by Bridging the Reality Gap
- URL: http://arxiv.org/abs/2103.17104v1
- Date: Wed, 31 Mar 2021 14:19:56 GMT
- Title: Deep Image Harmonization by Bridging the Reality Gap
- Authors: Wenyan Cong, Junyan Cao, Li Niu, Jianfu Zhang, Xuesong Gao, Zhiwei
Tang, Liqing Zhang
- Abstract summary: We propose to construct a large-scale rendered harmonization dataset RHHarmony with fewer human efforts to augment the existing real-world dataset.
To leverage both real-world images and rendered images, we propose a cross-domain harmonization network CharmNet.
- Score: 18.86655082192153
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image harmonization has been significantly advanced by large-scale
harmonization datasets. However, the current way of building such datasets is
still labor-intensive, which limits their extensibility. To
address this problem, we propose to construct a large-scale rendered
harmonization dataset RHHarmony with fewer human efforts to augment the
existing real-world dataset. To leverage both real-world images and rendered
images, we propose a cross-domain harmonization network, CharmNet, to bridge
the gap between the two domains. Moreover, we employ well-designed style
classifiers and losses to facilitate cross-domain knowledge transfer. Extensive
experiments demonstrate the potential of using rendered images for image
harmonization and the effectiveness of our proposed network. Our dataset and
code are available at
https://github.com/bcmi/Rendered_Image_Harmonization_Datasets.
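The abstract describes style classifiers and losses that encourage knowledge transfer between the rendered and real domains. As an illustration of the general idea (not the actual CharmNet implementation, whose details are not given here), the following sketch shows a toy domain classifier and a domain-confusion loss that is minimized when features from the two domains are indistinguishable; all names and shapes are assumptions.

```python
import numpy as np

def domain_classifier(features, w, b):
    """Toy linear domain classifier: sigmoid(w . f + b) -> P(real domain)."""
    return 1.0 / (1.0 + np.exp(-(features @ w + b)))

def domain_confusion_loss(features, w, b):
    """Binary cross-entropy against the uniform target 0.5: the loss is
    minimized when the classifier cannot tell which domain a feature
    vector came from (P(real) = 0.5 everywhere)."""
    p = domain_classifier(features, w, b)
    return float(np.mean(-0.5 * np.log(p + 1e-8) - 0.5 * np.log(1.0 - p + 1e-8)))

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))   # a mini-batch of feature vectors
w, b = rng.normal(size=8), 0.0
loss = domain_confusion_loss(feats, w, b)
```

In an adversarial setup of this kind, the classifier is trained to separate the domains while the feature extractor is trained to drive this confusion loss toward its minimum of log 2.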
Related papers
- Robust Disaster Assessment from Aerial Imagery Using Text-to-Image Synthetic Data [66.49494950674402]
We leverage emerging text-to-image generative models in creating large-scale synthetic supervision for the task of damage assessment from aerial images.
We build an efficient and easily scalable pipeline to generate thousands of post-disaster images from low-resource domains.
We validate the strength of our proposed framework under cross-geography domain transfer setting from xBD and SKAI images in both single-source and multi-source settings.
arXiv Detail & Related papers (2024-05-22T16:07:05Z)
- Getting it Right: Improving Spatial Consistency in Text-to-Image Models [103.52640413616436]
One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.
We create SPRIGHT, the first spatially focused, large-scale dataset, by re-captioning 6 million images from 4 widely used vision datasets.
We find that training on images containing a larger number of objects leads to substantial improvements in spatial consistency, including state-of-the-art results on T2I-CompBench with a spatial score of 0.2133, by fine-tuning on 500 images.
arXiv Detail & Related papers (2024-04-01T15:55:25Z)
- Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks [50.822601495422916]
We propose to utilize exposure bracketing photography to unify image restoration and enhancement tasks.
Due to the difficulty in collecting real-world pairs, we suggest a solution that first pre-trains the model with synthetic paired data.
In particular, a temporally modulated recurrent network (TMRNet) and self-supervised adaptation method are proposed.
arXiv Detail & Related papers (2024-01-01T14:14:35Z)
- Deep Image Harmonization with Learnable Augmentation [17.690945824240348]
Learnable augmentation is proposed to enrich the illumination diversity of small-scale datasets for better harmonization performance.
SycoNet takes a real image with a foreground mask and a random vector as input and learns a suitable color transformation, which is applied to the foreground of the real image to produce a synthetic composite image.
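The data-generation step described above, applying a color transformation only to the masked foreground of a real image to synthesize an inharmonious composite, can be sketched as follows. The simple affine gain/bias transform here is an illustrative stand-in for the transformation SycoNet actually learns.

```python
import numpy as np

def synthesize_composite(image, mask, gain, bias):
    """image: HxWx3 float array in [0, 1]; mask: HxW {0, 1} foreground mask.
    Applies `gain * pixel + bias` inside the mask, leaves background intact."""
    transformed = np.clip(image * gain + bias, 0.0, 1.0)
    m = mask[..., None].astype(image.dtype)   # broadcast mask over channels
    return m * transformed + (1.0 - m) * image

img = np.full((4, 4, 3), 0.5)                 # a uniform gray "real image"
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1                            # a 2x2 foreground region
comp = synthesize_composite(img, mask, gain=1.2, bias=0.1)
# foreground pixels become 0.5 * 1.2 + 0.1 = 0.7; background stays 0.5
```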
arXiv Detail & Related papers (2023-08-01T08:40:23Z)
- Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation [20.302430505018]
We show that using global information to guide foreground feature transformation could achieve significant improvement.
We also propose to transfer the foreground-background relation from real images to composite images, which can provide intermediate supervision for the transformed encoder features.
arXiv Detail & Related papers (2023-08-01T07:53:25Z)
- Painterly Image Harmonization in Dual Domains [13.067850524730698]
We propose a novel painterly harmonization network consisting of a dual-domain generator and a dual-domain discriminator.
The dual-domain generator performs harmonization by using AdaIN modules in the spatial domain and our proposed ResFFT modules in the frequency domain.
The dual-domain discriminator attempts to distinguish inharmonious patches based on the spatial and frequency features of each patch, which enhances the generator's ability in an adversarial manner.
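The two views this entry combines can be sketched in a few lines: AdaIN-style statistics matching in the spatial domain, and a frequency-domain representation obtained via the FFT. The ResFFT module itself is not detailed in the summary, so the amplitude spectrum below is only an assumed illustration of the frequency-domain input a discriminator might consume.

```python
import numpy as np

def adain(content, style, eps=1e-5):
    """Shift content features to the style features' per-channel mean/std
    (the classic AdaIN operation). content, style: HxWxC arrays."""
    c_mu, c_std = content.mean((0, 1)), content.std((0, 1)) + eps
    s_mu, s_std = style.mean((0, 1)), style.std((0, 1)) + eps
    return (content - c_mu) / c_std * s_std + s_mu

def amplitude_spectrum(channel):
    """Frequency-domain view of one channel via the 2D FFT."""
    return np.abs(np.fft.fft2(channel))

rng = np.random.default_rng(1)
content = rng.normal(2.0, 3.0, size=(8, 8, 3))
style = rng.normal(0.0, 1.0, size=(8, 8, 3))
out = adain(content, style)   # now carries the style statistics
```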
arXiv Detail & Related papers (2022-12-17T11:00:34Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- A Generative Adversarial Framework for Optimizing Image Matting and Harmonization Simultaneously [7.541357996797061]
We propose a new Generative Adversarial Network (GAN) framework which optimizes the matting network and the harmonization network based on a self-attention discriminator.
Our dataset and dataset-generation pipeline can be found at https://git.io/HaMaGAN
arXiv Detail & Related papers (2021-08-13T06:48:14Z)
- Using GANs to Augment Data for Cloud Image Segmentation Task [2.294014185517203]
We show the effectiveness of using Generative Adversarial Networks (GANs) to generate data to augment the training set.
We also present a way to estimate ground-truth binary maps for the GAN-generated images to facilitate their effective use as augmented images.
arXiv Detail & Related papers (2021-06-06T09:01:43Z)
- Low Light Image Enhancement via Global and Local Context Modeling [164.85287246243956]
We introduce a context-aware deep network for low-light image enhancement.
First, it features a global context module that models spatial correlations to find complementary cues over full spatial domain.
Second, it introduces a dense residual block that captures local context with a relatively large receptive field.
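"Modeling spatial correlations over the full spatial domain," as described for the global context module, is commonly realized with a non-local (attention-like) aggregation in which every position is updated from a similarity-weighted sum over all positions. The paper's actual module may differ; the sketch below is only an assumed illustration of that general mechanism.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_context(features):
    """features: (N, C) array of N spatial positions with C channels.
    Returns features augmented with globally aggregated context."""
    attn = softmax(features @ features.T)   # (N, N) pairwise similarities
    context = attn @ features               # weighted sum over all positions
    return features + context               # residual connection

rng = np.random.default_rng(2)
feats = rng.normal(size=(16, 4))   # a 4x4 feature map flattened to 16 positions
out = global_context(feats)
```

Because every output position mixes in information from the entire map, complementary cues far outside a convolution's local window can influence the result.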
arXiv Detail & Related papers (2021-01-04T09:40:54Z)
- Image Fine-grained Inpainting [89.17316318927621]
We present a one-stage model that utilizes dense combinations of dilated convolutions to obtain larger and more effective receptive fields.
To better train this efficient generator, in addition to the frequently-used VGG feature matching loss, we design a novel self-guided regression loss.
We also employ a discriminator with local and global branches to ensure local-global contents consistency.
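The receptive-field gain from the dilated convolutions mentioned above can be checked with simple arithmetic: a k x k convolution with dilation d covers (k - 1) * d + 1 input positions per axis, so a stack of dilated layers grows the receptive field far faster than a stack of ordinary ones.

```python
def receptive_field(kernels_and_dilations):
    """Receptive field (along one axis) of a stack of stride-1 conv layers,
    each given as a (kernel_size, dilation) pair."""
    rf = 1
    for k, d in kernels_and_dilations:
        rf += (k - 1) * d   # each layer extends the field by (k - 1) * d
    return rf

plain = receptive_field([(3, 1)] * 4)                       # -> 9
dilated = receptive_field([(3, 1), (3, 2), (3, 4), (3, 8)])  # -> 31
```

Four plain 3x3 layers see only 9 pixels across, while four 3x3 layers with dilations 1, 2, 4, 8 see 31, at the same parameter cost.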
arXiv Detail & Related papers (2020-02-07T03:45:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.