Foreground-aware Semantic Representations for Image Harmonization
- URL: http://arxiv.org/abs/2006.00809v1
- Date: Mon, 1 Jun 2020 09:27:20 GMT
- Title: Foreground-aware Semantic Representations for Image Harmonization
- Authors: Konstantin Sofiiuk, Polina Popenova and Anton Konushin
- Abstract summary: We propose a novel architecture that utilizes the space of high-level features learned by a pre-trained classification network.
We extensively evaluate the proposed method on the existing image harmonization benchmark and set a new state of the art in terms of the MSE and PSNR metrics.
- Score: 5.156484100374058
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image harmonization is an important step in photo editing to achieve visual
consistency in composite images by adjusting the appearance of the foreground to
make it compatible with the background. Previous approaches to harmonizing
composites are based on training encoder-decoder networks from scratch, which
makes it challenging for a neural network to learn a high-level representation
of objects. We propose a novel architecture that utilizes the space of high-level
features learned by a pre-trained classification network. We create our models
as a combination of existing encoder-decoder architectures and a pre-trained
foreground-aware deep high-resolution network. We extensively evaluate the
proposed method on the existing image harmonization benchmark and set a new
state of the art in terms of the MSE and PSNR metrics. The code and trained
models are available at \url{https://github.com/saic-vul/image_harmonization}.
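As a rough illustration of the idea in the abstract (not the authors' implementation), the NumPy sketch below shows how features from a pre-trained, foreground-aware backbone that receives the composite image concatenated with the foreground mask could be fused with the encoder features of a harmonization network. All names (`semantic_backbone`, `fuse`) and shapes are hypothetical.

```python
import numpy as np

def semantic_backbone(rgb_and_mask):
    """Stand-in for a pre-trained, foreground-aware network: it sees the
    composite image concatenated with the foreground mask (4 channels)."""
    # A fixed 1x1 "convolution" projecting the 4 input channels to 8 features.
    w = np.ones((8, rgb_and_mask.shape[0])) / rgb_and_mask.shape[0]
    return np.einsum('oc,chw->ohw', w, rgb_and_mask)

def fuse(encoder_feats, semantic_feats):
    """Concatenate encoder and semantic features along the channel axis,
    the simplest way to combine an encoder-decoder with a pre-trained backbone."""
    return np.concatenate([encoder_feats, semantic_feats], axis=0)

composite = np.random.rand(3, 16, 16)            # toy composite image (C, H, W)
fg_mask = np.zeros((1, 16, 16))
fg_mask[:, 4:12, 4:12] = 1.0                     # toy foreground region
sem = semantic_backbone(np.concatenate([composite, fg_mask], axis=0))
fused = fuse(np.random.rand(32, 16, 16), sem)    # 32 toy encoder channels
print(fused.shape)                               # (40, 16, 16)
```

In the paper the decoder would then predict the harmonized image from such fused features; here the fusion step only demonstrates the channel-wise combination.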
Related papers
- Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning [86.99944014645322]
We introduce a novel framework, Meta-Exploiting Frequency Prior for Cross-Domain Few-Shot Learning.
We decompose each query image into its high-frequency and low-frequency components and incorporate them in parallel into the feature embedding network.
Our framework establishes new state-of-the-art results on multiple cross-domain few-shot learning benchmarks.
arXiv Detail & Related papers (2024-11-03T04:02:35Z) - PROMPT-IML: Image Manipulation Localization with Pre-trained Foundation Models Through Prompt Tuning [35.39822183728463]
We present a novel Prompt-IML framework for detecting tampered images.
Humans tend to discern the authenticity of an image based on both semantic and high-frequency information.
Our model can achieve better performance on eight typical fake image datasets.
arXiv Detail & Related papers (2024-01-01T03:45:07Z) - Deep Image Harmonization with Learnable Augmentation [17.690945824240348]
Learnable augmentation is proposed to enrich the illumination diversity of small-scale datasets for better harmonization performance.
SycoNet takes a real image with its foreground mask and a random vector, and learns a suitable color transformation, which is applied to the foreground of the real image to produce a synthetic composite image.
arXiv Detail & Related papers (2023-08-01T08:40:23Z) - Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation [78.13793505707952]
Existing autoregressive models follow the two-stage generation paradigm that first learns a codebook in the latent space for image reconstruction and then completes the image generation autoregressively based on the learned codebook.
We propose a novel two-stage framework built around a Masked Quantization VAE (MQ-VAE), which masks out redundant image regions to remove modeling redundancy.
arXiv Detail & Related papers (2023-05-23T02:15:53Z) - Rank-Enhanced Low-Dimensional Convolution Set for Hyperspectral Image Denoising [50.039949798156826]
This paper tackles the challenging problem of hyperspectral (HS) image denoising.
We propose a rank-enhanced low-dimensional convolution set (Re-ConvSet).
We then incorporate Re-ConvSet into the widely-used U-Net architecture to construct an HS image denoising method.
arXiv Detail & Related papers (2022-07-09T13:35:12Z) - Modeling Image Composition for Complex Scene Generation [77.10533862854706]
We present a method that achieves state-of-the-art results on layout-to-image generation tasks.
After compressing RGB images into patch tokens, we propose the Transformer with Focal Attention (TwFA) for exploring dependencies of object-to-object, object-to-patch and patch-to-patch.
arXiv Detail & Related papers (2022-06-02T08:34:25Z) - Region-aware Adaptive Instance Normalization for Image Harmonization [14.77918186672189]
To acquire photo-realistic composite images, one must adjust the appearance and visual style of the foreground to be compatible with the background.
Existing deep learning methods for harmonizing composite images directly learn an image mapping network from the composite to the real one.
We propose a Region-aware Adaptive Instance Normalization (RAIN) module, which explicitly extracts the visual style from the background and adaptively applies it to the foreground.
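The background-to-foreground style transfer behind RAIN can be sketched in a few lines of NumPy. This is a simplified per-channel toy for intuition only, not the RAIN module from the paper (which operates on learned feature maps inside the network); the function name and shapes are illustrative.

```python
import numpy as np

def rain(features, fg_mask, eps=1e-5):
    """Toy sketch of region-aware adaptive instance normalization.

    features: (C, H, W) feature map of a composite image.
    fg_mask:  (H, W) binary mask, 1 at foreground pixels.
    Foreground features are normalized with their own statistics, then
    re-styled with the mean/std computed over the background region.
    """
    bg_mask = 1.0 - fg_mask
    out = features.copy()
    for c in range(features.shape[0]):
        f = features[c]
        fg = f[fg_mask > 0.5]
        bg = f[bg_mask > 0.5]
        if fg.size == 0 or bg.size == 0:
            continue  # nothing to harmonize in this channel
        # Normalize the foreground with its own statistics ...
        norm_fg = (fg - fg.mean()) / np.sqrt(fg.var() + eps)
        # ... then apply the background's style (mean and std).
        out[c][fg_mask > 0.5] = norm_fg * np.sqrt(bg.var() + eps) + bg.mean()
    return out
```

Background pixels are left untouched; only the foreground statistics are shifted toward those of the background, which is the normalization-based view of harmonization this paper takes.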
arXiv Detail & Related papers (2021-06-05T09:57:17Z) - Two-shot Spatially-varying BRDF and Shape Estimation [89.29020624201708]
We propose a novel deep learning architecture with a stage-wise estimation of shape and SVBRDF.
We create a large-scale synthetic training dataset with domain-randomized geometry and realistic materials.
Experiments on both synthetic and real-world datasets show that our network trained on a synthetic dataset can generalize well to real-world images.
arXiv Detail & Related papers (2020-04-01T12:56:13Z) - Towards Coding for Human and Machine Vision: A Scalable Image Coding Approach [104.02201472370801]
We come up with a novel image coding framework by leveraging both the compressive and the generative models.
By introducing advanced generative models, we train a flexible network to reconstruct images from compact feature representations and the reference pixels.
Experimental results demonstrate the superiority of our framework in both human visual quality and facial landmark detection.
arXiv Detail & Related papers (2020-01-09T10:37:17Z) - Contextual Encoder-Decoder Network for Visual Saliency Prediction [42.047816176307066]
We propose an approach based on a convolutional neural network pre-trained on a large-scale image classification task.
We combine the resulting representations with global scene information for accurately predicting visual saliency.
Compared to state-of-the-art approaches, the network is based on a lightweight image classification backbone.
arXiv Detail & Related papers (2019-02-18T16:15:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.