SSH: A Self-Supervised Framework for Image Harmonization
- URL: http://arxiv.org/abs/2108.06805v2
- Date: Tue, 17 Aug 2021 18:02:53 GMT
- Title: SSH: A Self-Supervised Framework for Image Harmonization
- Authors: Yifan Jiang, He Zhang, Jianming Zhang, Yilin Wang, Zhe Lin, Kalyan
Sunkavalli, Simon Chen, Sohrab Amirghodsi, Sarah Kong, Zhangyang Wang
- Abstract summary: We propose a novel Self-Supervised Harmonization framework (SSH) that can be trained using just "free" natural images, without any manual editing.
Our results show that the proposed SSH outperforms previous state-of-the-art methods in terms of reference metrics, visual quality, and a subjective user study.
- Score: 97.16345684998788
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image harmonization aims to improve the quality of image compositing by
matching the "appearance" (e.g., color tone, brightness, and contrast) between
foreground and background images. However, collecting large-scale annotated
datasets for this task requires complex professional retouching. Instead, we
propose a novel Self-Supervised Harmonization framework (SSH) that can be
trained using just "free" natural images, without any manual editing. We
reformulate image harmonization from a representation fusion perspective that
processes the foreground and background examples separately, which addresses
the background occlusion issue. This framework design allows for a dual data
augmentation method, where diverse [foreground, background, pseudo GT] triplets
can be generated by cropping an image with perturbations using 3D color lookup
tables (LUTs). In addition, we build a real-world harmonization dataset,
carefully created by expert users, for evaluation and benchmarking purposes.
Our results show that the proposed self-supervised method outperforms previous
state-of-the-art methods in terms of reference metrics, visual quality, and a
subjective user study. Code and dataset are available at
https://github.com/VITA-Group/SSHarmonization.
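The dual data augmentation described above lends itself to a compact sketch. Below is a minimal, hypothetical illustration of generating one [foreground, background, pseudo GT] triplet from a single unedited image: two spatially perturbed crops receive different 3D LUTs, and the foreground content rendered under the background's LUT serves as the pseudo ground truth. Function and variable names are illustrative, not from the released code, and the LUT lookup is simplified to nearest-neighbor.

```python
import numpy as np

def apply_lut(img, lut):
    # Apply a 3D color LUT (shape S x S x S x 3, values in [0, 1]) to an
    # RGB image in [0, 1]. Nearest-neighbor lookup keeps the sketch short;
    # production LUT code would interpolate trilinearly.
    s = lut.shape[0]
    idx = np.clip(np.rint(img * (s - 1)).astype(int), 0, s - 1)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

def make_triplet(image, lut_fg, lut_bg, crop=256, rng=np.random):
    # Assumes the image is larger than the crop size.
    h, w, _ = image.shape
    # Two spatially perturbed crops of the same natural image.
    y1, x1 = rng.randint(0, h - crop), rng.randint(0, w - crop)
    y2 = int(np.clip(y1 + rng.randint(-16, 17), 0, h - crop))
    x2 = int(np.clip(x1 + rng.randint(-16, 17), 0, w - crop))
    crop_a = image[y1:y1 + crop, x1:x1 + crop]
    crop_b = image[y2:y2 + crop, x2:x2 + crop]
    # Appearance perturbation: distinct LUTs shift the color tone of the
    # foreground and background differently.
    foreground = apply_lut(crop_a, lut_fg)   # composite-like foreground
    background = apply_lut(crop_b, lut_bg)   # reference background
    # Pseudo GT: the foreground content rendered under the background's
    # appearance, i.e. what a perfect harmonization should output.
    pseudo_gt = apply_lut(crop_a, lut_bg)
    return foreground, background, pseudo_gt
```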
Related papers
- Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models [55.99654128127689]
Visual Foundation Models (VFMs) are used to enhance 3D representation learning.
VFMs generate semantic labels for weakly-supervised pixel-to-point contrastive distillation.
We adapt sampling probabilities of points to address imbalances in spatial distribution and category frequency.
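As a rough illustration of weakly supervised pixel-to-point contrastive distillation, the sketch below pairs each 3D point embedding with its projected pixel embedding and uses the VFM-generated labels to avoid treating same-class pixels as negatives. It is a generic InfoNCE variant under assumed shapes, not the paper's implementation; the adaptive point-sampling step is omitted.

```python
import torch
import torch.nn.functional as F

def pixel_to_point_infonce(point_feats, pixel_feats, labels, tau=0.07):
    # point_feats, pixel_feats: (N, D) paired embeddings; labels: (N,)
    # VFM-generated semantic labels used to mask same-class negatives.
    p = F.normalize(point_feats, dim=1)
    q = F.normalize(pixel_feats, dim=1)
    logits = p @ q.t() / tau                       # (N, N) similarities
    same = labels[:, None] == labels[None, :]      # same-class pairs
    eye = torch.eye(len(labels), dtype=torch.bool, device=logits.device)
    logits = logits.masked_fill(same & ~eye, float('-inf'))
    targets = torch.arange(len(labels), device=logits.device)
    return F.cross_entropy(logits, targets)        # match point i to pixel i
```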
arXiv Detail & Related papers (2024-05-23T07:48:19Z)
- Exposure Bracketing is All You Need for Unifying Image Restoration and Enhancement Tasks [50.822601495422916]
We propose to utilize exposure bracketing photography to unify image restoration and enhancement tasks.
Due to the difficulty in collecting real-world pairs, we suggest a solution that first pre-trains the model with synthetic paired data.
In particular, a temporally modulated recurrent network (TMRNet) and self-supervised adaptation method are proposed.
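Purely for illustration, a recurrent fusion over an exposure-bracketed burst could look like the toy module below. It shows the flavor of recurrently aggregating frames into a running state, not TMRNet's actual architecture; layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class RecurrentExposureFusion(nn.Module):
    # Toy stand-in for a recurrent burst-fusion network (illustrative).
    def __init__(self, ch=32):
        super().__init__()
        self.encode = nn.Conv2d(3, ch, 3, padding=1)
        self.fuse = nn.Conv2d(2 * ch, ch, 3, padding=1)
        self.decode = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, frames):
        # frames: list of (B, 3, H, W) images, short to long exposure.
        state = torch.zeros_like(self.encode(frames[0]))
        for f in frames:
            feat = torch.relu(self.encode(f))
            state = torch.relu(self.fuse(torch.cat([feat, state], dim=1)))
        return self.decode(state)  # restored / enhanced output
```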
arXiv Detail & Related papers (2024-01-01T14:14:35Z)
- FreePIH: Training-Free Painterly Image Harmonization with Diffusion Model [19.170302996189335]
Our FreePIH method tames the denoising process as a plug-in module for foreground image style transfer.
We make use of multi-scale features to enforce the consistency of the content and stability of the foreground objects in the latent space.
Our method can surpass representative baselines by large margins.
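One simplified reading of a training-free, plug-in style operation in latent space is an AdaIN-style statistics swap applied between denoising steps, sketched below. This is an assumption-laden illustration, not FreePIH's actual procedure, and it ignores the multi-scale consistency terms.

```python
import torch

def inject_style(latent, fg_mask, eps=1e-5):
    # latent: (C, H, W) denoising latent; fg_mask: (H, W) mask with
    # non-empty foreground and background regions.
    fg = fg_mask.bool()
    out = latent.clone()
    for c in range(latent.shape[0]):                # per latent channel
        f, b = latent[c][fg], latent[c][~fg]
        # Shift foreground statistics onto the background's (AdaIN-style).
        out[c][fg] = (f - f.mean()) / (f.std() + eps) * b.std() + b.mean()
    return out
```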
arXiv Detail & Related papers (2023-11-25T04:23:49Z)
- Image Harmonization with Region-wise Contrastive Learning [51.309905690367835]
We propose a novel image harmonization framework with external style fusion and region-wise contrastive learning scheme.
Our method pulls corresponding positive samples together and pushes negative samples apart, maximizing the mutual information between the foreground and background styles.
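A plausible shape for such a region-wise contrastive term, assuming masked average pooling as the style code, is sketched below; the pooling helper and loss form are illustrative guesses, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def region_style(feat, mask, eps=1e-6):
    # Masked global average pooling: a simple region-wise style code.
    # feat: (B, C, H, W); mask: (B, 1, H, W) in {0, 1}.
    return (feat * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + eps)

def region_contrastive(fg_feat, bg_feat, neg_feats, mask, tau=0.1):
    # Pull the harmonized foreground style toward the background style
    # (positive) and away from styles of distorted negatives.
    anchor = F.normalize(region_style(fg_feat, mask), dim=1)
    pos = F.normalize(region_style(bg_feat, 1 - mask), dim=1)
    negs = [F.normalize(region_style(n, mask), dim=1) for n in neg_feats]
    logits = torch.stack([(anchor * v).sum(1) for v in [pos] + negs], 1)
    target = torch.zeros(anchor.shape[0], dtype=torch.long,
                         device=logits.device)  # positive is index 0
    return F.cross_entropy(logits / tau, target)
```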
arXiv Detail & Related papers (2022-05-27T15:46:55Z)
- SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization [29.600429707123645]
We propose a self-consistent style contrastive learning scheme (SCS-Co) for image harmonization.
By dynamically generating multiple negative samples, our SCS-Co can learn more distortion knowledge and better regularize the generated harmonized image.
In addition, we propose a background-attentional adaptive instance normalization (BAIN) to achieve an attention-weighted background feature distribution.
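The attention-weighted background statistics can be pictured with the toy layer below, which restyles foreground features using background means and variances gathered by attention. It is a guess at the general mechanism, not the published BAIN layer; projections and the simplified normalization are assumptions.

```python
import torch
import torch.nn as nn

class AttnBackgroundNorm(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.q = nn.Conv2d(ch, ch, 1)   # query projection
        self.k = nn.Conv2d(ch, ch, 1)   # key projection

    def forward(self, feat, fg_mask, eps=1e-5):
        # feat: (B, C, H, W); fg_mask: (B, 1, H, W) in {0, 1}
        b, c, h, w = feat.shape
        q = self.q(feat).flatten(2)                     # (B, C, HW)
        k = self.k(feat).flatten(2)
        v = feat.flatten(2)
        attn = torch.softmax(q.transpose(1, 2) @ k / c ** 0.5, dim=-1)
        attn = attn * (1 - fg_mask).flatten(2)          # bg keys only
        attn = attn / (attn.sum(-1, keepdim=True) + eps)
        # Attention-weighted background mean/std per position.
        mu = (attn @ v.transpose(1, 2)).transpose(1, 2)
        var = (attn @ (v ** 2).transpose(1, 2)).transpose(1, 2) - mu ** 2
        # Instance-normalize each channel over space (simplified), then
        # restyle with the attended background statistics.
        f = (v - v.mean(-1, keepdim=True)) / (v.std(-1, keepdim=True) + eps)
        out = (f * var.clamp_min(0).sqrt() + mu).view(b, c, h, w)
        return out * fg_mask + feat * (1 - fg_mask)     # restyle fg only
```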
arXiv Detail & Related papers (2022-04-29T09:22:01Z)
- Interactive Portrait Harmonization [99.15331091722231]
Current image harmonization methods consider the entire background as the guidance for harmonization.
A new flexible framework that allows users to pick certain regions of the background image and use it to guide the harmonization is proposed.
Inspired by professional portrait harmonization users, we also introduce a new luminance matching loss to optimally match the color/luminance conditions between the composite foreground and the selected reference region.
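A luminance matching term of this kind can be sketched as matching the mean luma of the composite foreground to that of the chosen reference region, as below. The Rec. 709 weights and the L1 distance are assumptions for illustration, not the paper's exact loss.

```python
import torch

def luminance_matching_loss(composite, reference, fg_mask, ref_mask):
    # composite/reference: (B, 3, H, W) RGB in [0, 1];
    # fg_mask/ref_mask: (B, 1, H, W) region indicators.
    w = torch.tensor([0.2126, 0.7152, 0.0722]).view(1, 3, 1, 1)
    luma_c = (composite * w).sum(1, keepdim=True)   # Rec. 709 luma
    luma_r = (reference * w).sum(1, keepdim=True)
    mean_fg = (luma_c * fg_mask).sum() / fg_mask.sum().clamp_min(1)
    mean_ref = (luma_r * ref_mask).sum() / ref_mask.sum().clamp_min(1)
    return (mean_fg - mean_ref).abs()               # L1 on mean luma
```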
arXiv Detail & Related papers (2022-03-15T19:30:34Z)
- A Generative Adversarial Framework for Optimizing Image Matting and Harmonization Simultaneously [7.541357996797061]
We propose a new Generative Adversarial Network (GAN) framework that jointly optimizes the matting network and the harmonization network based on a self-attention discriminator.
Our dataset and dataset-generation pipeline can be found at https://git.io/HaMaGAN
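For flavor, a joint generator objective over both tasks might combine reconstruction terms with a shared adversarial term, roughly as below. The losses and weighting are invented for illustration, and the self-attention discriminator itself is omitted.

```python
import torch
import torch.nn.functional as F

def joint_generator_loss(d_fake, pred_matte, gt_matte,
                         pred_harm, gt_harm, lam_adv=0.01):
    # Supervise matting and harmonization jointly, plus an adversarial
    # term from a shared (e.g. self-attention) discriminator score.
    adv = F.binary_cross_entropy_with_logits(
        d_fake, torch.ones_like(d_fake))
    return (F.l1_loss(pred_matte, gt_matte)
            + F.l1_loss(pred_harm, gt_harm)
            + lam_adv * adv)
```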
arXiv Detail & Related papers (2021-08-13T06:48:14Z)
- Region-aware Adaptive Instance Normalization for Image Harmonization [14.77918186672189]
To acquire photo-realistic composite images, one must adjust the appearance and visual style of the foreground to be compatible with the background.
Existing deep learning methods for harmonizing composite images directly learn an image mapping network from the composite to the real one.
We propose a Region-aware Adaptive Instance Normalization (RAIN) module, which explicitly formulates the visual style from the background and adaptively applies it to the foreground.
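The core computation behind region-aware normalization can be sketched directly from this description: compute foreground and background statistics under the region masks, whiten the foreground, and restyle it with background statistics. The version below is a minimal reading of that idea and omits the module's learnable parameters.

```python
import torch

def rain(feat, fg_mask, eps=1e-5):
    # feat: (B, C, H, W); fg_mask: (B, 1, H, W) in {0, 1}
    bg_mask = 1 - fg_mask

    def masked_stats(x, m):
        n = m.sum(dim=(2, 3), keepdim=True).clamp_min(1)
        mean = (x * m).sum(dim=(2, 3), keepdim=True) / n
        var = (((x - mean) * m) ** 2).sum(dim=(2, 3), keepdim=True) / n
        return mean, var.sqrt()

    mu_f, std_f = masked_stats(feat, fg_mask)   # foreground statistics
    mu_b, std_b = masked_stats(feat, bg_mask)   # background statistics
    # Whiten foreground features, then restyle with background mean/std.
    restyled = (feat - mu_f) / (std_f + eps) * std_b + mu_b
    return restyled * fg_mask + feat * bg_mask
```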
arXiv Detail & Related papers (2021-06-05T09:57:17Z)
- Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images [69.5662419067878]
Grounding referring expressions in RGBD images is an emerging field.
We present a novel task of 3D visual grounding in single-view RGBD image where the referred objects are often only partially scanned due to occlusion.
Our approach first fuses the language and the visual features at the bottom level to generate a heatmap that localizes the relevant regions in the RGBD image.
Then our approach conducts an adaptive feature learning based on the heatmap and performs the object-level matching with another visio-linguistic fusion to finally ground the referred object.
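A toy version of the bottom-level fusion that produces the relevance heatmap is sketched below; shapes, layers, and the fusion operator are assumptions for illustration, not the authors' network.

```python
import torch
import torch.nn as nn

class BottomUpFusion(nn.Module):
    def __init__(self, vis_ch=64, lang_dim=256):
        super().__init__()
        self.project = nn.Linear(lang_dim, vis_ch)  # language -> visual dim
        self.head = nn.Conv2d(vis_ch, 1, 1)         # heatmap predictor

    def forward(self, vis_feat, lang_emb):
        # vis_feat: (B, C, H, W) bottom-level RGBD features;
        # lang_emb: (B, lang_dim) sentence embedding.
        l = self.project(lang_emb)[:, :, None, None]
        fused = torch.relu(vis_feat * l)            # element-wise fusion
        return torch.sigmoid(self.head(fused))      # (B, 1, H, W) heatmap
```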
arXiv Detail & Related papers (2021-03-14T11:18:50Z)