Segment Anything Model Meets Image Harmonization
- URL: http://arxiv.org/abs/2312.12729v1
- Date: Wed, 20 Dec 2023 02:57:21 GMT
- Title: Segment Anything Model Meets Image Harmonization
- Authors: Haoxing Chen, Yaohui Li, Zhangxuan Gu, Zhuoer Xu, Jun Lan, and Huaxiong Li
- Abstract summary: Image harmonization is a crucial technique in image composition that adjusts the foreground of a composite image so that it blends seamlessly with the background.
Current methods adopt either global-level or pixel-level feature matching.
We propose Semantic-guided Region-aware Instance Normalization (SRIN) that can utilize the semantic segmentation maps output by a pre-trained Segment Anything Model (SAM) to guide the visual consistency learning of foreground and background features.
- Score: 13.415810438244788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image harmonization is a crucial technique in image composition that aims to
seamlessly match the background by adjusting the foreground of composite
images. Current methods adopt either global-level or pixel-level feature
matching. Global-level feature matching ignores the proximity prior, treating
foreground and background as separate entities. On the other hand, pixel-level
feature matching loses contextual information. Therefore, it is necessary to
use the information from semantic maps that describe different objects to guide
harmonization. In this paper, we propose Semantic-guided Region-aware Instance
Normalization (SRIN) that can utilize the semantic segmentation maps output by
a pre-trained Segment Anything Model (SAM) to guide the visual consistency
learning of foreground and background features. Extensive experiments
demonstrate the superiority of our method for image harmonization over
state-of-the-art methods.
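For a concrete picture of the mechanism the abstract describes, below is a minimal PyTorch-style sketch of semantic-guided, region-aware instance normalization. It is not the authors' implementation: the module name, the learned affine parameters, and the way masks are passed in are illustrative assumptions; only the core idea, whitening per-region foreground statistics and re-coloring them with background statistics from the same semantic region, follows the abstract.

```python
import torch
import torch.nn as nn


class RegionAwareInstanceNorm(nn.Module):
    """Re-normalize foreground features region by region using background statistics."""

    def __init__(self, num_channels: int, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        # Hypothetical learned affine parameters applied after re-normalization.
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    @staticmethod
    def _masked_stats(feat, mask):
        # feat: (B, C, H, W); mask: (B, 1, H, W) with values in {0, 1}.
        area = mask.sum(dim=(2, 3), keepdim=True).clamp(min=1.0)
        mean = (feat * mask).sum(dim=(2, 3), keepdim=True) / area
        var = (((feat - mean) ** 2) * mask).sum(dim=(2, 3), keepdim=True) / area
        return mean, var

    def forward(self, feat, fg_mask, region_masks):
        """
        feat:         (B, C, H, W) features of the composite image
        fg_mask:      (B, 1, H, W) binary mask of the pasted foreground
        region_masks: iterable of (B, 1, H, W) binary semantic region masks (e.g. from SAM)
        """
        out = feat.clone()
        for region in region_masks:
            fg_region = region * fg_mask          # foreground pixels inside this region
            bg_region = region * (1.0 - fg_mask)  # background pixels inside this region
            if fg_region.sum() == 0 or bg_region.sum() == 0:
                continue  # region does not straddle the composite boundary
            fg_mean, fg_var = self._masked_stats(feat, fg_region)
            bg_mean, bg_var = self._masked_stats(feat, bg_region)
            # Whiten the foreground with its own statistics, then re-color it
            # with the background statistics of the same semantic region.
            normed = (feat - fg_mean) / torch.sqrt(fg_var + self.eps)
            aligned = normed * torch.sqrt(bg_var + self.eps) + bg_mean
            out = out * (1.0 - fg_region) + (self.gamma * aligned + self.beta) * fg_region
        return out
```

In practice, the semantic region masks could come from SAM's automatic mask generator in the official segment-anything package; a hedged sketch of that step follows the related-papers list below.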
Related papers
- Image-Specific Information Suppression and Implicit Local Alignment for Text-based Person Search [61.24539128142504]
Text-based person search (TBPS) is a challenging task that aims to search pedestrian images with the same identity from an image gallery given a query text.
Most existing methods rely on explicitly generated local parts to model fine-grained correspondence between modalities.
We propose an efficient joint Multi-level Alignment Network (MANet) for TBPS, which can learn aligned image/text feature representations between modalities at multiple levels.
arXiv Detail & Related papers (2022-08-30T16:14:18Z)
- FRIH: Fine-grained Region-aware Image Harmonization [49.420765789360836]
We propose a novel global-local two-stage framework for Fine-grained Region-aware Image Harmonization (FRIH).
Our algorithm achieves the best performance on the iHarmony4 dataset (38.19 dB PSNR) with a lightweight model.
arXiv Detail & Related papers (2022-05-13T04:50:26Z)
- Image Harmonization by Matching Regional References [10.249228010611617]
Recent image harmonization methods typically summarize the appearance pattern of the global background and apply it to the entire foreground without accounting for location discrepancy.
For a real image, the appearances (illumination, color temperature, saturation, hue, texture, etc) of different regions can vary significantly.
Previous methods, which transfer the appearance globally, are not optimal.
arXiv Detail & Related papers (2022-04-10T16:23:06Z)
- Retrieval-based Spatially Adaptive Normalization for Semantic Image Synthesis [68.1281982092765]
We propose a novel normalization module, termed REtrieval-based Spatially AdaptIve normaLization (RESAIL).
RESAIL provides pixel level fine-grained guidance to the normalization architecture.
Experiments on several challenging datasets show that RESAIL performs favorably against state-of-the-art methods in terms of quantitative metrics, visual quality, and subjective evaluation.
arXiv Detail & Related papers (2022-04-06T14:21:39Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Region-aware Adaptive Instance Normalization for Image Harmonization [14.77918186672189]
To acquire photo-realistic composite images, one must adjust the appearance and visual style of the foreground to be compatible with the background.
Existing deep learning methods for harmonizing composite images directly learn an image mapping network from the composite to the real one.
We propose a Region-aware Adaptive Instance Normalization (RAIN) module, which explicitly formulates the visual style of the background and adaptively applies it to the foreground.
arXiv Detail & Related papers (2021-06-05T09:57:17Z)
- BargainNet: Background-Guided Domain Translation for Image Harmonization [26.370523451625466]
An unharmonious foreground and background degrade the quality of a composite image.
Image harmonization, which adjusts the foreground to improve consistency, is an essential yet challenging task.
We propose an image harmonization network with a novel domain code extractor and well-tailored triplet losses.
arXiv Detail & Related papers (2020-09-19T05:14:08Z)
- Mining Cross-Image Semantics for Weakly Supervised Semantic Segmentation [128.03739769844736]
Two neural co-attentions are incorporated into the classifier to capture cross-image semantic similarities and differences.
In addition to boosting object pattern learning, the co-attention can leverage context from other related images to improve localization map inference.
Our algorithm sets new state-of-the-art results in all these settings, demonstrating its efficacy and generalizability.
arXiv Detail & Related papers (2020-07-03T21:53:46Z)
- Unsupervised Learning of Landmarks based on Inter-Intra Subject Consistencies [72.67344725725961]
We present a novel unsupervised learning approach to image landmark discovery by incorporating the inter-subject landmark consistencies on facial images.
This is achieved via an inter-subject mapping module that transforms original subject landmarks based on an auxiliary subject-related structure.
To recover the original subject from the transformed images, the landmark detector is forced to learn spatial locations that carry consistent semantic meaning both within paired intra-subject images and across paired inter-subject images.
arXiv Detail & Related papers (2020-04-16T20:38:16Z)
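Returning to the main paper summarized above: the normalization sketch after the abstract assumed a list of semantic region masks. Below is a hedged example of producing such masks with the official segment-anything package and resizing them to a feature-map resolution. The helper name and the nearest-neighbour resizing are illustrative choices, not details from the paper; the checkpoint filename is the publicly released ViT-H SAM checkpoint.

```python
import numpy as np
import torch
import torch.nn.functional as F
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator


def sam_region_masks(image_rgb: np.ndarray, feat_hw, checkpoint="sam_vit_h_4b8939.pth"):
    """image_rgb: (H, W, 3) uint8 RGB array; feat_hw: (h, w) spatial size of the feature map."""
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    masks = SamAutomaticMaskGenerator(sam).generate(image_rgb)
    regions = []
    for m in masks:  # each m["segmentation"] is an (H, W) boolean array
        t = torch.from_numpy(m["segmentation"]).float()[None, None]  # (1, 1, H, W)
        # Resize each binary mask to the feature-map resolution with nearest neighbour.
        regions.append(F.interpolate(t, size=feat_hw, mode="nearest"))
    return regions
```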
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.