Region-to-Region: Enhancing Generative Image Harmonization with Adaptive Regional Injection
- URL: http://arxiv.org/abs/2508.09746v1
- Date: Wed, 13 Aug 2025 12:21:51 GMT
- Title: Region-to-Region: Enhancing Generative Image Harmonization with Adaptive Regional Injection
- Authors: Zhiqiu Zhang, Dongqi Fan, Mingjie Wang, Qiang Tang, Jian Yang, Zili Yi
- Abstract summary: The goal of image harmonization is to adjust the foreground in a composite image to achieve visual consistency with the background. Recently, latent diffusion models (LDMs) have been applied for harmonization, achieving remarkable results. Current synthetic datasets rely on color transfer, which lacks local variations and fails to capture complex real-world lighting conditions.
- Score: 17.56045093665567
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of image harmonization is to adjust the foreground in a composite image to achieve visual consistency with the background. Recently, latent diffusion models (LDMs) have been applied for harmonization, achieving remarkable results. However, LDM-based harmonization faces challenges in detail preservation and limited harmonization ability. Additionally, current synthetic datasets rely on color transfer, which lacks local variations and fails to capture complex real-world lighting conditions. To enhance harmonization capabilities, we propose the Region-to-Region transformation. By injecting information from appropriate regions into the foreground, this approach preserves original details while achieving image harmonization or, conversely, generating new composite data. From this perspective, we propose a novel model, R2R. Specifically, we design Clear-VAE to preserve high-frequency details in the foreground using an Adaptive Filter while eliminating disharmonious elements. To further enhance harmonization, we introduce the Harmony Controller with Mask-aware Adaptive Channel Attention (MACA), which dynamically adjusts the foreground based on the channel importance of both foreground and background regions. To address the limitation of existing datasets, we propose Random Poisson Blending, which transfers color and lighting information from a suitable region to the foreground, thereby generating more diverse and challenging synthetic images. Using this method, we construct a new synthetic dataset, RPHarmony. Experiments demonstrate the superiority of our method over other methods in both quantitative metrics and visual harmony. Moreover, our dataset helps the model generate more realistic images on real examples. Our code, dataset, and model weights have all been released for open access.
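The core of Poisson blending, on which the abstract's Random Poisson Blending builds, can be illustrated with a minimal gradient-domain blend. The sketch below is a toy, not the paper's code: `poisson_blend_channel` is a name invented here, it uses a plain Jacobi solver on a single channel, and it assumes the mask does not touch the image border. The paper's method would additionally sample a random donor region of the image to supply the colour and lighting statistics.

```python
import numpy as np

def poisson_blend_channel(src, dst, mask, iters=800):
    """Gradient-domain (Poisson) blending of one channel via Jacobi iteration.

    Inside the mask, the result keeps the *gradients* of `src` while its
    boundary values come from `dst`, so colour/lighting from the destination
    flows smoothly into the blended region.
    Assumption: the mask does not touch the image border (np.roll wraps).
    """
    f = dst.astype(np.float64).copy()
    g = src.astype(np.float64)
    # Discrete Laplacian of the source: 4*g_p minus the sum of g's 4-neighbours.
    lap = 4.0 * g - (np.roll(g, 1, 0) + np.roll(g, -1, 0)
                     + np.roll(g, 1, 1) + np.roll(g, -1, 1))
    inside = mask.astype(bool)
    for _ in range(iters):
        nb = (np.roll(f, 1, 0) + np.roll(f, -1, 0)
              + np.roll(f, 1, 1) + np.roll(f, -1, 1))
        # Jacobi update of the Poisson equation: 4*f_p - sum(f_q) = lap_p.
        f[inside] = (nb[inside] + lap[inside]) / 4.0
    return f
```

Pixels outside the mask are never written, so the background is preserved exactly; a "random" variant would simply draw `src` from a randomly chosen region before calling the solver. For production use, OpenCV's `cv2.seamlessClone` implements the same idea with a proper sparse solver.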
Related papers
- LuxDiT: Lighting Estimation with Video Diffusion Transformer [66.60450792095901]
Estimating scene lighting from a single image or video remains a longstanding challenge in computer vision and graphics. We propose LuxDiT, a novel data-driven approach that fine-tunes a video diffusion transformer to generate HDR environment maps conditioned on visual input.
arXiv Detail & Related papers (2025-09-03T19:59:20Z)
- Frequency Domain-Based Diffusion Model for Unpaired Image Dehazing [92.61216319417208]
We propose a novel frequency domain-based diffusion model, named ours, for fully exploiting the beneficial knowledge in unpaired clear data. Inspired by the strong generative ability shown by Diffusion Models (DMs), we tackle the dehazing task from the perspective of frequency domain reconstruction.
arXiv Detail & Related papers (2025-07-02T01:22:46Z)
- Deep Image Harmonization with Learnable Augmentation [17.690945824240348]
Learnable augmentation is proposed to enrich the illumination diversity of small-scale datasets for better harmonization performance.
SycoNet takes in a real image with foreground mask and a random vector to learn suitable color transformation, which is applied to the foreground of this real image to produce a synthetic composite image.
arXiv Detail & Related papers (2023-08-01T08:40:23Z)
- Deep Image Harmonization with Globally Guided Feature Transformation and Relation Distillation [20.302430505018]
We show that using global information to guide foreground feature transformation could achieve significant improvement.
We also propose to transfer the foreground-background relation from real images to composite images, which can provide intermediate supervision for the transformed encoder features.
arXiv Detail & Related papers (2023-08-01T07:53:25Z)
- Hierarchical Dynamic Image Harmonization [15.886047676987316]
We propose a hierarchical dynamic network (HDNet) to adapt features from local to global view for better feature transformation in efficient image harmonization.
The proposed HDNet significantly reduces the total model parameters by more than 80% compared to previous methods.
Notably, the HDNet achieves a 4% improvement in PSNR and a 19% reduction in MSE compared to the prior state-of-the-art methods.
arXiv Detail & Related papers (2022-11-16T03:15:19Z)
- Image Harmonization with Region-wise Contrastive Learning [51.309905690367835]
We propose a novel image harmonization framework with external style fusion and region-wise contrastive learning scheme.
Our method attempts to bring together corresponding positive and negative samples by maximizing the mutual information between the foreground and background styles.
arXiv Detail & Related papers (2022-05-27T15:46:55Z)
- FRIH: Fine-grained Region-aware Image Harmonization [49.420765789360836]
We propose a novel global-local two-stage framework for Fine-grained Region-aware Image Harmonization (FRIH).
Our algorithm achieves the best performance on iHarmony4 dataset (PSNR is 38.19 dB) with a lightweight model.
arXiv Detail & Related papers (2022-05-13T04:50:26Z)
- Interactive Portrait Harmonization [99.15331091722231]
Current image harmonization methods consider the entire background as the guidance for harmonization.
A new flexible framework that allows users to pick certain regions of the background image and use it to guide the harmonization is proposed.
Inspired by professional portrait harmonization users, we also introduce a new luminance matching loss to optimally match the color/luminance conditions between the composite foreground and the selected reference region.
arXiv Detail & Related papers (2022-03-15T19:30:34Z)
- SSH: A Self-Supervised Framework for Image Harmonization [97.16345684998788]
We propose a novel Self-Supervised Harmonization framework (SSH) that can be trained using just "free" natural images without being edited.
Our results show that the proposed SSH outperforms previous state-of-the-art methods in terms of reference metrics, visual quality, and a subject user study.
arXiv Detail & Related papers (2021-08-15T19:51:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.