HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images
- URL: http://arxiv.org/abs/2603.02210v2
- Date: Tue, 03 Mar 2026 05:00:58 GMT
- Title: HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images
- Authors: Yichen Liu, Donghao Zhou, Jie Wang, Xin Gao, Guisheng Liu, Jiatong Li, Quanwei Zhang, Qiang Lyu, Lanqing Guo, Shilei Wen, Weiqiang Wang, Pheng-Ann Heng
- Abstract summary: HiFi-Inpaint is a novel high-fidelity reference-based inpainting framework tailored for generating human-product images. We introduce Shared Enhancement Attention (SEA) to refine fine-grained product features and Detail-Aware Loss (DAL) to enforce precise pixel-level supervision. We construct a new dataset, HP-Image-40K, with samples curated from self-synthesis data and processed with automatic filtering.
- Score: 61.3479037455798
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-product images, which showcase the integration of humans and products, play a vital role in advertising, e-commerce, and digital marketing. The essential challenge of generating such images lies in ensuring the high-fidelity preservation of product details. Among existing paradigms, reference-based inpainting offers a targeted solution by leveraging product reference images to guide the inpainting process. However, limitations remain in three key aspects: the lack of diverse large-scale training data, the struggle of current models to focus on product detail preservation, and the inability of coarse supervision to provide precise guidance. To address these issues, we propose HiFi-Inpaint, a novel high-fidelity reference-based inpainting framework tailored for generating human-product images. HiFi-Inpaint introduces Shared Enhancement Attention (SEA) to refine fine-grained product features and Detail-Aware Loss (DAL) to enforce precise pixel-level supervision using high-frequency maps. Additionally, we construct a new dataset, HP-Image-40K, with samples curated from self-synthesis data and processed with automatic filtering. Experimental results show that HiFi-Inpaint achieves state-of-the-art performance, delivering detail-preserving human-product images.
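The abstract says that DAL supervises at the pixel level using high-frequency maps but does not say how those maps are computed or how the loss is weighted. The sketch below is only an illustration of the general idea, not the authors' formulation. It assumes a 4-neighbour Laplacian as the high-frequency extractor, an L1 distance between the two maps, and a binary inpainting mask. The names `high_frequency_map` and `detail_aware_loss` are hypothetical.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of floats


def high_frequency_map(img: Image) -> Image:
    """Absolute 4-neighbour Laplacian response (assumed extractor).

    Border pixels reuse their own value for missing neighbours
    (edge replication), so a constant image maps to all zeros.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            up = img[max(y - 1, 0)][x]
            down = img[min(y + 1, h - 1)][x]
            left = img[y][max(x - 1, 0)]
            right = img[y][min(x + 1, w - 1)]
            out[y][x] = abs(4.0 * img[y][x] - up - down - left - right)
    return out


def detail_aware_loss(pred: Image, target: Image, mask: Image) -> float:
    """Mean L1 distance between high-frequency maps inside the mask.

    `mask` holds 1.0 where the product region is inpainted and 0.0
    elsewhere. Returns 0.0 for an empty mask.
    """
    hf_pred = high_frequency_map(pred)
    hf_target = high_frequency_map(target)
    total = 0.0
    area = 0.0
    for y in range(len(mask)):
        for x in range(len(mask[0])):
            total += mask[y][x] * abs(hf_pred[y][x] - hf_target[y][x])
            area += mask[y][x]
    return total / area if area > 0 else 0.0


if __name__ == "__main__":
    # Target has a fine checkerboard texture; the prediction is blurred flat.
    target = [[float((x + y) % 2) for x in range(4)] for y in range(4)]
    blurred = [[0.5] * 4 for _ in range(4)]
    full_mask = [[1.0] * 4 for _ in range(4)]
    print("loss for blurred prediction:", detail_aware_loss(blurred, target, full_mask))
    print("loss for exact prediction:", detail_aware_loss(target, target, full_mask))
```

In this toy setup a prediction that loses the texture is penalised, while an exact reconstruction scores zero. A real training loss would operate on batched tensors and would probably be combined with the base diffusion objective.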
Related papers
- FreeInpaint: Tuning-free Prompt Alignment and Visual Rationality Enhancement in Image Inpainting [98.04041133839088]
Text-guided image inpainting endeavors to generate new content within specified regions of images using textual prompts from users. We introduce FreeInpaint, a plug-and-play tuning-free approach that directly optimizes the diffusion latents on the fly during inference to improve the faithfulness of the generated images.
arXiv Detail & Related papers (2025-12-24T11:06:26Z)
- RefAdGen: High-Fidelity Advertising Image Generation [2.38180456064897]
RefAdGen is a generation framework that achieves high fidelity through a decoupled design. We show that RefAdGen achieves state-of-the-art performance, showcasing robust generalization by maintaining high fidelity and remarkable visual results for both unseen products and challenging real-world, in-the-wild images.
arXiv Detail & Related papers (2025-08-12T18:25:31Z)
- DreamPainter: Image Background Inpainting for E-commerce Scenarios [9.12444106077783]
We introduce DreamPainter, a novel framework that incorporates text prompts for control and reference image information as an additional control signal. Our approach significantly outperforms state-of-the-art methods, maintaining high product consistency while effectively integrating both text prompt and reference image information.
arXiv Detail & Related papers (2025-08-04T07:54:37Z)
- Preserving Product Fidelity in Large Scale Image Recontextualization with Diffusion Models [1.8606057023042066]
We present a framework for high-fidelity product image recontextualization using text-to-image diffusion models and a novel data augmentation pipeline. Our method improves the quality and diversity of generated images by disentangling product representations and enhancing the model's understanding of product characteristics.
arXiv Detail & Related papers (2025-03-11T01:24:39Z)
- An Evaluation Framework for Product Images Background Inpainting based on Human Feedback and Product Consistency [4.177224329586615]
In product advertising applications, the automated inpainting of backgrounds in product images using AI techniques has emerged as a significant task. Human Feedback and Product Consistency (HFPC) can automatically assess the generated product images based on two modules. HFPC achieves state-of-the-art performance (96.4% precision) in comparison to other open-source visual-quality-assessment models.
arXiv Detail & Related papers (2024-12-23T12:03:35Z)
- Consistent Human Image and Video Generation with Spatially Conditioned Diffusion [82.4097906779699]
Consistent human-centric image and video synthesis aims to generate images with new poses while preserving appearance consistency with a given reference image. We frame the task as a spatially-conditioned inpainting problem, where the target image is in-painted to maintain appearance consistency with the reference. This approach enables the reference features to guide the generation of pose-compliant targets within a unified denoising network.
arXiv Detail & Related papers (2024-12-19T05:02:30Z)
- FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process [120.91393949012014]
FreeEnhance is a framework for content-consistent image enhancement using off-the-shelf image diffusion models.
In the noising stage, FreeEnhance adds lighter noise to regions with higher frequency to preserve the high-frequency patterns in the original image.
In the denoising stage, we present three target properties as constraints to regularize the predicted noise, enhancing images with high acutance and high visual quality.
arXiv Detail & Related papers (2024-09-11T17:58:50Z)
- Automated Virtual Product Placement and Assessment in Images using Diffusion Models [1.63075356372232]
This paper introduces a novel three-stage fully automated VPP system.
In the first stage, a language-guided image segmentation model identifies optimal regions within images for product inpainting.
In the second stage, Stable Diffusion (SD), fine-tuned with a few example product images, is used to inpaint the product into the previously identified candidate regions.
The final stage introduces an "Alignment Module", which is designed to effectively sieve out low-quality images.
arXiv Detail & Related papers (2024-05-02T09:44:13Z)
- RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting [63.567363455092234]
RefFusion is a novel 3D inpainting method based on a multi-scale personalization of an image inpainting diffusion model to the given reference view.
Our framework achieves state-of-the-art results for object removal while maintaining high controllability.
arXiv Detail & Related papers (2024-04-16T17:50:02Z)
- BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
BrushNet shows superior performance over existing models across seven key metrics, including image quality, mask region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.