E-Commerce Inpainting with Mask Guidance in Controlnet for Reducing Overcompletion
- URL: http://arxiv.org/abs/2409.09681v1
- Date: Sun, 15 Sep 2024 10:10:13 GMT
- Title: E-Commerce Inpainting with Mask Guidance in Controlnet for Reducing Overcompletion
- Authors: Guandong Li
- Abstract summary: This paper systematically analyzes and addresses a core pain point in diffusion model generation: overcompletion.
Our method has achieved promising results in practical applications, and we hope it can serve as an inspiring technical report in this field.
- Score: 13.67619785783182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: E-commerce image generation has always been one of the core demands in the e-commerce field. The goal is to restore a missing background that matches the given main product. In the post-AIGC era, diffusion models are primarily used to generate product images, achieving impressive results. This paper systematically analyzes and addresses a core pain point in diffusion model generation: overcompletion, which refers to the difficulty of maintaining product features. We propose two solutions: (1) using an inpainting model fine-tuned with instance masks to mitigate this phenomenon; and (2) adopting a train-free mask guidance approach, which incorporates refined product masks as constraints when combining ControlNet and the UNet to generate the main product, thereby avoiding overcompletion of the product. Our method has achieved promising results in practical applications, and we hope it can serve as an inspiring technical report in this field.
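To make the second solution concrete, below is a minimal sketch of train-free mask guidance inside a denoising loop. It assumes diffusers-style `unet`, `controlnet`, and `scheduler` objects plus a latent-space `product_mask`; the blending rule (re-noise the clean product latents to the current step and paste them back inside the mask) is a common inpainting constraint used here for illustration, not the authors' exact implementation.

```python
import torch

@torch.no_grad()
def mask_guided_denoise(latents, product_latents, product_mask,
                        unet, controlnet, scheduler, cond, control_image):
    """product_mask: 1 inside the product region (keep), 0 elsewhere (inpaint)."""
    for t in scheduler.timesteps:
        # ControlNet residuals condition the UNet (diffusers-style call).
        down_res, mid_res = controlnet(
            latents, t, encoder_hidden_states=cond,
            controlnet_cond=control_image, return_dict=False)
        noise_pred = unet(
            latents, t, encoder_hidden_states=cond,
            down_block_additional_residuals=down_res,
            mid_block_additional_residual=mid_res).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample

        # Mask constraint: re-noise the clean product latents to the current
        # step and paste them back inside the mask, so the model can only
        # repaint the background and never redraws the product itself.
        noised_product = scheduler.add_noise(
            product_latents, torch.randn_like(product_latents), t)
        latents = product_mask * noised_product + (1 - product_mask) * latents
    return latents
```

The key design choice is that whatever the UNet/ControlNet branch produces inside the product mask is overwritten at every step, so generation is confined to the background, which is what suppresses overcompletion of the product.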
Related papers
- An Evaluation Framework for Product Images Background Inpainting based on Human Feedback and Product Consistency [4.177224329586615]
In product advertising applications, automated AI-based background inpainting for product images has emerged as a significant task.
The Human Feedback and Product Consistency (HFPC) framework automatically assesses generated product images using two modules.
HFPC achieves state-of-the-art precision (96.4%) compared with other open-source visual-quality-assessment models.
arXiv Detail & Related papers (2024-12-23T12:03:35Z) - Layout Control and Semantic Guidance with Attention Loss Backward for T2I Diffusion Model [13.67619785783182]
We propose a train-free method based on attention loss backward, which controls the cross-attention map (a minimal sketch of this idea appears after this list).
Our approach has achieved excellent results in production applications, and we hope it can serve as an inspiring technical report.
arXiv Detail & Related papers (2024-11-11T03:27:18Z) - OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling [80.85164509232261]
We propose OneRef, a minimalist referring framework built on the modality-shared one-tower transformer.
To model the referential relationship, we introduce a novel MVLM paradigm called Mask Referring Modeling (MRefM).
Within MRefM, we propose a referring-aware dynamic image masking strategy that is aware of the referred region.
arXiv Detail & Related papers (2024-10-10T15:18:19Z) - Spatiotemporal Graph Guided Multi-modal Network for Livestreaming Product Retrieval [32.478352606125306]
We propose a text-guided attention mechanism that leverages the spoken content of salespeople to guide the model to focus on the intended products.
A long-range spatiotemporal graph network is further designed to achieve both instance-level interaction and frame-level matching.
We demonstrate the superior performance of our proposed SGMN model, surpassing the state-of-the-art methods by a substantial margin.
arXiv Detail & Related papers (2024-07-23T07:36:54Z) - Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning [116.75939193785143]
Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones.
In 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant.
arXiv Detail & Related papers (2024-07-08T12:28:56Z) - Patch-enhanced Mask Encoder Prompt Image Generation [0.8747606955991707]
We propose a patch-enhanced mask approach to ensure accurate product descriptions.
Our approach consists of three components: Patch Flexible Visibility, a Mask Prompt Adapter, and an image Foundation Model.
Experimental results show that our method achieves the best visual quality and FID scores compared with other methods.
arXiv Detail & Related papers (2024-05-29T13:47:32Z) - BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [61.90969199199739]
BrushNet is a novel plug-and-play dual-branch model engineered to embed pixel-level masked image features into any pre-trained DM.
Experiments demonstrate BrushNet's superior performance over existing models across seven key metrics, including image quality, masked region preservation, and textual coherence.
arXiv Detail & Related papers (2024-03-11T17:59:31Z) - MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning [59.988458964353754]
Text-to-image diffusion models allow seamless generation of personalized images from scant reference photos.
Existing approaches perturb user images in an imperceptible way to render them "unlearnable" for malicious use.
We propose MetaCloak, which solves the bi-level poisoning problem with a meta-learning framework.
arXiv Detail & Related papers (2023-11-22T03:31:31Z) - GAN-based Algorithm for Efficient Image Inpainting [0.0]
The global pandemic has posed new challenges for facial recognition, as people have started wearing masks.
Under such conditions, the authors consider utilizing machine learning-based image inpainting to tackle the problem.
In particular, autoencoders show great potential for retaining the important, general features of an image.
arXiv Detail & Related papers (2023-09-13T20:28:54Z) - Image Inpainting with Edge-guided Learnable Bidirectional Attention Maps [85.67745220834718]
We present an edge-guided learnable bidirectional attention map (Edge-LBAM) for improving image inpainting of irregular holes.
Our Edge-LBAM method contains dual procedures, including structure-aware mask-updating guided by predicted edges.
Extensive experiments show that our Edge-LBAM is effective in generating coherent image structures and preventing color discrepancy and blurriness.
arXiv Detail & Related papers (2021-04-25T07:25:16Z) - Autoencoding Generative Adversarial Networks [0.0]
I propose a four-network model which learns a mapping between a specified latent space and a given sample space.
The AEGAN technique offers several improvements to typical GAN training, including training stabilization, mode-collapse prevention, and permitting direct interpolation between real samples.
arXiv Detail & Related papers (2020-04-11T19:51:04Z) - Object-Centric Image Generation from Layouts [93.10217725729468]
We develop a layout-to-image-generation method to generate complex scenes with multiple objects.
Our method learns representations of the spatial relationships between objects in the scene, which lead to our model's improved layout fidelity.
We introduce SceneFID, an object-centric adaptation of the popular Fréchet Inception Distance metric, which is better suited for multi-object images.
arXiv Detail & Related papers (2020-03-16T21:40:09Z)
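As referenced in the Layout Control and Semantic Guidance entry above, here is a minimal, hypothetical sketch of attention-loss-backward guidance: a loss defined on a cross-attention map is backpropagated to the latents at each denoising step, steering layout with no extra training. The `unet`, `scheduler`, and `attn_map_fn` names are placeholder assumptions, not the paper's API; `attn_map_fn` is assumed to return the target token's cross-attention map recorded with gradients during the UNet forward pass (e.g. via hooks).

```python
import torch

def attention_guided_step(latents, t, unet, scheduler, cond,
                          layout_mask, attn_map_fn, scale=10.0):
    """layout_mask: 1 where the target token's attention should concentrate."""
    latents = latents.detach().requires_grad_(True)
    _ = unet(latents, t, encoder_hidden_states=cond).sample
    attn = attn_map_fn()  # cross-attention map captured during the forward pass
    # Penalize attention mass that falls outside the desired layout region.
    loss = ((1.0 - layout_mask) * attn).mean()
    grad = torch.autograd.grad(loss, latents)[0]
    latents = (latents - scale * grad).detach()  # nudge latents toward the layout
    with torch.no_grad():
        noise_pred = unet(latents, t, encoder_hidden_states=cond).sample
        return scheduler.step(noise_pred, t, latents).prev_sample
```

Because the gradient is taken with respect to the latents rather than any weights, the base model stays frozen, which is what makes this family of approaches train-free.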