Training-Free Style Consistent Image Synthesis with Condition and Mask Guidance in E-Commerce
- URL: http://arxiv.org/abs/2409.04750v1
- Date: Sat, 7 Sep 2024 07:50:13 GMT
- Title: Training-Free Style Consistent Image Synthesis with Condition and Mask Guidance in E-Commerce
- Authors: Guandong Li
- Abstract summary: We introduce the concept of the QKV level, referring to modifications in the attention maps (self-attention and cross-attention) when integrating UNet with image conditions.
We use shared KV to enhance similarity in cross-attention and generate mask guidance from the attention map to cleverly direct the generation of style-consistent images.
- Score: 13.67619785783182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating style-consistent images is a common task in e-commerce, and current methods are largely based on diffusion models, which have achieved excellent results. This paper introduces the concept of the QKV (query/key/value) level, referring to modifications in the attention maps (self-attention and cross-attention) when integrating the UNet with image conditions. Without disrupting the product's main composition in e-commerce images, we aim to guide generation with a training-free method driven by pre-set conditions. This involves using shared KV to enhance similarity in cross-attention and generating mask guidance from the attention maps to direct the generation of style-consistent images. Our method has shown promising results in practical applications.
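The abstract does not come with code, but its two ingredients can be illustrated concretely. Below is a minimal PyTorch sketch, assuming a standard UNet attention layer in which queries come from the image being generated, keys/values are shared from the reference (condition) branch, and the resulting attention map is thresholded into a mask for the product region. The function names, the `product_idx` token indices, and the thresholding step are illustrative assumptions, not the authors' implementation.

```python
import torch


def shared_kv_attention(q_gen, k_ref, v_ref, num_heads=8):
    """Attention in which the queries of the generated branch attend to keys
    and values shared from the reference (condition) branch, pulling the
    output toward the reference style. Returns the output features and the
    head-averaged attention map for later mask guidance."""
    b, n_q, d = q_gen.shape
    head_dim = d // num_heads

    def split_heads(x):
        return x.view(b, -1, num_heads, head_dim).transpose(1, 2)

    q, k, v = split_heads(q_gen), split_heads(k_ref), split_heads(v_ref)
    attn = torch.softmax(q @ k.transpose(-2, -1) * head_dim ** -0.5, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(b, n_q, d)
    return out, attn.mean(dim=1)                               # (b, n_q, d), (b, n_q, n_ref)


def mask_from_attention(attn_map, product_idx, spatial_hw, threshold=0.5):
    """Threshold the attention received by product-related reference tokens
    (product_idx: hypothetical list of token indices) into a binary mask."""
    h, w = spatial_hw
    scores = attn_map[..., product_idx].mean(dim=-1)           # (b, h*w)
    lo = scores.amin(dim=-1, keepdim=True)
    hi = scores.amax(dim=-1, keepdim=True)
    scores = (scores - lo) / (hi - lo + 1e-8)                  # normalize to [0, 1]
    return (scores > threshold).float().view(-1, 1, h, w)


# One plausible way to apply the mask during denoising: keep the product
# region from the condition branch and let the background follow the
# style-guided branch.
# latents = mask * latents_condition + (1 - mask) * latents_style
```

The commented blending step at the end shows one way such a mask could protect the product's composition while the surrounding scene is restyled; how the paper actually applies the guidance is not specified in the abstract.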
Related papers
- Edicho: Consistent Image Editing in the Wild [90.42395533938915]
Edicho steps in with a training-free solution based on diffusion models.
It features a fundamental design principle of using explicit image correspondence to direct editing.
arXiv Detail & Related papers (2024-12-30T16:56:44Z)
- VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control [8.685610154314459]
Diffusion models show extraordinary talents in text-to-image generation, but they may still fail to generate highly aesthetic images.
We propose the Cross-Attention Value Mixing Control (VMix) Adapter, a plug-and-play aesthetics adapter.
Our key insight is to enhance the aesthetic presentation of existing diffusion models by designing a superior condition control method.
arXiv Detail & Related papers (2024-12-30T08:47:25Z)
- Enhancing Conditional Image Generation with Explainable Latent Space Manipulation [0.0]
This paper proposes a novel approach to achieve fidelity to a reference image while adhering to conditional prompts.
We analyze the cross-attention maps and the gradients of the denoised latent vector.
Using this information, we create masks at specific timesteps during denoising to preserve subjects while seamlessly integrating the reference image features.
arXiv Detail & Related papers (2024-08-29T03:12:04Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition [73.51329037954866]
We propose a robust global representation method with cross-image correlation awareness for visual place recognition.
Our method uses the attention mechanism to correlate multiple images within a batch.
Our method outperforms state-of-the-art methods by a large margin with significantly less training time.
arXiv Detail & Related papers (2024-02-29T15:05:11Z)
- Visual Concept-driven Image Generation with Text-to-Image Diffusion Model [65.96212844602866]
Text-to-image (TTI) models have demonstrated impressive results in generating high-resolution images of complex scenes.
Recent approaches have extended these methods with personalization techniques that allow them to integrate user-illustrated concepts.
However, the ability to generate images with multiple interacting concepts, such as human subjects, as well as concepts that may be entangled in one, or across multiple, image illustrations remains elusive.
We propose a concept-driven TTI personalization framework that addresses these core challenges.
arXiv Detail & Related papers (2024-02-18T07:28:37Z)
- Cross-Image Attention for Zero-Shot Appearance Transfer [68.43651329067393]
We introduce a cross-image attention mechanism that implicitly establishes semantic correspondences across images.
We harness three mechanisms that either manipulate the noisy latent codes or the model's internal representations throughout the denoising process.
Experiments show that our method is effective across a wide range of object categories and is robust to variations in shape, size, and viewpoint.
arXiv Detail & Related papers (2023-11-06T18:33:24Z)
- A Unified Arbitrary Style Transfer Framework via Adaptive Contrastive Learning [84.8813842101747]
Unified Contrastive Arbitrary Style Transfer (UCAST) is a novel style representation learning and transfer framework.
We present an adaptive contrastive learning scheme for style transfer by introducing an input-dependent temperature (a minimal sketch of this idea appears after this list).
Our framework consists of three key components, i.e., a parallel contrastive learning scheme for style representation and style transfer, a domain enhancement module for effective learning of style distribution, and a generative network for style transfer.
arXiv Detail & Related papers (2023-03-09T04:35:00Z)
- Collaborative Image Understanding [5.5174379874002435]
We show that collaborative information can be leveraged to improve the classification process of new images.
A series of experiments on datasets from e-commerce and social media demonstrates that considering collaborative signals helps to significantly improve the performance of the main task of image classification by up to 9.1%.
arXiv Detail & Related papers (2022-10-21T12:13:08Z)
- SCS-Co: Self-Consistent Style Contrastive Learning for Image Harmonization [29.600429707123645]
We propose a self-consistent style contrastive learning scheme (SCS-Co) for image harmonization.
By dynamically generating multiple negative samples, our SCS-Co can learn more distortion knowledge and better regularize the generated harmonized image.
In addition, we propose a background-attentional adaptive instance normalization (BAIN) to achieve an attention-weighted background feature distribution.
arXiv Detail & Related papers (2022-04-29T09:22:01Z)
- Co-Attention for Conditioned Image Matching [91.43244337264454]
We propose a new approach to determine correspondences between image pairs in the wild under large changes in illumination, viewpoint, context, and material.
While other approaches find correspondences between pairs of images by treating the images independently, we instead condition on both images to implicitly take account of the differences between them.
arXiv Detail & Related papers (2020-07-16T17:32:00Z)
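The UCAST entry above mentions an input-dependent temperature for its contrastive style objective. The sketch below shows one plausible reading of that idea: a standard InfoNCE loss whose temperature is predicted per sample from the anchor's style feature. The `temp_net` head, the feature shapes, and the cosine-similarity choice are assumptions for illustration, not that paper's actual design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveTemperatureContrast(nn.Module):
    """InfoNCE-style loss with a temperature predicted from the anchor's
    style feature (an input-dependent temperature)."""

    def __init__(self, feat_dim=256):
        super().__init__()
        # Small hypothetical head mapping a style feature to a temperature.
        self.temp_net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, anchor, positive, negatives):
        # anchor, positive: (b, d); negatives: (b, n, d)
        tau = F.softplus(self.temp_net(anchor)) + 1e-2                      # (b, 1), kept positive
        pos = F.cosine_similarity(anchor, positive, dim=-1).unsqueeze(1)    # (b, 1)
        neg = F.cosine_similarity(anchor.unsqueeze(1), negatives, dim=-1)   # (b, n)
        logits = torch.cat([pos, neg], dim=1) / tau
        labels = torch.zeros(anchor.size(0), dtype=torch.long, device=anchor.device)
        return F.cross_entropy(logits, labels)  # the positive pair sits at index 0
```

In this reading, a larger predicted temperature softens the contrast for ambiguous style inputs, while a smaller one sharpens it for distinctive ones.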
This list is automatically generated from the titles and abstracts of the papers on this site.