Related papers: ZePo: Zero-Shot Portrait Stylization with Faster Sampling

ZePo: Zero-Shot Portrait Stylization with Faster Sampling

URL: http://arxiv.org/abs/2408.05492v1
Date: Sat, 10 Aug 2024 08:53:41 GMT
Title: ZePo: Zero-Shot Portrait Stylization with Faster Sampling
Authors: Jin Liu, Huaibo Huang, Jie Cao, Ran He,
Abstract summary: This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps. We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
Score: 61.14140480095604
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Diffusion-based text-to-image generation models have significantly advanced the field of art content synthesis. However, current portrait stylization methods generally require either model fine-tuning based on examples or the employment of DDIM Inversion to revert images to noise space, both of which substantially decelerate the image generation process. To overcome these limitations, this paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps. We observed that Latent Consistency Models employing consistency distillation can effectively extract representative Consistency Features from noisy images. To blend the Consistency Features extracted from both content and style images, we introduce a Style Enhancement Attention Control technique that meticulously merges content and style features within the attention space of the target image. Moreover, we propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control. Extensive experiments have validated the effectiveness of our proposed framework in enhancing stylization efficiency and fidelity. The code is available at \url{https://github.com/liujin112/ZePo}.

Related papers

Balanced Image Stylization with Style Matching Score [36.542802101359705]
Style Matching Score (SMS) is a novel optimization method for image stylization with diffusion models. SMS balances style alignment and content preservation, outperforming state-of-the-art approaches.
arXiv Detail & Related papers (2025-03-10T17:58:02Z)
AttenST: A Training-Free Attention-Driven Style Transfer Framework with Pre-Trained Diffusion Models [4.364797586362505]
AttenST is a training-free attention-driven style transfer framework. We propose a style-guided self-attention mechanism that conditions self-attention on the reference style. We also introduce a dual-feature cross-attention mechanism to fuse content and style features.
arXiv Detail & Related papers (2025-03-10T13:28:36Z)
Consistent Human Image and Video Generation with Spatially Conditioned Diffusion [82.4097906779699]
Consistent human-centric image and video synthesis aims to generate images with new poses while preserving appearance consistency with a given reference image. We frame the task as a spatially-conditioned inpainting problem, where the target image is in-painted to maintain appearance consistency with the reference. This approach enables the reference features to guide the generation of pose-compliant targets within a unified denoising network.
arXiv Detail & Related papers (2024-12-19T05:02:30Z)
Z-STAR+: A Zero-shot Style Transfer Method via Adjusting Style Distribution [24.88532732093652]
Style transfer presents a significant challenge, primarily centered on identifying an appropriate style representation. In contrast to existing approaches, we have discovered that latent features in vanilla diffusion models inherently contain natural style and content distributions. Our method adopts dual denoising paths to represent content and style references in latent space, subsequently guiding the content image denoising process with style latent codes.
arXiv Detail & Related papers (2024-11-28T15:56:17Z)
Oscillation Inversion: Understand the structure of Large Flow Model through the Lens of Inversion Method [60.88467353578118]
We show that a fixed-point-inspired iterative approach to invert real-world images does not achieve convergence, instead oscillating between distinct clusters. We introduce a simple and fast distribution transfer technique that facilitates image enhancement, stroke-based recoloring, as well as visual prompt-guided image editing.
arXiv Detail & Related papers (2024-11-17T17:45:37Z)
MagicStyle: Portrait Stylization Based on Reference Image [0.562479170374811]
We propose a diffusion model-based reference image stylization method specifically for portraits, called MagicStyle. The C phase involves a reverse denoising process, where DDIM Inversion is performed separately on the content image and the style image, storing the self-attention query, key and value features of both images during the inversion process. The FFF phase executes forward denoising, integrating the texture and color information from the pre-stored feature queries, keys and values into the diffusion generation process based on our Well-designed Feature Fusion Attention (FFA)
arXiv Detail & Related papers (2024-09-12T15:51:09Z)
ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images. We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation [5.364489068722223]
The concept of style is inherently underdetermined, encompassing a multitude of elements such as color, material, atmosphere, design, and structure. Inversion-based methods are prone to style degradation, often resulting in the loss of fine-grained details. adapter-based approaches frequently require meticulous weight tuning for each reference image to achieve a balance between style intensity and text controllability.
arXiv Detail & Related papers (2024-04-03T13:34:09Z)
Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation [6.479933058008389]
Style-Extracting Diffusion Models generate images with unseen characteristics beneficial for downstream tasks. In this work, we show the capability of our method on a natural image dataset as a proof-of-concept. We verify the added value of the generated images by showing improved segmentation results and lower performance variability between patients.
arXiv Detail & Related papers (2024-03-21T14:36:59Z)
StyleInject: Parameter Efficient Tuning of Text-to-Image Diffusion Models [35.732715025002705]
StyleInject is a specialized fine-tuning approach tailored for text-to-image models. It adapts to varying styles by adjusting the variance of visual features based on the characteristics of the input signal. It proves particularly effective in learning from and enhancing a range of advanced, community-fine-tuned generative models.
arXiv Detail & Related papers (2024-01-25T04:53:03Z)
Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images. By employing minimal attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models. Our method's evaluation across diverse styles and text prompts demonstrates high-quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
Layered Rendering Diffusion Model for Zero-Shot Guided Image Synthesis [60.260724486834164]
This paper introduces innovative solutions to enhance spatial controllability in diffusion models reliant on text queries. We present two key innovations: Vision Guidance and the Layered Rendering Diffusion framework. We apply our method to three practical applications: bounding box-to-image, semantic mask-to-image and image editing.
arXiv Detail & Related papers (2023-11-30T10:36:19Z)
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models [60.63556257324894]
A key desired property of image generative models is the ability to disentangle different attributes. We propose a simple, light-weight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation. Experiments show that the proposed method can modify a wide range of attributes, with the performance outperforming diffusion-model-based image-editing algorithms.
arXiv Detail & Related papers (2022-12-16T19:58:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.