Style-Friendly SNR Sampler for Style-Driven Generation
- URL: http://arxiv.org/abs/2411.14793v3
- Date: Thu, 20 Mar 2025 05:25:16 GMT
- Title: Style-Friendly SNR Sampler for Style-Driven Generation
- Authors: Jooyoung Choi, Chaehun Shin, Yeongtak Oh, Heeseung Kim, Jungbeom Lee, Sungroh Yoon
- Abstract summary: In style-driven generation, users typically supply reference images exemplifying the desired style, together with text prompts that specify desired stylistic attributes. Previous approaches commonly rely on fine-tuning, yet fine-tuning often blindly reuses objectives and noise-level distributions from pre-training without adaptation. We propose the Style-friendly SNR sampler, which aggressively shifts the signal-to-noise ratio (SNR) distribution toward higher noise levels during fine-tuning to focus on noise levels where stylistic features emerge.
- Score: 44.49588713955722
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent text-to-image diffusion models generate high-quality images but struggle to learn new, personalized styles, which limits the creation of unique style templates. In style-driven generation, users typically supply reference images exemplifying the desired style, together with text prompts that specify desired stylistic attributes. Previous approaches commonly rely on fine-tuning, yet fine-tuning often blindly reuses objectives and noise-level distributions from pre-training without adaptation. We discover that stylistic features predominantly emerge at higher noise levels, leading current fine-tuning methods to exhibit suboptimal style alignment. We propose the Style-friendly SNR sampler, which aggressively shifts the signal-to-noise ratio (SNR) distribution toward higher noise levels during fine-tuning to focus on noise levels where stylistic features emerge. This enhances models' ability to capture novel styles indicated by reference images and text prompts. We demonstrate improved generation of novel styles that cannot be adequately described solely with a text prompt, enabling the creation of new style templates for personalized content creation.
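The core mechanism, biasing fine-tuning toward high-noise (low-SNR) levels where stylistic features emerge, can be illustrated with a short sketch. The Gaussian log-SNR parameterization, the specific mean/std values, and the flow-matching SNR-to-timestep mapping below are illustrative assumptions, not the paper's exact formulation or hyperparameters.

```python
import torch

def style_friendly_timesteps(batch_size: int,
                             logsnr_mean: float = -6.0,
                             logsnr_std: float = 2.0) -> torch.Tensor:
    """Sample diffusion timesteps biased toward high noise (low SNR).

    Illustrative sketch only: the normal log-SNR distribution and the
    mean/std defaults are assumptions, not the paper's reported values.
    """
    # Draw log-SNR from a normal distribution whose mean lies in the
    # low-SNR (high-noise) regime instead of the pre-training distribution.
    log_snr = logsnr_mean + logsnr_std * torch.randn(batch_size)

    # Map log-SNR to a flow-matching timestep t in (0, 1), assuming the
    # interpolation x_t = (1 - t) * x_0 + t * noise, for which
    # SNR(t) = ((1 - t) / t)^2 and hence t = sigmoid(-log_snr / 2).
    # t close to 1 corresponds to heavy noise.
    return torch.sigmoid(-0.5 * log_snr)
```

During style fine-tuning, timesteps drawn this way would replace the pre-training timestep sampler when noising the reference latents, so gradient updates concentrate on the noise levels where style is decided.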
Related papers
- Balanced Image Stylization with Style Matching Score [36.542802101359705]
Style Matching Score (SMS) is a novel optimization method for image stylization with diffusion models.
SMS balances style alignment and content preservation, outperforming state-of-the-art approaches.
arXiv Detail & Related papers (2025-03-10T17:58:02Z)
- SVP: Style-Enhanced Vivid Portrait Talking Head Diffusion Model [66.34929233269409]
Talking Head Generation (THG) is an important task with broad application prospects in various fields such as digital humans, film production, and virtual reality.
We propose a novel framework named Style-Enhanced Vivid Portrait (SVP) which fully leverages style-related information in THG.
Our model generates diverse, vivid, and high-quality videos with flexible control over intrinsic styles, outperforming existing state-of-the-art methods.
arXiv Detail & Related papers (2024-09-05T06:27:32Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- Ada-adapter: Fast Few-shot Style Personlization of Diffusion Model with Pre-trained Image Encoder [57.574544285878794]
Ada-Adapter is a novel framework for few-shot style personalization of diffusion models.
Our method enables efficient zero-shot style transfer utilizing a single reference image.
We demonstrate the effectiveness of our approach on various artistic styles, including flat art, 3D rendering, and logo design.
arXiv Detail & Related papers (2024-07-08T02:00:17Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- Visual Style Prompting with Swapping Self-Attention [26.511518230332758]
We propose a novel approach to produce a diverse range of images while maintaining specific style elements and nuances.
During the denoising process, we keep the query from the original features while swapping the key and value with those from the reference features in the late self-attention layers (a minimal sketch of this swap is shown below).
Our method demonstrates superiority over existing approaches, best reflecting the style of the references and ensuring that the resulting images most accurately match the text prompts.
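The key/value swap described above can be sketched as a single attention call; the projection handles (to_q, to_k, to_v), tensor shapes, and head count are assumed for illustration and are not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def swapped_self_attention(feats_gen: torch.Tensor,   # (B, N, C) generated-image features
                           feats_ref: torch.Tensor,   # (B, N, C) style-reference features
                           to_q: nn.Linear,
                           to_k: nn.Linear,
                           to_v: nn.Linear,
                           num_heads: int = 8) -> torch.Tensor:
    """Self-attention in which the query comes from the image being generated
    while the key and value are swapped in from the reference image.

    Hypothetical helper for illustration; names and shapes are assumptions.
    """
    b, n, c = feats_gen.shape
    head_dim = c // num_heads

    def split_heads(x: torch.Tensor) -> torch.Tensor:
        # (B, N, C) -> (B, heads, N, head_dim)
        return x.view(b, -1, num_heads, head_dim).transpose(1, 2)

    q = split_heads(to_q(feats_gen))  # query kept from the original features
    k = split_heads(to_k(feats_ref))  # key taken from the reference features
    v = split_heads(to_v(feats_ref))  # value taken from the reference features

    out = F.scaled_dot_product_attention(q, k, v)
    return out.transpose(1, 2).reshape(b, n, c)  # back to (B, N, C)
```

Because the query still comes from the generated features, the output stays aligned with the image's own spatial structure while drawing its key/value statistics from the reference style.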
arXiv Detail & Related papers (2024-02-20T12:51:17Z)
- DreamDrone: Text-to-Image Diffusion Models are Zero-shot Perpetual View Generators [56.994967294931286]
We introduce DreamDrone, a novel zero-shot and training-free pipeline for generating flythrough scenes from textual prompts.
We advocate explicitly warping the intermediate latent code of the pre-trained text-to-image diffusion model for high-quality image generation and unbounded generalization ability.
arXiv Detail & Related papers (2023-12-14T08:42:26Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models.
Our method's evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser [19.466860144772674]
In this paper, we propose InstaStyle, a novel approach that excels in generating high-fidelity stylized images with only a single reference image.
Our approach is based on the finding that the inversion noise from a stylized reference image inherently carries the style signal.
We introduce a learnable style token via prompt refinement, which enhances the accuracy of the style description for the reference image.
arXiv Detail & Related papers (2023-11-25T14:38:54Z)
- ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors [105.37795139586075]
We propose a new task for "stylizing" text-to-image models, namely text-driven stylized image generation.
We present a new diffusion model (ControlStyle) via upgrading a pre-trained text-to-image model with a trainable modulation network.
Experiments demonstrate the effectiveness of our ControlStyle in producing more visually pleasing and artistic results.
arXiv Detail & Related papers (2023-11-09T15:50:52Z)
- Customize StyleGAN with One Hand Sketch [0.0]
We propose a framework to control StyleGAN imagery with a single user sketch.
We learn a conditional distribution in the latent space of a pre-trained StyleGAN model via energy-based learning.
Our model can generate multi-modal images semantically aligned with the input sketch.
arXiv Detail & Related papers (2023-10-29T09:32:33Z)
- StyleAdapter: A Unified Stylized Image Generation Model [97.24936247688824]
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images.
It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)
- Arbitrary Style Guidance for Enhanced Diffusion-Based Text-to-Image Generation [13.894251782142584]
Diffusion-based text-to-image generation models like GLIDE and DALLE-2 have gained wide success recently.
We propose a novel style guidance method to support generating images using arbitrary style guided by a reference image.
arXiv Detail & Related papers (2022-11-14T20:52:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.