Sissi: Zero-shot Style-guided Image Synthesis via Semantic-style Integration
- URL: http://arxiv.org/abs/2601.06605v1
- Date: Sat, 10 Jan 2026 16:01:14 GMT
- Title: Sissi: Zero-shot Style-guided Image Synthesis via Semantic-style Integration
- Authors: Yingying Deng, Xiangyu He, Fan Tang, Weiming Dong, Xucheng Yin
- Abstract summary: We introduce a training-free framework that reformulates style-guided synthesis as an in-context learning task. We propose a Dynamic Semantic-Style Integration (DSSI) mechanism that reweights attention between semantic and style visual tokens. Experiments show that our approach achieves high-fidelity stylization with superior semantic-style balance and visual quality.
- Score: 57.02757226679549
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Text-guided image generation has advanced rapidly with large-scale diffusion models, yet achieving precise stylization with visual exemplars remains difficult. Existing approaches often depend on task-specific retraining or expensive inversion procedures, which can compromise content integrity, reduce style fidelity, and lead to an unsatisfactory trade-off between semantic prompt adherence and style alignment. In this work, we introduce a training-free framework that reformulates style-guided synthesis as an in-context learning task. Guided by textual semantic prompts, our method concatenates a reference style image with a masked target image, leveraging a pretrained ReFlow-based inpainting model to seamlessly integrate semantic content with the desired style through multimodal attention fusion. We further analyze the imbalance and noise sensitivity inherent in multimodal attention fusion and propose a Dynamic Semantic-Style Integration (DSSI) mechanism that reweights attention between textual semantic and style visual tokens, effectively resolving guidance conflicts and enhancing output coherence. Experiments show that our approach achieves high-fidelity stylization with superior semantic-style balance and visual quality, offering a simple yet powerful alternative to complex, artifact-prone prior methods.
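For intuition, here is a minimal sketch of what reweighting attention between textual semantic tokens and style visual tokens could look like inside a joint attention layer. The token layout, function name, and fixed scalar weights below are illustrative assumptions; the paper's DSSI derives its weights dynamically, and the abstract does not specify the mechanism at this level of detail.

```python
import math
import torch

def dssi_attention(q, k, v, n_text, n_style, w_text=1.0, w_style=1.0):
    """Toy attention with per-group reweighting (a DSSI-like sketch).

    q:        (B, H, Nq, d) queries from the masked target region
    k, v:     (B, H, Nk, d) keys/values over [text | style | target] tokens
    n_text:   number of textual semantic tokens at the front of k/v
    n_style:  number of style-image tokens immediately after them
    w_text, w_style: positive group weights; adding log(w) to the logits
        scales each group's post-softmax attention mass by roughly w.
    """
    logits = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
    logits[..., :n_text] += math.log(w_text)                    # semantic guidance
    logits[..., n_text:n_text + n_style] += math.log(w_style)  # style guidance
    return logits.softmax(dim=-1) @ v

# Hypothetical usage: 77 text tokens, 256 style tokens, 256 target tokens.
q = torch.randn(1, 8, 256, 64)
k = torch.randn(1, 8, 77 + 256 + 256, 64)
v = torch.randn_like(k)
out = dssi_attention(q, k, v, n_text=77, n_style=256, w_text=1.2, w_style=0.8)
```

Damping the style group (w_style < 1) while mildly boosting the text group is one plausible way to resolve the guidance conflicts the abstract describes; the actual weighting schedule would have to be taken from the paper.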
Related papers
- Inversion-Free Style Transfer with Dual Rectified Flows [57.02757226679549]
We propose a novel inversion-free style transfer framework based on dual rectified flows. Our approach predicts content and style trajectories in parallel, then fuses them through a dynamic midpoint; a schematic sketch of this fusion appears after this list. Experiments demonstrate generalization across diverse styles and content, providing an effective and efficient pipeline for style transfer.
arXiv Detail & Related papers (2025-11-26T02:28:51Z)
- A Training-Free Style-Personalization via Scale-wise Autoregressive Model [11.918925320254534]
We present a training-free framework for style-personalized image generation that controls content and style information during inference. Our method employs a three-path design (content, style, and generation), with each path guided by a corresponding text prompt.
arXiv Detail & Related papers (2025-07-06T17:42:11Z)
- FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation [25.925198876189057]
FreeGraftor is a training-free framework that addresses the limitations of subject-driven generation through cross-image feature grafting. Our framework extends seamlessly to multi-subject generation, making it practical for real-world deployment.
arXiv Detail & Related papers (2025-04-22T14:55:23Z)
- AttenST: A Training-Free Attention-Driven Style Transfer Framework with Pre-Trained Diffusion Models [4.364797586362505]
AttenST is a training-free, attention-driven style transfer framework. We propose a style-guided self-attention mechanism that conditions self-attention on the reference style, and we introduce a dual-feature cross-attention mechanism to fuse content and style features.
arXiv Detail & Related papers (2025-03-10T13:28:36Z)
- ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
- ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images.
We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
- Style Aligned Image Generation via Shared Attention [61.121465570763085]
We introduce StyleAligned, a technique designed to establish style alignment among a series of generated images.
By employing minimal 'attention sharing' during the diffusion process, our method maintains style consistency across images within T2I models; a sketch of this sharing appears after this list.
Evaluation across diverse styles and text prompts demonstrates high quality and fidelity.
arXiv Detail & Related papers (2023-12-04T18:55:35Z)
- DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [66.42741426640633]
DiffStyler is a dual diffusion processing architecture that controls the balance between content and style in the diffused results.
We propose learnable noise derived from the content image as the basis of the reverse denoising process, enabling the stylization results to better preserve the structural information of the content image.
arXiv Detail & Related papers (2022-11-19T12:30:44Z)
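As referenced in the Dual Rectified Flows entry above, the following is a schematic sketch of fusing two rectified-flow trajectories through a time-dependent weight. The callables `v_content_fn`, `v_style_fn`, and `lam_fn` are hypothetical stand-ins for that paper's velocity networks and midpoint schedule, which the summary does not specify.

```python
import torch

def fuse_dual_flows(x0, v_content_fn, v_style_fn, lam_fn, n_steps=28):
    """Euler-integrate a convex fusion of content and style velocity fields."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = i / n_steps
        v_c = v_content_fn(x, t)   # pulls the sample toward the content target
        v_s = v_style_fn(x, t)     # pulls the sample toward the style target
        lam = lam_fn(t)            # fusion weight in [0, 1]
        x = x + dt * ((1.0 - lam) * v_c + lam * v_s)
    return x

# Toy usage with stand-in velocity fields and a linear schedule.
x1 = fuse_dual_flows(
    torch.randn(1, 4, 64, 64),
    v_content_fn=lambda x, t: -x,
    v_style_fn=lambda x, t: torch.zeros_like(x),
    lam_fn=lambda t: t,            # style influence grows over time
)
```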
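Similarly, for the StyleAligned entry, this sketch shows the basic shape of attention sharing: each image's queries attend to its own keys and values concatenated with those of a reference image. This is a simplification; the published method also applies AdaIN-style normalization to queries and keys using reference statistics, which is omitted here.

```python
import torch
import torch.nn.functional as F

def shared_self_attention(q, k, v, k_ref, v_ref):
    """Self-attention where every image also attends to a shared reference.

    q, k, v:      (B, H, N, d)  per-image attention projections
    k_ref, v_ref: (1, H, Nr, d) projections from the reference image
    """
    b = q.shape[0]
    # Broadcast the reference keys/values across the batch and append them,
    # so each generated image inherits the reference's style statistics.
    k_all = torch.cat([k, k_ref.expand(b, -1, -1, -1)], dim=-2)
    v_all = torch.cat([v, v_ref.expand(b, -1, -1, -1)], dim=-2)
    return F.scaled_dot_product_attention(q, k_all, v_all)
```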