SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer
- URL: http://arxiv.org/abs/2503.04119v1
- Date: Thu, 06 Mar 2025 05:56:25 GMT
- Title: SCSA: A Plug-and-Play Semantic Continuous-Sparse Attention for Arbitrary Semantic Style Transfer
- Authors: Chunnan Shang, Zhizhong Wang, Hongwei Wang, Xiangming Meng
- Abstract summary: We argue that the root cause lies in their failure to consider the relationship between local regions and semantic regions. We propose a plug-and-play semantic continuous-sparse attention, dubbed SCSA, for arbitrary semantic style transfer.
- Score: 14.583909336113566
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Attention-based arbitrary style transfer methods, including CNN-based, Transformer-based, and Diffusion-based ones, have flourished and produced high-quality stylized images. However, they perform poorly when the content and style images share the same semantics, i.e., the style of each semantic region in the generated stylized image is inconsistent with that of the corresponding region in the style image. We argue that the root cause lies in their failure to consider the relationship between local regions and semantic regions. To address this issue, we propose a plug-and-play semantic continuous-sparse attention, dubbed SCSA, for arbitrary semantic style transfer -- each query point considers certain key points in the corresponding semantic region. Specifically, semantic continuous attention ensures that each query point fully attends to all the continuous key points in the same semantic region, which reflect the overall style characteristics of that region; semantic sparse attention allows each query point to focus on the most similar sparse key point in the same semantic region, which exhibits the specific stylistic texture of that region. By combining the two modules, the resulting SCSA aligns the overall style of the corresponding semantic regions while transferring the vivid textures of these regions. Qualitative and quantitative results demonstrate that SCSA enables attention-based arbitrary style transfer methods to produce high-quality semantic stylized images.
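To make the mechanism concrete, here is a minimal PyTorch sketch of the two attention variants as the abstract describes them. This is an illustrative reconstruction, not the authors' released implementation: the function names, the flattened (N, C) feature layout, the integer semantic label maps, and the equal-weight blend in `scsa` are all assumptions.

```python
import torch
import torch.nn.functional as F

def semantic_continuous_attention(q, k, v, q_labels, k_labels):
    """Each query softly attends to ALL keys inside its own semantic
    region, capturing that region's overall style statistics.

    q: (Nq, C) content queries; k, v: (Nk, C) style keys/values;
    q_labels (Nq,), k_labels (Nk,): integer semantic region labels.
    Assumes every query's region also appears among the style labels,
    otherwise the masked softmax produces NaNs.
    """
    scores = (q @ k.t()) / q.shape[-1] ** 0.5            # (Nq, Nk)
    same_region = q_labels[:, None] == k_labels[None, :] # (Nq, Nk) mask
    scores = scores.masked_fill(~same_region, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                 # (Nq, C)

def semantic_sparse_attention(q, k, v, q_labels, k_labels):
    """Each query hard-selects the single most similar key inside its
    semantic region, copying that key's specific stylistic texture."""
    scores = q @ k.t()                                   # (Nq, Nk)
    same_region = q_labels[:, None] == k_labels[None, :]
    scores = scores.masked_fill(~same_region, float("-inf"))
    return v[scores.argmax(dim=-1)]                      # (Nq, C)

def scsa(q, k, v, q_labels, k_labels, alpha=0.5):
    # Hypothetical combination; the paper's actual weighting may differ.
    cont = semantic_continuous_attention(q, k, v, q_labels, k_labels)
    sparse = semantic_sparse_attention(q, k, v, q_labels, k_labels)
    return alpha * cont + (1 - alpha) * sparse
```

In a plug-and-play setting, such a module would replace the content-to-style attention of an existing backbone, taking flattened feature maps and correspondingly downsampled segmentation labels for both the content and style images.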
Related papers
- Exploring Semantic Consistency and Style Diversity for Domain Generalized Semantic Segmentation [4.850207292777464]
Domain Generalized Semantic Segmentation aims to enhance the generalization of semantic segmentation across unknown target domains. We introduce SCSD for Semantic Consistency prediction and Style Diversity generalization. SCSD significantly outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-12-16T18:20:06Z)
- Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance [17.29693696084235]
We present a novel approach to customize the guidance degrees for different semantic units in text-to-image diffusion models.
We adaptively adjust the CFG scales across different semantic regions to rescale the text guidance degrees to a uniform level.
Experiments demonstrate the superiority of S-CFG over the original CFG strategy on various text-to-image diffusion models.
arXiv Detail & Related papers (2024-04-08T10:45:29Z)
- Progressive Feature Self-reinforcement for Weakly Supervised Semantic Segmentation [55.69128107473125]
We propose a single-stage approach for Weakly Supervised Semantic Segmentation (WSSS) with image-level labels.
We adaptively partition the image content into deterministic regions (e.g., confident foreground and background) and uncertain regions (e.g., object boundaries and misclassified categories) for separate processing.
Building upon this, we introduce a complementary self-enhancement method that constrains the semantic consistency between these confident regions and an augmented image with the same class labels.
arXiv Detail & Related papers (2023-12-14T13:21:52Z)
- All-to-key Attention for Arbitrary Style Transfer [98.83954812536521]
We propose a novel all-to-key attention mechanism -- each position of content features is matched to stable key positions of style features.
The resultant module, dubbed StyA2K, shows extraordinary performance in preserving the semantic structure and rendering consistent style patterns.
arXiv Detail & Related papers (2022-12-08T06:46:35Z)
- Consistent Style Transfer [23.193302706359464]
Recently, attentional arbitrary style transfer methods have been proposed to achieve fine-grained results.
We propose the progressive attentional manifold alignment (PAMA) to alleviate this problem.
We show that PAMA achieves state-of-the-art performance while avoiding the inconsistency of semantic regions.
arXiv Detail & Related papers (2022-01-06T20:19:35Z)
- Spatial and Semantic Consistency Regularizations for Pedestrian Attribute Recognition [50.932864767867365]
We propose a framework that consists of two complementary regularizations to achieve spatial and semantic consistency for each attribute.
Based on the precise attribute locations, we propose a semantic consistency regularization to extract intrinsic and discriminative semantic features.
Results show that the proposed method performs favorably against state-of-the-art methods without increasing parameters.
arXiv Detail & Related papers (2021-09-13T03:36:44Z)
- Discriminative Region-based Multi-Label Zero-Shot Learning [145.0952336375342]
Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL.
We propose an alternate approach towards region-based discriminability-preserving ZSL.
arXiv Detail & Related papers (2021-08-20T17:56:47Z)
- Affinity Space Adaptation for Semantic Segmentation Across Domains [57.31113934195595]
In this paper, we address the problem of unsupervised domain adaptation (UDA) in semantic segmentation.
Motivated by the fact that the source and target domains share invariant semantic structures, we propose to exploit such invariance across domains.
We develop two affinity space adaptation strategies: affinity space cleaning and adversarial affinity space alignment.
arXiv Detail & Related papers (2020-09-26T10:28:11Z)
- Cross-domain Correspondence Learning for Exemplar-based Image Translation [59.35767271091425]
We present a framework for exemplar-based image translation, which synthesizes a photo-realistic image from the input in a distinct domain.
The output's style (e.g., color, texture) is consistent with the semantically corresponding objects in the exemplar.
We show that our method significantly outperforms state-of-the-art methods in terms of image quality.
arXiv Detail & Related papers (2020-04-12T09:10:57Z)