StyMam: A Mamba-Based Generator for Artistic Style Transfer
- URL: http://arxiv.org/abs/2601.12954v3
- Date: Sun, 25 Jan 2026 07:22:46 GMT
- Title: StyMam: A Mamba-Based Generator for Artistic Style Transfer
- Authors: Zhou Hong, Ning Dong, Yicheng Di, Xiaolong Xu, Rongsheng Hu, Yihua Shao, Run Ling, Yun Wang, Juqin Wang, Zhanjie Zhang, Ao Ma
- Abstract summary: We propose a mamba-based generator to produce high-quality stylized images without introducing artifacts and disharmonious patterns. Specifically, we introduce a mamba-based generator with a residual dual-path strip scanning mechanism and a channel-reweighted spatial attention module. The proposed method outperforms state-of-the-art algorithms in both quality and speed.
- Score: 16.81948748572056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image style transfer aims to integrate the visual patterns of a specific artistic style into a content image while preserving its content structure. Existing methods mainly rely on the generative adversarial network (GAN) or stable diffusion (SD). GAN-based approaches using CNNs or Transformers struggle to jointly capture local and global dependencies, leading to artifacts and disharmonious patterns. SD-based methods reduce such issues but often fail to preserve content structures and suffer from slow inference. To address these issues, we revisit GANs and propose a mamba-based generator, termed StyMam, to produce high-quality stylized images without introducing artifacts and disharmonious patterns. Specifically, we introduce a mamba-based generator with a residual dual-path strip scanning mechanism and a channel-reweighted spatial attention module. The former efficiently captures local texture features, while the latter models global dependencies. Finally, extensive qualitative and quantitative experiments demonstrate that the proposed method outperforms state-of-the-art algorithms in both quality and speed.
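The paper does not publish code, so as a rough illustration of the kind of component the abstract describes, the sketch below shows what a channel-reweighted spatial attention block might look like: a squeeze-and-excitation style channel gate followed by single-head spatial self-attention with a residual connection. The module name, layer choices, and hyperparameters are assumptions for illustration only, not the authors' implementation.

```python
# Hypothetical sketch of a channel-reweighted spatial attention block.
# Assumes an SE-style channel gate followed by single-head spatial
# self-attention over flattened positions; not the authors' code.
import torch
import torch.nn as nn


class ChannelReweightedSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Channel reweighting: global average pool -> bottleneck MLP -> sigmoid gate
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial attention: 1x1 projections for query / key / value
        self.q = nn.Conv2d(channels, channels, kernel_size=1)
        self.k = nn.Conv2d(channels, channels, kernel_size=1)
        self.v = nn.Conv2d(channels, channels, kernel_size=1)
        self.scale = channels ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x = x * self.channel_gate(x)                      # reweight channels
        q = self.q(x).flatten(2).transpose(1, 2)          # (B, HW, C)
        k = self.k(x).flatten(2)                          # (B, C, HW)
        v = self.v(x).flatten(2).transpose(1, 2)          # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, HW, HW): global dependencies
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return out + x                                    # residual connection


if __name__ == "__main__":
    feats = torch.randn(1, 64, 32, 32)
    print(ChannelReweightedSpatialAttention(64)(feats).shape)  # torch.Size([1, 64, 32, 32])
```

In this reading, the channel gate suppresses or emphasizes feature channels before the spatial attention computes pairwise interactions across all positions, which is one plausible way to pair channel reweighting with global dependency modeling.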
Related papers
- Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model [92.61216319417208]
We propose a novel diffusion model (DM)-based framework, dubbed ours, for image deblurring. Ours applies the DM to generate prior knowledge that aids in recovering the textures of blurry images. To fully exploit the generated texture priors, we present the Texture Transfer Transformer layer (TTformer).
arXiv Detail & Related papers (2025-07-18T01:50:31Z) - CDG-MAE: Learning Correspondences from Diffusion Generated Views [19.24402848656637]
CDG-MAE is a novel MAE-based self-supervised method that uses diverse synthetic views generated from static images. These generated views exhibit substantial changes in pose and perspective, providing a rich training signal.
arXiv Detail & Related papers (2025-06-22T20:40:11Z) - High-Fidelity Image Inpainting with Multimodal Guided GAN Inversion [15.202130790708747]
We propose a novel GAN inversion approach, dubbed MMInvertFill, for image inpainting. MMInvertFill contains primarily a multimodal guided encoder with a pre-modulation and a GAN generator with F&W+ latent space. We show that our MMInvertFill qualitatively and quantitatively outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2025-04-17T10:58:45Z) - SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer [41.09041735653436]
We develop a Mamba-based style transfer framework, termed SaMam. Specifically, a mamba encoder is designed to efficiently extract content and style information. To address the problems of local pixel forgetting, channel redundancy and spatial discontinuity of existing SSMs, we introduce both local enhancement and zigzag scan.
arXiv Detail & Related papers (2025-03-20T08:18:27Z) - SEM-Net: Efficient Pixel Modelling for image inpainting with Spatially Enhanced SSM [11.447968918063335]
Image inpainting aims to repair a partially damaged image based on the information from known regions of the image.
SEM-Net is a novel State Space Model (SSM) vision network, modelling corrupted images at the pixel level while capturing long-range dependencies (LRDs) in state space.
arXiv Detail & Related papers (2024-11-10T00:35:14Z) - UniVST: A Unified Framework for Training-free Localized Video Style Transfer [102.52552893495475]
This paper presents UniVST, a unified framework for localized video style transfer based on diffusion models. It operates without the need for training, offering a distinct advantage over existing diffusion methods that transfer style across entire videos.
arXiv Detail & Related papers (2024-10-26T05:28:02Z) - ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps.
We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z) - Coherent and Multi-modality Image Inpainting via Latent Space Optimization [61.99406669027195]
PILOT (inPainting vIa Latent OpTimization) is an optimization approach grounded on a novel semantic centralization and background preservation loss.
Our method searches latent spaces capable of generating inpainted regions that exhibit high fidelity to user-provided prompts while maintaining coherence with the background.
arXiv Detail & Related papers (2024-07-10T19:58:04Z) - OneActor: Consistent Character Generation via Cluster-Conditioned Guidance [29.426558840522734]
We propose a novel one-shot tuning paradigm, termed OneActor.
It efficiently performs consistent subject generation solely driven by prompts.
Our method is capable of multi-subject generation and compatible with popular diffusion extensions.
arXiv Detail & Related papers (2024-04-16T03:45:45Z) - Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components.
CNNs are used to augment the local texture information of coarse priors.
DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z) - Self-Distilled StyleGAN: Towards Generation from Internet Photos [47.28014076401117]
We show how StyleGAN can be adapted to work on raw uncurated images collected from the Internet.
We propose a StyleGAN-based self-distillation approach, which consists of two main components.
The presented technique enables the generation of high-quality images, while minimizing the loss in diversity of the data.
arXiv Detail & Related papers (2022-02-24T17:16:47Z)