Related papers: Structure-Level Disentangled Diffusion for Few-Shot Chinese Font Generation

URL: http://arxiv.org/abs/2602.18874v1
Date: Sat, 21 Feb 2026 15:41:06 GMT
Title: Structure-Level Disentangled Diffusion for Few-Shot Chinese Font Generation
Authors: Jie Li, Suorong Yang, Jian Zhao, Furao Shen
Abstract summary: Few-shot Chinese font generation aims to synthesize new characters in a target style using only a handful of reference images. Existing approaches achieve only feature-level disentanglement, allowing the generator to re-entangle these features. We propose the Structure-Level Disentangled Diffusion Model (SLD-Font), which receives content and style information from two separate channels.
Score: 18.601789249339014
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Few-shot Chinese font generation aims to synthesize new characters in a target style using only a handful of reference images. Achieving accurate content rendering and faithful style transfer requires effective disentanglement between content and style. However, existing approaches achieve only feature-level disentanglement, allowing the generator to re-entangle these features, leading to content distortion and degraded style fidelity. We propose the Structure-Level Disentangled Diffusion Model (SLD-Font), which receives content and style information from two separate channels. SimSun-style images are used as content templates and concatenated with noisy latent features as the input. Style features extracted by a CLIP model from target-style images are integrated via cross-attention. Additionally, we train a Background Noise Removal module in the pixel space to remove background noise in complex stroke regions. Based on theoretical validation of disentanglement effectiveness, we introduce a parameter-efficient fine-tuning strategy that updates only the style-related modules. This allows the model to better adapt to new styles while avoiding overfitting to the reference images' content. We further introduce the Grey and OCR metrics to evaluate the content quality of generated characters. Experimental results show that SLD-Font achieves significantly higher style fidelity while maintaining comparable content accuracy to existing state-of-the-art methods.
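The two conditioning channels described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `concat_channels` and `cross_attention` are hypothetical, tensors are plain nested lists, the learned query/key/value projections of a real attention layer are omitted, and the style tokens are arbitrary vectors standing in for CLIP features. It only shows that content enters by channel concatenation with the noisy latent, while style enters separately as the keys and values of a cross-attention step.

```python
import math


def concat_channels(noisy_latent, content_template):
    """Content channel: stack two [C][H][W] nested lists along the channel axis."""
    height = len(noisy_latent[0])
    width = len(noisy_latent[0][0])
    for channel in content_template:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("content template and latent must share H and W")
    return noisy_latent + content_template


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, style_tokens):
    """Style channel: single-head attention with style tokens as keys and values.

    Assumption: no learned projections, so queries and tokens share one width.
    """
    dim = len(style_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in style_tokens]
        weights = softmax(scores)
        attended.append(
            [sum(w * token[j] for w, token in zip(weights, style_tokens)) for j in range(dim)]
        )
    return attended


# Toy usage: a 4-channel 2x2 latent plus a 1-channel content template.
latent = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(4)]
template = [[[1.0, 0.0], [0.0, 1.0]]]
denoiser_input = concat_channels(latent, template)   # 5 channels

# Two image-side queries attend over two stand-in style tokens.
styled = cross_attention([[1.0, 0.0], [0.0, 1.0]], [[0.5, -0.5], [0.2, 0.9]])
```

Because the style tokens never pass through the concatenated input, a fine-tuning scheme that updates only the attention-side parameters leaves the content path untouched, which is the kind of separation the abstract's style-only fine-tuning relies on.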

Related papers

Sissi: Zero-shot Style-guided Image Synthesis via Semantic-style Integration [57.02757226679549]
We introduce a training-free framework that reformulates style-guided synthesis as an in-context learning task. We propose a Dynamic Semantic-Style Integration (DSSI) mechanism that reweights attention between semantic and style visual tokens. Experiments show that our approach achieves high-fidelity stylization with superior semantic-style balance and visual quality.
arXiv Detail & Related papers (2026-01-10T16:01:14Z)
Only-Style: Stylistic Consistency in Image Generation without Content Leakage [21.68241134664501]
Only-Style is a method designed to mitigate content leakage in a semantically coherent manner while preserving stylistic consistency. Only-Style works by localizing content leakage during inference, allowing the adaptive tuning of a parameter that controls the style alignment process. Our approach demonstrates a significant improvement over state-of-the-art methods through extensive evaluation across diverse instances.
arXiv Detail & Related papers (2025-06-11T16:33:09Z)
Z-STAR+: A Zero-shot Style Transfer Method via Adjusting Style Distribution [24.88532732093652]
Style transfer presents a significant challenge, primarily centered on identifying an appropriate style representation. In contrast to existing approaches, we have discovered that latent features in vanilla diffusion models inherently contain natural style and content distributions. Our method adopts dual denoising paths to represent content and style references in latent space, subsequently guiding the content image denoising process with style latent codes.
arXiv Detail & Related papers (2024-11-28T15:56:17Z)
ZePo: Zero-Shot Portrait Stylization with Faster Sampling [61.14140480095604]
This paper presents an inversion-free portrait stylization framework based on diffusion models that accomplishes content and style feature fusion in merely four sampling steps. We propose a feature merging strategy to amalgamate redundant features in Consistency Features, thereby reducing the computational load of attention control.
arXiv Detail & Related papers (2024-08-10T08:53:41Z)
ArtWeaver: Advanced Dynamic Style Integration via Diffusion Model [73.95608242322949]
Stylized Text-to-Image Generation (STIG) aims to generate images from text prompts and style reference images. We present ArtWeaver, a novel framework that leverages pretrained Stable Diffusion to address challenges such as misinterpreted styles and inconsistent semantics.
arXiv Detail & Related papers (2024-05-24T07:19:40Z)
FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning [45.696909070215476]
FontDiffuser is a diffusion-based image-to-image one-shot font generation method. It consistently excels on complex characters and large style changes compared to previous methods.
arXiv Detail & Related papers (2023-12-19T13:23:20Z)
StyleAdapter: A Unified Stylized Image Generation Model [97.24936247688824]
StyleAdapter is a unified stylized image generation model capable of producing a variety of stylized images. It can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet.
arXiv Detail & Related papers (2023-09-04T19:16:46Z)
StyleMeUp: Towards Style-Agnostic Sketch-Based Image Retrieval [119.03470556503942]
The cross-modal matching problem is typically solved by learning a joint embedding space where the semantic content shared between the photo and sketch modalities is preserved. An effective model needs to explicitly account for this style diversity and, crucially, to generalize to unseen user styles. Our model can not only disentangle the cross-modal shared semantic content, but can also adapt the disentanglement to any unseen user style, making the model truly style-agnostic.
arXiv Detail & Related papers (2021-03-29T15:44:19Z)
Arbitrary Style Transfer via Multi-Adaptation Network [109.6765099732799]
A desired style transfer, given a content image and referenced style painting, would render the content image with the color tone and vivid stroke patterns of the style painting. A new disentanglement loss function enables our network to extract main style patterns and exact content structures to adapt to various input images.
arXiv Detail & Related papers (2020-05-27T08:00:22Z)

This list is automatically generated from the titles and abstracts of the papers on this site.
