StrokeGAN+: Few-Shot Semi-Supervised Chinese Font Generation with Stroke
Encoding
- URL: http://arxiv.org/abs/2211.06198v1
- Date: Fri, 11 Nov 2022 13:39:26 GMT
- Title: StrokeGAN+: Few-Shot Semi-Supervised Chinese Font Generation with Stroke
Encoding
- Authors: Jinshan Zeng, Yefei Wang, Qi Chen, Yunxin Liu, Mingwen Wang, Yuan Yao
- Abstract summary: This paper proposes an effective model called StrokeGAN+, which incorporates the stroke encoding and the few-shot semi-supervised scheme into the CycleGAN model.
Experimental results show that the mode collapse issue can be effectively alleviated by the introduced one-bit stroke encoding and few-shot semi-supervised training scheme.
- Score: 23.886977380061662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generation of Chinese fonts has a wide range of applications. The
currently predominant methods are mainly based on deep generative models,
especially generative adversarial networks (GANs). However, existing
GAN-based models usually suffer from the well-known mode collapse problem;
when mode collapse happens, such models fail to yield the correct fonts. To
address this issue, we introduce a one-bit stroke
encoding and a few-shot semi-supervised scheme (i.e., using a few paired data
as semi-supervised information) to explore the local and global structure
information of Chinese characters respectively, motivated by the intuition that
strokes and characters directly embody certain local and global modes of
Chinese characters. Based on these ideas, this paper proposes an effective
model called \textit{StrokeGAN+}, which incorporates the stroke encoding and
the few-shot semi-supervised scheme into the CycleGAN model. The effectiveness
of the proposed model is demonstrated by extensive experiments. Experimental
results show that the mode collapse issue can be effectively alleviated by the
introduced one-bit stroke encoding and few-shot semi-supervised training
scheme, and that the proposed model outperforms the state-of-the-art models in
fourteen font generation tasks in terms of four important evaluation metrics
and the quality of generated characters. Besides CycleGAN, we also show that
the proposed idea can be adapted to other existing models to improve their
performance. The effectiveness of the proposed model for zero-shot
traditional Chinese font generation is also evaluated in this paper.
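To make the one-bit stroke encoding concrete, here is a minimal sketch; the stroke inventory is truncated and the character decompositions are illustrative stand-ins for whatever stroke table the paper actually uses:

```python
# Minimal sketch of a one-bit stroke encoding: each character maps to a
# binary vector whose i-th bit is 1 iff the i-th basic stroke type occurs
# in the character at least once. Inventory and lookups are illustrative.
import numpy as np

STROKE_TYPES = ["heng", "shu", "pie", "na", "dian"]  # truncated inventory

# Hypothetical character -> stroke-type decomposition lookup.
CHAR_STROKES = {
    "十": ["heng", "shu"],
    "人": ["pie", "na"],
    "主": ["dian", "heng", "heng", "shu", "heng"],
}

def one_bit_stroke_encoding(char: str) -> np.ndarray:
    """Return a binary vector over STROKE_TYPES for the given character."""
    code = np.zeros(len(STROKE_TYPES), dtype=np.float32)
    for stroke in CHAR_STROKES.get(char, []):
        code[STROKE_TYPES.index(stroke)] = 1.0
    return code

print(one_bit_stroke_encoding("十"))  # [1. 1. 0. 0. 0.]
print(one_bit_stroke_encoding("主"))  # [1. 1. 0. 0. 1.]
```

Because the encoding only records presence or absence of each stroke type (not stroke counts or positions), it is a coarse local-structure signal; the few-shot paired data supplies the complementary global-structure supervision.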
Related papers
- Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens [53.99177152562075]
Scaling up autoregressive models in vision has not proven as beneficial as in large language models.
We focus on two critical factors: whether models use discrete or continuous tokens, and whether tokens are generated in a random or fixed order using BERT- or GPT-like transformer architectures.
Our results show that while all models scale effectively in terms of validation loss, their evaluation performance -- measured by FID, GenEval score, and visual quality -- follows different trends.
arXiv Detail & Related papers (2024-10-17T17:59:59Z) - MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection [64.29452783056253]
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia.
Although existing approaches mainly capture face forgery patterns using the image modality, other modalities such as fine-grained noise and text are not fully explored.
We propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities.
arXiv Detail & Related papers (2024-09-15T13:08:59Z) - DeepCalliFont: Few-shot Chinese Calligraphy Font Synthesis by
Integrating Dual-modality Generative Models [20.76773399161289]
Few-shot font generation, especially for Chinese calligraphy fonts, is a challenging and ongoing problem.
We propose a novel model, DeepCalliFont, for few-shot Chinese calligraphy font synthesis by integrating dual-modality generative models.
arXiv Detail & Related papers (2023-12-16T04:23:12Z) - Rethinking Masked Language Modeling for Chinese Spelling Correction [70.85829000570203]
We study Chinese Spelling Correction (CSC) as a joint decision made by two separate models: a language model and an error model.
We find that fine-tuning BERT tends to over-fit the error model while under-fitting the language model, resulting in poor generalization to out-of-distribution error patterns.
We demonstrate that a very simple strategy, randomly masking 20% of the non-error tokens from the input sequence during fine-tuning, is sufficient for learning a much better language model without sacrificing the error model; a minimal sketch of the masking step follows.
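A hedged sketch of that masking strategy, assuming a character-level token list and a set of known error positions (the function and argument names are hypothetical):

```python
import random

MASK = "[MASK]"

def mask_non_error_tokens(tokens, error_positions, rate=0.2, seed=None):
    """Replace `rate` of the non-error tokens with [MASK], leaving the
    misspelled (error) tokens untouched so the error model still sees them."""
    rng = random.Random(seed)
    out = list(tokens)
    candidates = [i for i in range(len(tokens)) if i not in error_positions]
    for i in rng.sample(candidates, k=int(rate * len(candidates))):
        out[i] = MASK
    return out

tokens = list("他令天去学校上课")  # "令" (index 1) is a misspelling of "今"
print(mask_non_error_tokens(tokens, error_positions={1}, rate=0.2, seed=0))
```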
arXiv Detail & Related papers (2023-05-28T13:19:12Z) - Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
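As a rough illustration of projecting out biased directions (this omits the paper's calibration and simply builds the orthogonal projection onto the complement of the given directions):

```python
import numpy as np

def debias_projection(bias_directions: np.ndarray) -> np.ndarray:
    """Build P = I - V (V^T V)^+ V^T, which removes the span of the
    biased directions (rows of `bias_directions`) from any embedding."""
    V = bias_directions.T                    # shape (d, k)
    return np.eye(V.shape[0]) - V @ np.linalg.pinv(V.T @ V) @ V.T

rng = np.random.default_rng(0)
d = 512
bias = rng.standard_normal((2, d))           # e.g., spurious-attribute directions
P = debias_projection(bias)
text_emb = rng.standard_normal(d)
debiased = P @ text_emb
print(np.abs(bias @ debiased).max())         # ~0: bias components removed
```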
arXiv Detail & Related papers (2023-01-31T20:09:33Z) - Diff-Font: Diffusion Model for Robust One-Shot Font Generation [110.45944936952309]
We propose a novel one-shot font generation method based on a diffusion model, named Diff-Font.
The proposed model aims to generate the entire font library given only one sample as the reference.
The well-trained Diff-Font is not only robust to font gaps and font variation, but also achieves promising performance on difficult character generation.
arXiv Detail & Related papers (2022-12-12T13:51:50Z) - SGCE-Font: Skeleton Guided Channel Expansion for Chinese Font Generation [19.20334101519465]
This paper proposes a novel information guidance module, called the skeleton guided channel expansion (SGCE) module, for Chinese font generation.
Numerical results show that the mode collapse issue suffered by CycleGAN can be effectively alleviated by equipping it with the proposed SGCE module; a sketch of the channel-expansion idea follows.
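A speculative sketch of the channel-expansion idea, assuming the skeleton map is simply concatenated with the character image along the channel axis before the generator (the module and argument names here are hypothetical):

```python
import torch
import torch.nn as nn

class SkeletonGuidedInput(nn.Module):
    """Hypothetical skeleton-guided channel expansion: concatenate the
    glyph image with its skeleton map so the generator sees both the
    appearance and the structural skeleton of the character."""

    def __init__(self, generator: nn.Module):
        super().__init__()
        self.generator = generator  # must accept img_ch + skel_ch input channels

    def forward(self, image: torch.Tensor, skeleton: torch.Tensor) -> torch.Tensor:
        expanded = torch.cat([image, skeleton], dim=1)  # (B, img_ch + skel_ch, H, W)
        return self.generator(expanded)

# Toy usage with a stand-in generator.
gen = nn.Sequential(nn.Conv2d(2, 8, 3, padding=1), nn.ReLU(), nn.Conv2d(8, 1, 3, padding=1))
model = SkeletonGuidedInput(gen)
img = torch.randn(4, 1, 64, 64)   # grayscale glyphs
skel = torch.randn(4, 1, 64, 64)  # matching skeleton maps
print(model(img, skel).shape)     # torch.Size([4, 1, 64, 64])
```

The skeleton map itself could come from a standard morphological thinning of the binarized glyph.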
arXiv Detail & Related papers (2022-11-26T04:21:46Z) - Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training text-to-image generation models on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including few-shot, semi-supervised, and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z) - Few-shot Text Classification with Dual Contrastive Consistency [31.141350717029358]
In this paper, we explore how to utilize a pre-trained language model to perform few-shot text classification.
We adopt supervised contrastive learning on the few labeled examples and consistency regularization on the vast unlabeled data.
arXiv Detail & Related papers (2022-09-29T19:26:23Z) - RoBERTa-wwm-ext Fine-Tuning for Chinese Text Classification [5.71097144710995]
Bidirectional Encoder Representations from Transformers (BERT) has been shown to be a promising way to dramatically improve performance across various Natural Language Processing tasks.
In this project, the RoBERTa-wwm-ext pre-trained language model was adopted and fine-tuned for Chinese text classification.
The models were able to classify Chinese texts into two categories: descriptions of legal behavior and descriptions of illegal behavior.
arXiv Detail & Related papers (2021-02-24T18:57:57Z) - StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke
Encoding [20.877391644999534]
We introduce a one-bit stroke encoding to capture the key mode information of Chinese characters.
We incorporate this mode information into CycleGAN, a popular deep generative model for Chinese font generation.
StrokeGAN generally outperforms the state-of-the-art methods in terms of content and recognition accuracies.
arXiv Detail & Related papers (2020-12-16T01:36:19Z)