StrokeGAN+: Few-Shot Semi-Supervised Chinese Font Generation with Stroke
Encoding
- URL: http://arxiv.org/abs/2211.06198v1
- Date: Fri, 11 Nov 2022 13:39:26 GMT
- Title: StrokeGAN+: Few-Shot Semi-Supervised Chinese Font Generation with Stroke
Encoding
- Authors: Jinshan Zeng, Yefei Wang, Qi Chen, Yunxin Liu, Mingwen Wang, Yuan Yao
- Abstract summary: This paper proposes an effective model called StrokeGAN+, which incorporates the stroke encoding and the few-shot semi-supervised scheme into the CycleGAN model.
Experimental results show that the mode collapse issue can be effectively alleviated by the introduced one-bit stroke encoding and few-shot semi-supervised training scheme.
- Score: 23.886977380061662
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The generation of Chinese fonts has a wide range of applications. The
currently predominant methods are mainly based on deep generative models,
especially the generative adversarial networks (GANs). However, existing
GAN-based models usually suffer from the well-known mode collapse problem. When
mode collapse happens, such GAN-based models fail to yield
the correct fonts. To address this issue, we introduce a one-bit stroke
encoding and a few-shot semi-supervised scheme (i.e., using a few paired data
as semi-supervised information) to explore the local and global structure
information of Chinese characters respectively, motivated by the intuition that
strokes and characters directly embody certain local and global modes of
Chinese characters. Based on these ideas, this paper proposes an effective
model called \textit{StrokeGAN+}, which incorporates the stroke encoding and
the few-shot semi-supervised scheme into the CycleGAN model. The effectiveness
of the proposed model is demonstrated by extensive experiments. Experimental
results show that the mode collapse issue can be effectively alleviated by the
introduced one-bit stroke encoding and few-shot semi-supervised training
scheme, and that the proposed model outperforms the state-of-the-art models in
fourteen font generation tasks in terms of four important evaluation metrics
and the quality of generated characters. Besides CycleGAN, we also show that
the proposed idea can be adapted to other existing models to improve their
performance. The effectiveness of the proposed model for the zero-shot
traditional Chinese font generation is also evaluated in this paper.
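The one-bit stroke encoding described above can be illustrated with a minimal sketch: each bit records only the presence or absence of a basic stroke type in a character. The stroke inventory and lookup below are hypothetical placeholders, not the paper's actual stroke taxonomy.

```python
# Hypothetical sketch of a one-bit stroke encoding. The stroke
# inventory here is an illustrative assumption; the paper's actual
# inventory covers the basic stroke types of Chinese characters.
STROKE_TYPES = ["horizontal", "vertical", "left-falling", "right-falling", "dot"]

def one_bit_stroke_encoding(char_strokes):
    """Return a binary vector: bit i is 1 iff stroke type i occurs
    in the character (presence only, not stroke counts)."""
    return [1 if s in char_strokes else 0 for s in STROKE_TYPES]

# A character containing a horizontal and a vertical stroke:
print(one_bit_stroke_encoding({"horizontal", "vertical"}))  # → [1, 1, 0, 0, 0]
```

In the model, such a vector would be concatenated with the generator input so that the discriminator can penalize outputs whose stroke modes do not match, which is the intuition behind alleviating mode collapse.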
Related papers
- TEncDM: Understanding the Properties of Diffusion Model in the Space of
Language Model Encodings [39.34471874948928]
We introduce a novel approach named Text Encoding Diffusion Model (TEncDM).
Instead of the commonly used token embedding space, we train our model in the space of the language model encodings.
We also analyse self-conditioning and find that it increases the magnitude of the model outputs.
arXiv Detail & Related papers (2024-02-29T12:25:45Z)
- DeepCalliFont: Few-shot Chinese Calligraphy Font Synthesis by
Integrating Dual-modality Generative Models [20.76773399161289]
Few-shot font generation, especially for Chinese calligraphy fonts, is a challenging and ongoing problem.
We propose a novel model, DeepCalliFont, for few-shot Chinese calligraphy font synthesis by integrating dual-modality generative models.
arXiv Detail & Related papers (2023-12-16T04:23:12Z)
- Identifying and Mitigating Model Failures through Few-shot CLIP-aided
Diffusion Generation [65.268245109828]
We propose an end-to-end framework to generate text descriptions of failure modes associated with spurious correlations.
These descriptions can be used to generate synthetic data using generative models, such as diffusion models.
Our experiments have shown remarkable improvements in accuracy (~21%) on hard sub-populations.
arXiv Detail & Related papers (2023-12-09T04:43:49Z)
- Rethinking Masked Language Modeling for Chinese Spelling Correction [70.85829000570203]
We study Chinese Spelling Correction (CSC) as a joint decision made by two separate models: a language model and an error model.
We find that fine-tuning BERT tends to over-fit the error model while under-fitting the language model, resulting in poor generalization to out-of-distribution error patterns.
We demonstrate that a very simple strategy, randomly masking 20% of non-error tokens from the input sequence during fine-tuning, is sufficient for learning a much better language model without sacrificing the error model.
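The masking strategy summarized above can be sketched in a few lines: mask a fixed fraction of the tokens that are not at known error positions, leaving the error tokens intact. Function and token names here are illustrative assumptions, not the paper's actual implementation.

```python
import random

def mask_non_error_tokens(tokens, error_positions,
                          mask_token="[MASK]", rate=0.2, seed=0):
    """Sketch of the 20% masking strategy: randomly replace `rate` of
    the tokens NOT at error positions with a mask token, so the model
    must rely on its language model rather than memorized error patterns."""
    rng = random.Random(seed)
    out = list(tokens)
    # Only non-error positions are candidates for masking.
    candidates = [i for i in range(len(tokens)) if i not in error_positions]
    k = int(len(candidates) * rate)
    for i in rng.sample(candidates, k):
        out[i] = mask_token
    return out
```

For a 10-token sentence with one error position, this masks roughly two of the remaining nine tokens per pass; in practice the masking would be re-sampled every fine-tuning epoch.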
arXiv Detail & Related papers (2023-05-28T13:19:12Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Diff-Font: Diffusion Model for Robust One-Shot Font Generation [110.45944936952309]
We propose a novel one-shot font generation method based on a diffusion model, named Diff-Font.
The proposed model aims to generate the entire font library by giving only one sample as the reference.
The well-trained Diff-Font is not only robust to font gap and font variation, but also achieved promising performance on difficult character generation.
arXiv Detail & Related papers (2022-12-12T13:51:50Z)
- SGCE-Font: Skeleton Guided Channel Expansion for Chinese Font Generation [19.20334101519465]
This paper proposes a novel information guidance module called the skeleton guided channel expansion (SGCE) module for the Chinese font generation.
Numerical results show that the mode collapse issue suffered by the standard CycleGAN can be effectively alleviated by equipping it with the proposed SGCE module.
arXiv Detail & Related papers (2022-11-26T04:21:46Z)
- Lafite2: Few-shot Text-to-Image Generation [132.14211027057766]
We propose a novel method for pre-training a text-to-image generation model on image-only datasets.
It considers a retrieval-then-optimization procedure to synthesize pseudo text features.
It can be beneficial to a wide range of settings, including the few-shot, semi-supervised and fully-supervised learning.
arXiv Detail & Related papers (2022-10-25T16:22:23Z)
- Twist Decoding: Diverse Generators Guide Each Other [116.20780037268801]
We introduce Twist decoding, a simple and general inference algorithm that generates text while benefiting from diverse models.
Our method does not assume the vocabulary, tokenization or even generation order is shared.
arXiv Detail & Related papers (2022-05-19T01:27:53Z)
- RoBERTa-wwm-ext Fine-Tuning for Chinese Text Classification [5.71097144710995]
Bidirectional Encoder Representations from Transformers (BERT) has been shown to be a promising way to dramatically improve performance across various Natural Language Processing tasks.
In this project, the RoBERTa-wwm-ext pre-trained language model was adopted and fine-tuned for Chinese text classification.
The models were able to classify Chinese texts into two categories: descriptions of legal behavior and descriptions of illegal behavior.
arXiv Detail & Related papers (2021-02-24T18:57:57Z)
- StrokeGAN: Reducing Mode Collapse in Chinese Font Generation via Stroke
Encoding [20.877391644999534]
We introduce a one-bit stroke encoding to capture the key mode information of Chinese characters.
We incorporate this mode information into CycleGAN, a popular deep generative model for Chinese font generation.
StrokeGAN generally outperforms the state-of-the-art methods in terms of content and recognition accuracies.
arXiv Detail & Related papers (2020-12-16T01:36:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.