XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font
Generation
- URL: http://arxiv.org/abs/2204.05084v1
- Date: Mon, 11 Apr 2022 13:34:40 GMT
- Title: XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font
Generation
- Authors: Wei Liu, Fangyue Liu, Fei Ding, Qian He, Zili Yi
- Abstract summary: We propose a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder.
The encoder is conditioned jointly on the glyph image and the corresponding stroke labels.
It requires only one reference glyph and achieves the lowest rate of bad cases in the few-shot font generation task, 28% lower than the second best.
- Score: 13.569449355929574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating a new font library is a very labor-intensive and time-consuming
job for glyph-rich scripts. Few-shot font generation is thus desirable, as it
needs only a few reference glyphs and no fine-tuning at test time. Existing
methods follow the style-content disentanglement paradigm and expect novel
fonts to be produced by combining the style codes of the reference glyphs and
the content representations of the source. However, these few-shot font
generation methods either fail to capture content-independent style
representations, or employ localized component-wise style representations,
which are insufficient to model many Chinese font styles that involve
hyper-component features such as inter-component spacing and
"connected-stroke". To resolve these drawbacks and make the style
representations more reliable, we propose a self-supervised cross-modality
pre-training strategy and a cross-modality transformer-based encoder that is
conditioned jointly on the glyph image and the corresponding stroke labels. The
cross-modality encoder is pre-trained in a self-supervised manner to allow
effective capture of cross- and intra-modality correlations, which facilitates
the content-style disentanglement and modeling style representations of all
scales (stroke-level, component-level and character-level). The pre-trained
encoder is then applied to the downstream font generation task without
fine-tuning. Experimental comparisons of our method with state-of-the-art
methods demonstrate that our method successfully transfers styles of all scales. In
addition, it requires only one reference glyph and achieves the lowest rate of
bad cases in the few-shot font generation task, 28% lower than the second best.
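The abstract describes the cross-modality encoder only at a high level. As a rough illustration of the stated idea (a transformer encoder conditioned jointly on the glyph image and its stroke labels so that self-attention can capture cross- and intra-modality correlations), the PyTorch sketch below projects image patches and stroke-label embeddings into one token sequence and runs a standard transformer encoder over it. This is not the authors' XMP-Font code; the image size, patch size, stroke vocabulary, and all module names and dimensions are assumptions for illustration.

```python
# Illustrative sketch only -- not the authors' XMP-Font implementation.
# Assumptions: 80x80 grayscale glyphs, 16x16 patches, a vocabulary of
# 32 stroke types, and arbitrary embedding/depth choices.
import torch
import torch.nn as nn


class CrossModalityEncoder(nn.Module):
    """Transformer encoder over glyph-image patches plus stroke-label tokens."""

    def __init__(self, patch=16, img_size=80, n_strokes=32, dim=256, depth=6):
        super().__init__()
        self.patch = patch
        n_patches = (img_size // patch) ** 2
        self.patch_proj = nn.Linear(patch * patch, dim)      # image tokens
        self.stroke_emb = nn.Embedding(n_strokes + 1, dim)   # +1 for padding id
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, glyph, strokes):
        # glyph: (B, 1, H, W) grayscale image; strokes: (B, L) stroke-label ids
        B = glyph.size(0)
        patches = glyph.unfold(2, self.patch, self.patch) \
                       .unfold(3, self.patch, self.patch) \
                       .reshape(B, -1, self.patch * self.patch)
        img_tokens = self.patch_proj(patches) + self.pos
        stroke_tokens = self.stroke_emb(strokes)
        # One joint sequence lets self-attention model cross- and
        # intra-modality correlations between patches and stroke labels.
        return self.encoder(torch.cat([img_tokens, stroke_tokens], dim=1))


# Tiny usage example with random data.
enc = CrossModalityEncoder()
feats = enc(torch.randn(2, 1, 80, 80), torch.randint(0, 32, (2, 20)))
print(feats.shape)  # (2, 25 + 20, 256)
```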
Related papers
- DeepCalliFont: Few-shot Chinese Calligraphy Font Synthesis by
Integrating Dual-modality Generative Models [20.76773399161289]
Few-shot font generation, especially for Chinese calligraphy fonts, is a challenging and ongoing problem.
We propose a novel model, DeepCalliFont, for few-shot Chinese calligraphy font synthesis by integrating dual-modality generative models.
arXiv Detail & Related papers (2023-12-16T04:23:12Z) - Few shot font generation via transferring similarity guided global style
and quantization local style [11.817299400850176]
We present a novel font generation approach by aggregating styles from character similarity-guided global features and stylized component-level representations.
Our AFFG method could obtain a complete set of component-level style representations, and also control the global glyph characteristics.
arXiv Detail & Related papers (2023-09-02T05:05:40Z) - VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and
Quantization [52.870638830417]
We propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement.
Specifically, we pre-train a VQGAN to encapsulate the font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes (a minimal sketch of this quantization step appears after this list).
arXiv Detail & Related papers (2023-08-27T06:32:20Z) - Diff-Font: Diffusion Model for Robust One-Shot Font Generation [110.45944936952309]
We propose a novel one-shot font generation method based on a diffusion model, named Diff-Font.
The proposed model aims to generate the entire font library by giving only one sample as the reference.
The well-trained Diff-Font is not only robust to font gap and font variation, but also achieves promising performance on difficult character generation.
arXiv Detail & Related papers (2022-12-12T13:51:50Z) - Few-Shot Font Generation by Learning Fine-Grained Local Styles [90.39288370855115]
Few-shot font generation (FFG) aims to generate a new font with a few examples.
We propose a new font generation approach by learning 1) the fine-grained local styles from references, and 2) the spatial correspondence between the content and reference glyphs.
arXiv Detail & Related papers (2022-05-20T05:07:05Z) - Few-shot Font Generation with Weakly Supervised Localized
Representations [17.97183447033118]
We propose a novel font generation method that learns localized styles, namely component-wise style representations, instead of universal styles.
Our method shows remarkably better few-shot font generation results (with only eight reference glyphs) than other state-of-the-art methods.
arXiv Detail & Related papers (2021-12-22T14:26:53Z) - Scalable Font Reconstruction with Dual Latent Manifolds [55.29525824849242]
We propose a deep generative model that performs typography analysis and font reconstruction.
Our approach enables us to massively scale up the number of character types we can effectively model.
We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z) - Font Completion and Manipulation by Cycling Between Multi-Modality
Representations [113.26243126754704]
We explore the generation of font glyphs as 2D graphic objects, using a graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image.
Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z) - A Multi-Implicit Neural Representation for Fonts [79.6123184198301]
Font-specific discontinuities like edges and corners are difficult to represent using neural networks.
We introduce multi-implicits to represent fonts as a permutation-invariant set of learned implicit functions, without losing features.
arXiv Detail & Related papers (2021-06-12T21:40:11Z) - Few-shot Font Generation with Localized Style Representations and
Factorization [23.781619323447003]
We propose a novel font generation method by learning localized styles, namely component-wise style representations, instead of universal styles.
Our method shows remarkably better few-shot font generation results (with only 8 reference glyph images) than other state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:33:01Z)
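The VQ-Font entry above refines synthesized glyphs by snapping their features to a pre-trained codebook. The sketch below shows the generic vector-quantization step such a codebook relies on (nearest-codebook lookup with a straight-through gradient). It is a hedged illustration, not the VQ-Font code; the codebook size, feature dimension, and class name are assumptions.

```python
# Minimal vector-quantization step in the spirit of a VQGAN codebook;
# an illustrative sketch, not the VQ-Font implementation.
# Codebook size (512) and feature dimension (256) are arbitrary assumptions.
import torch
import torch.nn as nn


class CodebookQuantizer(nn.Module):
    def __init__(self, n_codes=512, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)  # learned "font token prior"

    def forward(self, feats):
        # feats: (B, N, dim) glyph feature vectors from some encoder.
        codes = self.codebook.weight                          # (n_codes, dim)
        dists = (feats.unsqueeze(2) - codes).pow(2).sum(-1)   # (B, N, n_codes)
        ids = dists.argmin(dim=-1)                            # nearest code per token
        quantized = self.codebook(ids)                        # snap to codebook entries
        # Straight-through estimator keeps gradients flowing to the encoder.
        return feats + (quantized - feats).detach(), ids


q = CodebookQuantizer()
z_q, ids = q(torch.randn(2, 25, 256))
print(z_q.shape, ids.shape)  # torch.Size([2, 25, 256]) torch.Size([2, 25])
```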