XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font
Generation
- URL: http://arxiv.org/abs/2204.05084v1
- Date: Mon, 11 Apr 2022 13:34:40 GMT
- Title: XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font
Generation
- Authors: Wei Liu, Fangyue Liu, Fei Ding, Qian He, Zili Yi
- Abstract summary: We propose a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder.
The encoder is conditioned jointly on the glyph image and the corresponding stroke labels.
It requires only one reference glyph and achieves the lowest rate of bad cases in the few-shot font generation task, 28% lower than the second best.
- Score: 13.569449355929574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating a new font library is a very labor-intensive and time-consuming
job for glyph-rich scripts. Few-shot font generation is thus desirable, as it
needs only a few reference glyphs and no fine-tuning at test time. Existing
methods follow the style-content disentanglement paradigm and expect novel
fonts to be produced by combining the style codes of the reference glyphs and
the content representations of the source. However, these few-shot font
generation methods either fail to capture content-independent style
representations, or employ localized component-wise style representations,
which are insufficient to model many Chinese font styles that involve
hyper-component features such as inter-component spacing and
"connected-stroke". To resolve these drawbacks and make the style
representations more reliable, we propose a self-supervised cross-modality
pre-training strategy and a cross-modality transformer-based encoder that is
conditioned jointly on the glyph image and the corresponding stroke labels. The
cross-modality encoder is pre-trained in a self-supervised manner to allow
effective capture of cross- and intra-modality correlations, which facilitates
the content-style disentanglement and modeling style representations of all
scales (stroke-level, component-level and character-level). The pre-trained
encoder is then applied to the downstream font generation task without
fine-tuning. Experimental comparisons of our method with state-of-the-art
methods demonstrate that our method successfully transfers styles of all scales. In
addition, it requires only one reference glyph and achieves the lowest rate of
bad cases in the few-shot font generation task, 28% lower than the second best.
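The abstract describes the cross-modality encoder only at a high level. As a rough illustration of the stated idea (a transformer encoder conditioned jointly on the glyph image and its stroke labels so that self-attention can capture cross- and intra-modality correlations), the PyTorch sketch below projects image patches and stroke-label embeddings into one token sequence and runs a standard transformer encoder over it. This is not the authors' XMP-Font code; the image size, patch size, stroke vocabulary, and all module names and dimensions are assumptions for illustration.

```python
# Illustrative sketch only -- not the authors' XMP-Font implementation.
# Assumptions: 80x80 grayscale glyphs, 16x16 patches, a vocabulary of
# 32 stroke types, and arbitrary embedding/depth choices.
import torch
import torch.nn as nn


class CrossModalityEncoder(nn.Module):
    """Transformer encoder over glyph-image patches plus stroke-label tokens."""

    def __init__(self, patch=16, img_size=80, n_strokes=32, dim=256, depth=6):
        super().__init__()
        self.patch = patch
        n_patches = (img_size // patch) ** 2
        self.patch_proj = nn.Linear(patch * patch, dim)      # image tokens
        self.stroke_emb = nn.Embedding(n_strokes + 1, dim)   # +1 for padding id
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, glyph, strokes):
        # glyph: (B, 1, H, W) grayscale image; strokes: (B, L) stroke-label ids
        B = glyph.size(0)
        patches = glyph.unfold(2, self.patch, self.patch) \
                       .unfold(3, self.patch, self.patch) \
                       .reshape(B, -1, self.patch * self.patch)
        img_tokens = self.patch_proj(patches) + self.pos
        stroke_tokens = self.stroke_emb(strokes)
        # One joint sequence lets self-attention model cross- and
        # intra-modality correlations between patches and stroke labels.
        return self.encoder(torch.cat([img_tokens, stroke_tokens], dim=1))


# Tiny usage example with random data.
enc = CrossModalityEncoder()
feats = enc(torch.randn(2, 1, 80, 80), torch.randint(0, 32, (2, 20)))
print(feats.shape)  # (2, 25 + 20, 256)
```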
Related papers
- DeepCalliFont: Few-shot Chinese Calligraphy Font Synthesis by
Integrating Dual-modality Generative Models [20.76773399161289]
Few-shot font generation, especially for Chinese calligraphy fonts, is a challenging and ongoing problem.
We propose a novel model, DeepCalliFont, for few-shot Chinese calligraphy font synthesis by integrating dual-modality generative models.
arXiv Detail & Related papers (2023-12-16T04:23:12Z) - Few shot font generation via transferring similarity guided global style
and quantization local style [11.817299400850176]
We present a novel font generation approach by aggregating styles from character similarity-guided global features and stylized component-level representations.
Our AFFG method could obtain a complete set of component-level style representations, and also control the global glyph characteristics.
arXiv Detail & Related papers (2023-09-02T05:05:40Z) - VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and
Quantization [52.870638830417]
We propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement.
Specifically, we pre-train a VQGAN to encapsulate the font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes (a minimal sketch of this quantization step appears after this list).
arXiv Detail & Related papers (2023-08-27T06:32:20Z) - Diff-Font: Diffusion Model for Robust One-Shot Font Generation [110.45944936952309]
We propose a novel one-shot font generation method based on a diffusion model, named Diff-Font.
The proposed model aims to generate the entire font library by giving only one sample as the reference.
The well-trained Diff-Font is not only robust to font gap and font variation, but also achieves promising performance on difficult character generation.
arXiv Detail & Related papers (2022-12-12T13:51:50Z) - Few-Shot Font Generation by Learning Fine-Grained Local Styles [90.39288370855115]
Few-shot font generation (FFG) aims to generate a new font with a few examples.
We propose a new font generation approach by learning 1) the fine-grained local styles from references, and 2) the spatial correspondence between the content and reference glyphs.
arXiv Detail & Related papers (2022-05-20T05:07:05Z) - Few-shot Font Generation with Weakly Supervised Localized
Representations [17.97183447033118]
We propose a novel font generation method that learns localized styles, namely component-wise style representations, instead of universal styles.
Our method shows remarkably better few-shot font generation results (with only eight reference glyphs) than other state-of-the-art methods.
arXiv Detail & Related papers (2021-12-22T14:26:53Z) - Scalable Font Reconstruction with Dual Latent Manifolds [55.29525824849242]
We propose a deep generative model that performs typography analysis and font reconstruction.
Our approach enables us to massively scale up the number of character types we can effectively model.
We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z) - Font Completion and Manipulation by Cycling Between Multi-Modality
Representations [113.26243126754704]
We explore the generation of font glyphs as 2D graphic objects, using a graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image.
Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z) - A Multi-Implicit Neural Representation for Fonts [79.6123184198301]
Font-specific discontinuities like edges and corners are difficult to represent using neural networks.
We introduce multi-implicits to represent fonts as a permutation-invariant set of learned implicit functions, without losing features.
arXiv Detail & Related papers (2021-06-12T21:40:11Z) - Few-shot Font Generation with Localized Style Representations and
Factorization [23.781619323447003]
We propose a novel font generation method by learning localized styles, namely component-wise style representations, instead of universal styles.
Our method shows remarkably better few-shot font generation results (with only 8 reference glyph images) than other state-of-the-art methods.
arXiv Detail & Related papers (2020-09-23T10:33:01Z)
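The VQ-Font entry above refines synthesized glyphs by snapping their features to a pre-trained codebook. The sketch below shows the generic vector-quantization step such a codebook relies on (nearest-codebook lookup with a straight-through gradient). It is a hedged illustration, not the VQ-Font code; the codebook size, feature dimension, and class name are assumptions.

```python
# Minimal vector-quantization step in the spirit of a VQGAN codebook;
# an illustrative sketch, not the VQ-Font implementation.
# Codebook size (512) and feature dimension (256) are arbitrary assumptions.
import torch
import torch.nn as nn


class CodebookQuantizer(nn.Module):
    def __init__(self, n_codes=512, dim=256):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)  # learned "font token prior"

    def forward(self, feats):
        # feats: (B, N, dim) glyph feature vectors from some encoder.
        codes = self.codebook.weight                          # (n_codes, dim)
        dists = (feats.unsqueeze(2) - codes).pow(2).sum(-1)   # (B, N, n_codes)
        ids = dists.argmin(dim=-1)                            # nearest code per token
        quantized = self.codebook(ids)                        # snap to codebook entries
        # Straight-through estimator keeps gradients flowing to the encoder.
        return feats + (quantized - feats).detach(), ids


q = CodebookQuantizer()
z_q, ids = q(torch.randn(2, 25, 256))
print(z_q.shape, ids.shape)  # torch.Size([2, 25, 256]) torch.Size([2, 25])
```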