FontTransformer: Few-shot High-resolution Chinese Glyph Image Synthesis
via Stacked Transformers
- URL: http://arxiv.org/abs/2210.06301v2
- Date: Thu, 13 Oct 2022 02:53:19 GMT
- Title: FontTransformer: Few-shot High-resolution Chinese Glyph Image Synthesis
via Stacked Transformers
- Authors: Yitian Liu, Zhouhui Lian
- Abstract summary: This paper proposes FontTransformer, a novel few-shot learning model, for high-resolution Chinese glyph image synthesis.
We also design a novel encoding scheme to feed more glyph information and prior knowledge to our model.
- Score: 21.705680113996742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic generation of high-quality Chinese fonts from a few online training samples is a challenging task, especially when the number of samples is very small. Existing few-shot font generation methods can only synthesize low-resolution glyph images that often possess incorrect topological structures and/or incomplete strokes. To address this problem, this paper proposes FontTransformer, a novel few-shot learning model for high-resolution Chinese glyph image synthesis using stacked Transformers. The key idea is to apply a parallel Transformer to avoid the accumulation of prediction errors and a serial Transformer to enhance the quality of synthesized strokes. We also design a novel encoding scheme that feeds more glyph information and prior knowledge to the model, further enabling the generation of high-resolution and visually pleasing glyph images. Both qualitative and quantitative experimental results demonstrate the superiority of our method over existing approaches in the few-shot Chinese font synthesis task.
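As an illustration of the stacked-Transformer idea, the minimal PyTorch sketch below chains a parallel (non-autoregressive) stage, which predicts all glyph tokens in one pass so a bad token cannot corrupt later ones, with a serial (autoregressive) stage that refines the coarse result. All module sizes, the token-based glyph encoding, and the conditioning layout are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FontTransformerSketch(nn.Module):
    def __init__(self, vocab_size=256, d_model=512, max_len=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        # Stage 1: parallel Transformer -- predicts every glyph token in a
        # single pass, so prediction errors cannot accumulate.
        self.parallel = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        # Stage 2: serial Transformer -- autoregressively refines the coarse
        # result, which is where stroke quality would be improved.
        self.serial = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True),
            num_layers=6)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, style_tokens, content_tokens):
        # Condition on tokens from a few style references plus the target
        # character's content tokens (an assumed encoding scheme).
        cond = self.embed(torch.cat([style_tokens, content_tokens], dim=1))
        cond = cond + self.pos[:, :cond.size(1)]
        coarse_logits = self.head(self.parallel(cond))        # one-shot pass
        coarse = self.embed(coarse_logits.argmax(-1)) + self.pos[:, :cond.size(1)]
        causal = nn.Transformer.generate_square_subsequent_mask(coarse.size(1))
        return self.head(self.serial(coarse, cond, tgt_mask=causal))

model = FontTransformerSketch()
style = torch.randint(0, 256, (1, 64))    # tokens from a few style samples
content = torch.randint(0, 256, (1, 64))  # tokens encoding the target character
refined_logits = model(style, content)    # (1, 128, 256) refined glyph tokens
```

At inference the serial stage would decode token by token; the single teacher-forced pass above keeps the sketch short.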
Related papers
- HFH-Font: Few-shot Chinese Font Synthesis with Higher Quality, Faster Speed, and Higher Resolution [17.977410216055024]
We introduce HFH-Font, a few-shot font synthesis method capable of efficiently generating high-resolution glyph images.
For the first time, large-scale Chinese vector fonts of a quality comparable to those manually created by professional font designers can be automatically generated.
arXiv Detail & Related papers (2024-10-09T02:30:24Z)
- Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation [81.45400849638347]
In-image machine translation (IIMT) aims to translate an image containing text in the source language into an image containing the translation in the target language.
In this paper, we propose an end-to-end IIMT model consisting of four modules.
Our model achieves competitive performance compared to cascaded models with only 70.9% of their parameters, and significantly outperforms the pixel-level end-to-end IIMT model.
arXiv Detail & Related papers (2024-07-03T08:15:39Z)
- DeepCalliFont: Few-shot Chinese Calligraphy Font Synthesis by Integrating Dual-modality Generative Models [20.76773399161289]
Few-shot font generation, especially for Chinese calligraphy fonts, is a challenging and ongoing problem.
We propose a novel model, DeepCalliFont, for few-shot Chinese calligraphy font synthesis by integrating dual-modality generative models.
arXiv Detail & Related papers (2023-12-16T04:23:12Z)
- VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization [52.870638830417]
We propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement.
Specifically, we pre-train a VQGAN to encapsulate the font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes.
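The codebook lookup at the heart of VQ-Font's refinement step can be pictured as nearest-neighbour vector quantization. Below is a generic, minimal sketch of that mechanism; all shapes and sizes are assumptions, and this is not VQ-Font's actual code.

```python
import torch

def quantize(features, codebook):
    """features: (N, D) glyph features; codebook: (K, D) learned font tokens."""
    # Snap each continuous feature to its nearest codebook entry, which was
    # learned from real glyphs -- this is what closes the domain gap between
    # synthesized and real strokes.
    dists = torch.cdist(features, codebook)   # (N, K) pairwise distances
    indices = dists.argmin(dim=1)             # index of the nearest real token
    return codebook[indices], indices

codebook = torch.randn(512, 64)               # K=512 entries of dim D=64 (assumed)
feats = torch.randn(100, 64)                  # features of a synthesized glyph
quantized, idx = quantize(feats, codebook)
```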
arXiv Detail & Related papers (2023-08-27T06:32:20Z)
- DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality [38.32966391626858]
This paper proposes an enhanced version of DeepVecFont for vector font synthesis.
We adopt Transformers instead of RNNs to process sequential data and design a relaxation representation for vector outlines.
We also propose to sample auxiliary points in addition to control points to precisely align the generated and target Bézier curves or lines.
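The auxiliary-point idea can be illustrated with a short sketch: besides matching control points, sample points on the curves themselves and penalize their distance. The cubic Bézier form is standard; the uniform t-sampling and L2 loss here are assumptions for illustration.

```python
import torch

def cubic_bezier(ctrl, t):
    """ctrl: (4, 2) control points; t: (T,) curve parameters in [0, 1]."""
    p0, p1, p2, p3 = ctrl
    u = (1.0 - t)[:, None]
    t = t[:, None]
    # Bernstein form of a cubic Bezier curve, evaluated at each t.
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3

def aux_point_loss(pred_ctrl, target_ctrl, n_points=8):
    """Mean squared distance between n_points samples on the two curves."""
    t = torch.linspace(0.0, 1.0, n_points)
    return ((cubic_bezier(pred_ctrl, t) - cubic_bezier(target_ctrl, t)) ** 2).mean()

pred = torch.rand(4, 2, requires_grad=True)   # generated control points
target = torch.rand(4, 2)                     # ground-truth control points
aux_point_loss(pred, target).backward()       # gradients flow to all 4 points
```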
arXiv Detail & Related papers (2023-03-25T23:28:19Z)
- Unified Multi-Modal Latent Diffusion for Joint Subject and Text Conditional Image Generation [63.061871048769596]
We present a novel Unified Multi-Modal Latent Diffusion (UMM-Diffusion) model that takes text and images containing specified subjects as joint input sequences.
More specifically, both the input text and images are encoded into one unified multi-modal latent space.
Our method is able to generate high-quality images with complex semantics drawn from both the input text and the input images.
arXiv Detail & Related papers (2023-03-16T13:50:20Z)
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation [95.02406834386814]
Parti treats text-to-image generation as a sequence-to-sequence modeling problem.
Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images as sequences of discrete tokens.
PartiPrompts (P2) is a new holistic benchmark of over 1600 English prompts.
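A shape-level sketch of this seq2seq view: a tokenizer such as ViT-VQGAN maps the image to a grid of discrete tokens, and a standard encoder-decoder Transformer is trained to predict those tokens from the prompt, exactly like machine translation. The toy sizes below, and the randomly drawn tokens standing in for a real tokenizer, are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

text_vocab, image_vocab = 32_000, 8192          # vocabulary sizes are assumptions
model = nn.Transformer(d_model=512, batch_first=True)
text_embed = nn.Embedding(text_vocab, 512)
image_embed = nn.Embedding(image_vocab, 512)
to_logits = nn.Linear(512, image_vocab)

text = torch.randint(0, text_vocab, (1, 64))            # tokenized prompt
image_tokens = torch.randint(0, image_vocab, (1, 256))  # stand-in for a 16x16
                                                        # image-token grid
# Shift-by-one teacher forcing: the decoder sees tokens [0..n-1] and learns
# to predict tokens [1..n], conditioned on the text encoding.
tgt_in, tgt_out = image_tokens[:, :-1], image_tokens[:, 1:]
mask = nn.Transformer.generate_square_subsequent_mask(tgt_in.size(1))
logits = to_logits(model(text_embed(text), image_embed(tgt_in), tgt_mask=mask))
loss = F.cross_entropy(logits.transpose(1, 2), tgt_out)
```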
arXiv Detail & Related papers (2022-06-22T01:11:29Z)
- XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation [13.569449355929574]
We propose a self-supervised cross-modality pre-training strategy and a cross-modality transformer-based encoder.
The encoder is conditioned jointly on the glyph image and the corresponding stroke labels.
It requires only one reference glyph and achieves the lowest bad-case rate in the few-shot font generation task, 28% lower than that of the second-best method.
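Joint conditioning on a glyph image and its stroke labels can be pictured as two token streams sharing one Transformer encoder. In the sketch below, the patch size, the stroke vocabulary, and all dimensions are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

d, n_strokes = 128, 33                 # 33 stroke types is an assumed vocabulary
patchify = nn.Conv2d(1, d, kernel_size=16, stride=16)    # 16x16 patches -> tokens
stroke_embed = nn.Embedding(n_strokes, d)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d, nhead=8, batch_first=True), num_layers=4)

glyph = torch.randn(1, 1, 256, 256)                      # one reference glyph
strokes = torch.randint(0, n_strokes, (1, 12))           # its stroke-label sequence
img_tokens = patchify(glyph).flatten(2).transpose(1, 2)  # (1, 256, d) patch tokens
# Both modalities share one encoder, so attention can tie image regions to
# the strokes they depict.
fused = encoder(torch.cat([img_tokens, stroke_embed(strokes)], dim=1))
```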
arXiv Detail & Related papers (2022-04-11T13:34:40Z)
- DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning [21.123297001902177]
We propose a novel method, DeepVecFont, to generate visually-pleasing vector glyphs.
The highlights of this paper are threefold. First, we design a dual-modality learning strategy which utilizes both image-aspect and sequence-aspect features of fonts to synthesize vector glyphs.
Second, we provide a new generative paradigm for handling unstructured data (e.g., vector glyphs) by randomly sampling plausible synthesis results and selecting the optimal one, which is further refined under the guidance of generated structured data.
arXiv Detail & Related papers (2021-10-13T12:57:19Z)
- Scalable Font Reconstruction with Dual Latent Manifolds [55.29525824849242]
We propose a deep generative model that performs typography analysis and font reconstruction.
Our approach enables us to massively scale up the number of character types we can effectively model.
We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z)
- A Multi-Implicit Neural Representation for Fonts [79.6123184198301]
Font-specific discontinuities like edges and corners are difficult to represent using neural networks.
We introduce multi-implicits to represent fonts as a permutation-invariant set of learned implicit functions, without losing such features.
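A permutation-invariant set of implicit functions can be sketched as several coordinate MLPs whose per-point outputs are pooled by a symmetric operation. The median combiner below (echoing multi-channel distance fields) and the network sizes are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class MultiImplicit(nn.Module):
    """A set of coordinate MLPs pooled by a symmetric (order-invariant) median."""
    def __init__(self, n_implicits=3, hidden=64):
        super().__init__()
        self.fns = nn.ModuleList(
            nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_implicits))

    def forward(self, xy):
        # Each MLP maps a 2-D point to a signed-distance-like value; the
        # median over the set is invariant to the order of the functions and
        # preserves sharp corners that a single smooth field would round off.
        channels = torch.cat([f(xy) for f in self.fns], dim=1)  # (N, n_implicits)
        return channels.median(dim=1).values                    # (N,)

field = MultiImplicit()
pts = torch.rand(1000, 2) * 2 - 1       # query points in [-1, 1]^2
inside = field(pts) < 0                 # rasterize the glyph by a sign test
```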
arXiv Detail & Related papers (2021-06-12T21:40:11Z)