FontCLIP: A Semantic Typography Visual-Language Model for Multilingual
Font Applications
- URL: http://arxiv.org/abs/2403.06453v1
- Date: Mon, 11 Mar 2024 06:08:16 GMT
- Title: FontCLIP: A Semantic Typography Visual-Language Model for Multilingual
Font Applications
- Authors: Yuki Tatsukawa, I-Chao Shen, Anran Qi, Yuki Koyama, Takeo Igarashi,
Ariel Shamir
- Abstract summary: FontCLIP is a model that connects the semantic understanding of a large vision-language model with typographical knowledge.
We integrate typography-specific knowledge into the comprehensive vision-language knowledge of a pretrained CLIP model.
FontCLIP's dual-modality and generalization abilities enable multilingual and cross-lingual font retrieval and letter shape optimization.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Acquiring the desired font for various design tasks can be challenging and
requires professional typographic knowledge. While previous font retrieval or
generation works have alleviated some of these difficulties, they often lack
support for multiple languages and semantic attributes beyond the training data
domains. To solve this problem, we present FontCLIP: a model that connects the
semantic understanding of a large vision-language model with typographical
knowledge. We integrate typography-specific knowledge into the comprehensive
vision-language knowledge of a pretrained CLIP model through a novel finetuning
approach. We propose to use a compound descriptive prompt that encapsulates
adaptively sampled attributes from a font attribute dataset focusing on Roman
alphabet characters. FontCLIP's semantic typographic latent space demonstrates
two unprecedented generalization abilities. First, FontCLIP generalizes to
different languages including Chinese, Japanese, and Korean (CJK), capturing
the typographical features of fonts across different languages, even though it
was only finetuned using fonts of Roman characters. Second, FontCLIP can
recognize semantic attributes that are not present in the training data.
FontCLIP's dual-modality and generalization abilities enable multilingual and
cross-lingual font retrieval and letter shape optimization, reducing the burden
of obtaining desired fonts.
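The training recipe is only described at the level above; below is a minimal sketch of the idea, assuming OpenAI's `clip` package, an illustrative prompt template, and a hypothetical attribute-sampling scheme (none of this is the authors' released code):

```python
# A minimal sketch (not the authors' code) of finetuning CLIP on font data:
# pair rendered glyph images with compound prompts built from adaptively
# sampled attributes, and train with CLIP's usual symmetric contrastive loss.
import random

import clip  # https://github.com/openai/CLIP
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def compound_prompt(attr_scores, k=4):
    """Build one descriptive prompt from a font's attribute ratings,
    sampling attributes in proportion to how strongly they apply.
    The template and sampling scheme here are illustrative assumptions."""
    names = list(attr_scores)
    weights = [attr_scores[n] for n in names]
    picked = random.choices(names, weights=weights, k=k)
    return "a font that is " + ", ".join(dict.fromkeys(picked))

def finetune_step(glyph_images, attr_dicts, optimizer, temperature=0.07):
    texts = clip.tokenize([compound_prompt(a) for a in attr_dicts]).to(device)
    img = F.normalize(model.encode_image(glyph_images.to(device)), dim=-1)
    txt = F.normalize(model.encode_text(texts), dim=-1)
    logits = img @ txt.t() / temperature
    labels = torch.arange(len(logits), device=device)
    loss = (F.cross_entropy(logits, labels)
            + F.cross_entropy(logits.t(), labels)) / 2  # symmetric CLIP loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Given such a latent space, cross-lingual retrieval reduces to ranking fonts, rendered in any script, by cosine similarity between their glyph-image embeddings and the embedding of a query attribute prompt.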
Related papers
- Khattat: Enhancing Readability and Concept Representation of Semantic Typography
Semantic typography involves selecting an idea, choosing an appropriate font, and balancing creativity with readability.
We introduce an end-to-end system that automates this process.
A key feature is our OCR-based loss function, which enhances readability and enables simultaneous stylization of multiple characters.
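The OCR loss is described only at this level of detail; one plausible form, sketched here with a frozen character classifier standing in for the paper's actual recognizer (an assumption), penalizes stylized glyphs the recognizer can no longer read:

```python
# Illustrative sketch: readability as recognition confidence. The frozen
# recognizer's weights receive no gradient, but gradients still flow through
# its inputs back to whatever produces the stylized renders.
import torch
import torch.nn.functional as F

def ocr_readability_loss(recognizer, stylized_images, target_chars):
    """recognizer: frozen model mapping (B, C, H, W) glyph renders to
    per-character logits; target_chars: (B,) intended character indices."""
    for p in recognizer.parameters():
        p.requires_grad_(False)
    logits = recognizer(stylized_images)
    return F.cross_entropy(logits, target_chars)

# The full objective would balance this against a stylization term, e.g.:
# loss = style_loss + lambda_ocr * ocr_readability_loss(...)
```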
arXiv Detail & Related papers (2024-10-01T18:42:48Z)
- GRIF-DM: Generation of Rich Impression Fonts using Diffusion Models
We introduce a diffusion-based method, termed GRIF-DM, to generate fonts that vividly embody specific impressions.
Our experimental results, conducted on the MyFonts dataset, affirm that this method is capable of producing realistic, vibrant, and high-fidelity fonts.
arXiv Detail & Related papers (2024-08-14T02:26:46Z)
- VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization
We propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement.
Specifically, we pre-train a VQGAN to encapsulate the font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes.
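The refinement step relies on standard VQ machinery; a minimal sketch of snapping glyph features to their nearest pretrained codebook tokens (the "font token prior") might look like this:

```python
# Minimal sketch of VQGAN-style codebook quantization: each synthesized
# glyph feature is replaced by its nearest codebook entry, pulling generated
# strokes toward the real-font statistics captured in the pretrained codebook.
import torch

def quantize(features, codebook):
    """features: (N, D) glyph features; codebook: (K, D) token embeddings."""
    dists = torch.cdist(features, codebook)   # (N, K) pairwise L2 distances
    idx = dists.argmin(dim=1)                 # nearest token per feature
    quantized = codebook[idx]
    # Straight-through estimator: gradients bypass the non-differentiable argmin.
    return features + (quantized - features).detach(), idx
```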
arXiv Detail & Related papers (2023-08-27T06:32:20Z)
- DGFont++: Robust Deformable Generative Networks for Unsupervised Font Generation
We propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++).
To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently.
Experiments demonstrate that our model is able to generate character images of higher quality than state-of-the-art methods.
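A multi-task discriminator of this kind is commonly realized as a shared backbone with one output head per style; a hedged sketch (layer sizes are illustrative, not DGFont++'s actual architecture):

```python
# Sketch of a multi-task discriminator: a shared convolutional backbone with
# one real/fake logit head per font style, so each style is judged independently.
import torch
import torch.nn as nn

class MultiTaskDiscriminator(nn.Module):
    def __init__(self, num_styles, in_channels=1):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
        )
        self.heads = nn.Conv2d(128, num_styles, 1)  # one logit map per style

    def forward(self, images, style_ids):
        """images: (B, C, H, W); style_ids: (B,) style index per sample.
        Returns (B, h, w) patch logits from each sample's own style head."""
        maps = self.heads(self.backbone(images))     # (B, num_styles, h, w)
        return maps[torch.arange(images.size(0)), style_ids]
```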
arXiv Detail & Related papers (2022-12-30T14:35:10Z)
- Diff-Font: Diffusion Model for Robust One-Shot Font Generation
We propose a novel one-shot font generation method based on a diffusion model, named Diff-Font.
The proposed model aims to generate the entire font library by giving only one sample as the reference.
The well-trained Diff-Font is not only robust to font gaps and font variations, but also achieves promising performance on difficult character generation.
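The generation loop of such a model follows standard conditional diffusion sampling; the sketch below uses a deterministic DDIM-style update with placeholder module names, not Diff-Font's exact architecture:

```python
# Illustrative conditional diffusion sampling for one-shot font generation:
# the denoiser is conditioned on a content code (which character to draw)
# and a style embedding extracted from the single reference glyph.
import torch

@torch.no_grad()
def sample_glyph(denoiser, style_emb, content_emb, alphas_cumprod, shape):
    """alphas_cumprod: (T,) tensor of cumulative noise-schedule products."""
    x = torch.randn(shape)                              # start from pure noise
    for t in reversed(range(len(alphas_cumprod))):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
        eps = denoiser(x, torch.tensor([t]), style_emb, content_emb)
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()  # predicted clean glyph
        x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps  # DDIM step, eta=0
    return x
```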
arXiv Detail & Related papers (2022-12-12T13:51:50Z)
- FontNet: Closing the gap to font designer performance in font synthesis
We propose a model, called FontNet, that learns to separate font styles in the embedding space where distances directly correspond to a measure of font similarity.
We design the network architecture and training procedure that can be adopted for any language system and can produce high-resolution font images.
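One standard way to learn an embedding whose distances track similarity (whether it matches FontNet's exact objective is an assumption) is a triplet loss over glyphs of the same and different fonts:

```python
# Sketch of metric learning for font similarity: glyphs from the same font
# are pulled together, glyphs from different fonts pushed apart by a margin.
import torch
import torch.nn.functional as F

def font_triplet_loss(encoder, anchor, positive, negative, margin=0.2):
    """anchor/positive: glyph images from the same font;
    negative: a glyph image from a different font."""
    za = F.normalize(encoder(anchor), dim=-1)
    zp = F.normalize(encoder(positive), dim=-1)
    zn = F.normalize(encoder(negative), dim=-1)
    d_pos = (za - zp).pow(2).sum(-1)
    d_neg = (za - zn).pow(2).sum(-1)
    return F.relu(d_pos - d_neg + margin).mean()
```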
arXiv Detail & Related papers (2022-05-13T08:37:10Z)
- Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
This paper addresses the task of generating fluent descriptions by training on a non-uniform combination of data sources.
Large-scale datasets with noisy image-text pairs provide a sub-optimal source of supervision.
We propose to leverage and separate semantics and descriptive style through the incorporation of a style token and keywords extracted through a retrieval component.
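Concretely, this amounts to prefixing the decoder's input with a source-specific style token and retrieved keywords; a toy sketch (token names are invented for illustration):

```python
# Toy sketch: the decoder sees a style token identifying the data source's
# register, plus retrieved keywords carrying semantics, before the caption.
def build_decoder_input(style_token, keywords, caption_tokens):
    return [style_token] + keywords + ["<sep>"] + caption_tokens

# e.g. build_decoder_input("<web_alt_text>", ["dog", "frisbee"],
#                          ["a", "dog", "catches", "a", "frisbee"])
```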
arXiv Detail & Related papers (2021-11-24T19:00:05Z)
- Scalable Font Reconstruction with Dual Latent Manifolds
We propose a deep generative model that performs typography analysis and font reconstruction.
Our approach enables us to massively scale up the number of character types we can effectively model.
We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z)
- A Multi-Implicit Neural Representation for Fonts
Font-specific discontinuities like edges and corners are difficult to represent using neural networks.
We introduce multi-implicits to represent fonts as a permutation-invariant set of learned implicit functions, without losing features.
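A hedged sketch of the representation (layer sizes and the min-combination are illustrative, not the paper's exact formulation):

```python
# Sketch of a multi-implicit glyph: a permutation-invariant set of learned
# implicit (signed-distance-like) functions combined with a min, so sharp
# corners survive where a single smooth implicit would round them off.
import torch
import torch.nn as nn

class MultiImplicitGlyph(nn.Module):
    def __init__(self, n_parts=4, hidden=64):
        super().__init__()
        self.parts = nn.ModuleList(
            nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                          nn.Linear(hidden, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n_parts)
        )

    def forward(self, xy):
        """xy: (N, 2) query points; returns (N,) distance values. Taking the
        min over parts is order-independent, hence permutation-invariant."""
        values = torch.stack([p(xy).squeeze(-1) for p in self.parts], dim=0)
        return values.min(dim=0).values
```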
arXiv Detail & Related papers (2021-06-12T21:40:11Z)
- Adaptive Text Recognition through Visual Matching
We introduce a new model that exploits the repetitive nature of characters in languages.
By doing this, we turn text recognition into a shape matching problem.
We show that it can handle challenges that traditional architectures are not able to solve without expensive retraining.
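The matching step can be pictured as nearest-exemplar lookup in feature space; a minimal sketch (the encoders and similarity are stand-ins, not the paper's architecture):

```python
# Sketch of recognition as shape matching: compare per-position features of
# a text-line image against one rendered exemplar per character, so adapting
# to a new font only requires new exemplars, not retraining.
import torch
import torch.nn.functional as F

def match_characters(line_feats, exemplar_feats):
    """line_feats: (T, D) per-position features of a text-line image;
    exemplar_feats: (K, D) features of the K character exemplars.
    Returns (T,) indices of the best-matching exemplar at each position."""
    sim = F.normalize(line_feats, dim=-1) @ F.normalize(exemplar_feats, dim=-1).t()
    return sim.argmax(dim=-1)
```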
arXiv Detail & Related papers (2020-09-14T17:48:53Z)