Related papers: FONTNET: On-Device Font Understanding and Prediction Pipeline

URL: http://arxiv.org/abs/2103.16150v1
Date: Tue, 30 Mar 2021 08:11:24 GMT
Title: FONTNET: On-Device Font Understanding and Prediction Pipeline
Authors: Rakshith S, Rishabh Khurana, Vibhav Agarwal, Jayesh Rajkumar Vachhani, Guggilla Bhanodai
Abstract summary: We propose two engines: a Font Detection Engine and a Font Prediction Engine. First, we develop a novel CNN architecture for identifying the font style of text in images. Second, we design a novel algorithm for predicting similar fonts for a given query font. Third, we optimize and deploy the entire engine On-Device, which ensures privacy and improves latency in real-time applications such as instant messaging.
Score: 1.5749416770494706
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Fonts are one of the most basic and core design concepts. Numerous use cases can benefit from an in-depth understanding of fonts, such as text customization, which changes text in an image while maintaining font attributes like style, color, and size. Currently, text recognition solutions can group recognized text based on line breaks or paragraph breaks; if the font attributes are known, multiple text blocks can be combined based on context in a meaningful manner. In this paper, we propose two engines: a Font Detection Engine, which identifies the font style, color, and size attributes of text in an image, and a Font Prediction Engine, which predicts similar fonts for a query font. The major contributions of this paper are three-fold. First, we developed a novel CNN architecture for identifying the font style of text in images. Second, we designed a novel algorithm for predicting similar fonts for a given query font. Third, we have optimized and deployed the entire engine On-Device, which ensures privacy and improves latency in real-time applications such as instant messaging. We achieve a worst-case On-Device inference time of 30 ms and a model size of 4.5 MB for both engines.

Related papers

FontAdapter: Instant Font Adaptation in Visual Text Generation [45.00544198317519]
We present FontAdapter, a framework that enables visual text generation in unseen fonts within seconds, conditioned on a reference glyph image. Experiments demonstrate that FontAdapter enables high-quality, robust font customization across unseen fonts without additional fine-tuning during inference.
arXiv Detail & Related papers (2025-06-06T08:00:49Z)
FontGuard: A Robust Font Watermarking Approach Leveraging Deep Font Knowledge [14.545769739571291]
We introduce FontGuard, a novel font watermarking model that harnesses the capabilities of font models and language-guided contrastive learning. FontGuard modifies fonts by altering hidden style features, resulting in better font quality upon watermark embedding. In the decoder, we employ an image-text contrastive learning to reconstruct the embedded bits, which can achieve desirable robustness against various real-world transmission distortions.
arXiv Detail & Related papers (2025-04-04T02:39:33Z)
Texture or Semantics? Vision-Language Models Get Lost in Font Recognition [48.856390495568114]
We introduce the Font Recognition Benchmark (FRB), a compact and well-structured dataset comprising 15 commonly used fonts. FRB includes two versions: (i) an easy version, where 10 sentences are rendered in different fonts, and (ii) a hard version, where each text sample consists of the names of the 15 fonts themselves. We find that current VLMs exhibit limited font recognition capabilities, with many state-of-the-art models failing to achieve satisfactory performance.
arXiv Detail & Related papers (2025-03-31T06:33:21Z)
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models [76.68654868991517]
Long-form text in images, such as paragraphs in slides or documents, remains a major challenge for current generative models. We introduce a novel text-focused, binary tokenizer optimized for capturing detailed scene text features. We develop ModelName, a multimodal autoregressive model that excels in generating high-quality long-text images with unprecedented fidelity.
arXiv Detail & Related papers (2025-03-26T03:44:25Z)
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering [118.30923824681642]
TextDiffuser-2 aims to unleash the power of language models for text rendering. We utilize the language model within the diffusion model to encode the position and texts at the line level. We conduct extensive experiments and incorporate user studies involving human participants as well as GPT-4V.
arXiv Detail & Related papers (2023-11-28T04:02:40Z)
VQ-Font: Few-Shot Font Generation with Structure-Aware Enhancement and Quantization [52.870638830417]
We propose a VQGAN-based framework (i.e., VQ-Font) to enhance glyph fidelity through token prior refinement and structure-aware enhancement. Specifically, we pre-train a VQGAN to encapsulate font token prior within a codebook. Subsequently, VQ-Font refines the synthesized glyphs with the codebook to eliminate the domain gap between synthesized and real-world strokes.
arXiv Detail & Related papers (2023-08-27T06:32:20Z)
Combining OCR Models for Reading Early Modern Printed Books [2.839401411131008]
We study the usage of fine-grained font recognition on OCR for books printed from the 15th to the 18th century. We show that OCR performance is strongly impacted by font style and that selecting fine-tuned models with font group recognition has a very positive impact on the results.
arXiv Detail & Related papers (2023-05-11T20:43:50Z)
CF-Font: Content Fusion for Few-shot Font Generation [63.79915037830131]
We propose a content fusion module (CFM) to project the content feature into a linear space defined by the content features of basis fonts. Our method also allows optimizing the style representation vector of reference images. We have evaluated our method on a dataset of 300 fonts with 6.5k characters each.
arXiv Detail & Related papers (2023-03-24T14:18:40Z)
Diff-Font: Diffusion Model for Robust One-Shot Font Generation [110.45944936952309]
We propose a novel one-shot font generation method based on a diffusion model, named Diff-Font. The proposed model aims to generate the entire font library given only one sample as the reference. The well-trained Diff-Font is not only robust to font gap and font variation, but also achieves promising performance on difficult character generation.
arXiv Detail & Related papers (2022-12-12T13:51:50Z)
Scalable Font Reconstruction with Dual Latent Manifolds [55.29525824849242]
We propose a deep generative model that performs typography analysis and font reconstruction. Our approach enables us to massively scale up the number of character types we can effectively model. We evaluate on the task of font reconstruction over various datasets representing character types of many languages.
arXiv Detail & Related papers (2021-09-10T20:37:43Z)
Font Completion and Manipulation by Cycling Between Multi-Modality Representations [113.26243126754704]
We innovate to explore the generation of font glyphs as 2D graphic objects with the graph as an intermediate representation. We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image. Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z)
AdaptiFont: Increasing Individuals' Reading Speed with a Generative Font Model and Bayesian Optimization [3.480626767752489]
AdaptiFont is a human-in-the-loop system aimed at interactively increasing readability of text displayed on a monitor. We generate new true-type-fonts through active learning, render texts with the new font, and measure individual users' reading speed. The results of a user study show that this adaptive font generation system finds regions in the font space corresponding to high reading speeds, that these fonts significantly increase participants' reading speed, and that the found fonts are significantly different across individual readers.
arXiv Detail & Related papers (2021-04-21T19:56:28Z)
Impressions2Font: Generating Fonts by Specifying Impressions [10.345810093530261]
This paper proposes Impressions2Font (Imp2Font) that generates font images with specific impressions. Imp2Font accepts an arbitrary number of impression words as the condition to generate the font images.
arXiv Detail & Related papers (2021-03-18T06:10:26Z)
Few-shot Compositional Font Generation with Dual Memory [16.967987801167514]
We propose a novel font generation framework, named Dual Memory-augmented Font Generation Network (DM-Font). We employ memory components and global-context awareness in the generator to take advantage of the compositionality. In the experiments on Korean-handwriting fonts and Thai-printing fonts, we observe that our method generates a significantly better quality of samples with faithful stylization.
arXiv Detail & Related papers (2020-05-21T08:13:40Z)
Attribute2Font: Creating Fonts You Want From Attributes [32.82714291856353]
Attribute2Font is trained to perform font style transfer between any two fonts conditioned on their attribute values. A novel unit named Attribute Attention Module is designed to make those generated glyph images better embody the prominent font attributes.
arXiv Detail & Related papers (2020-05-16T04:06:53Z)

This list is automatically generated from the titles and abstracts of the papers on this site.