Related papers: Persis: A Persian Font Recognition Pipeline Using Convolutional Neural Networks

URL: http://arxiv.org/abs/2310.05255v2
Date: Tue, 10 Oct 2023 05:48:25 GMT
Title: Persis: A Persian Font Recognition Pipeline Using Convolutional Neural Networks
Authors: Mehrdad Mohammadian, Neda Maleki, Tobias Olsson, Fredrik Ahlgren
Abstract summary: We introduce the first publicly available datasets in the field of Persian font recognition. We employ Convolutional Neural Networks (CNN) to address this problem. We conclude that CNN methods can be used to recognize Persian fonts without the need for additional pre-processing steps.
Score: 2.239394800147746
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: What happens if we encounter a suitable font for our design work but do not know its name? Visual Font Recognition (VFR) systems are used to identify the font typeface in an image. These systems can assist graphic designers in identifying fonts used in images. A VFR system also aids in improving the speed and accuracy of Optical Character Recognition (OCR) systems. In this paper, we introduce the first publicly available datasets in the field of Persian font recognition and employ Convolutional Neural Networks (CNN) to address this problem. The results show that the proposed pipeline obtained 78.0% top-1 accuracy on our new datasets, 89.1% on the IDPL-PFOD dataset, and 94.5% on the KAFD dataset. Furthermore, the average time spent in the entire pipeline for one sample of our proposed datasets is 0.54 and 0.017 seconds for CPU and GPU, respectively. We conclude that CNN methods can be used to recognize Persian fonts without the need for additional pre-processing steps such as feature extraction, binarization, normalization, etc.
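The abstract reports top-1 accuracy and an average per-sample pipeline time. As a minimal sketch of how such figures are typically computed (not the authors' evaluation code), the snippet below uses hypothetical helper names (`top1_accuracy`, `mean_seconds_per_sample`) and made-up toy scores; none of these come from the paper.

```python
import time
from typing import Callable, Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the true label."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and of equal length")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class (the top-1 prediction).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


def mean_seconds_per_sample(predict: Callable[[object], object], samples: Sequence[object]) -> float:
    """Average wall-clock time of predict() over the given samples."""
    start = time.perf_counter()
    for sample in samples:
        predict(sample)
    return (time.perf_counter() - start) / len(samples)


# Toy data (illustrative only): four samples, three hypothetical font classes.
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
toy_labels = [1, 0, 2, 1]

accuracy = top1_accuracy(toy_scores, toy_labels)
print(f"top-1 accuracy: {accuracy:.2f}")
```

In this toy case three of the four predictions match their labels. A real evaluation would replace the toy scores with the CNN's class outputs on a held-out test set.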

Related papers

Parameter-Efficient Fine-Tuning of DINOv2 for Large-Scale Font Classification [0.22940141855172033]
We present a font classification system capable of identifying 394 font families from rendered text images. Our approach fine-tunes a DINOv2 Vision Transformer using Low-Rank Adaptation (LoRA), achieving approximately 86% top-1 accuracy while training fewer than 1% of the model's 87.2M parameters.
arXiv Detail & Related papers (2026-02-14T21:15:12Z)
Optical Character Recognition using Convolutional Neural Networks for Ashokan Brahmi Inscriptions [0.13194391758295113]
The study mainly focuses on three pre-trained CNNs, namely LeNet, VGG-16, and MobileNet. The findings reveal that MobileNet outperforms the other two models in terms of accuracy, achieving a validation accuracy of 95.94% and validation loss of 0.129.
arXiv Detail & Related papers (2024-12-29T09:56:03Z)
Can Encrypted Images Still Train Neural Networks? Investigating Image Information and Random Vortex Transformation [51.475827684468875]
We establish a novel framework for measuring image information content to evaluate the variation in information content during image transformations. We also propose a novel image encryption algorithm called Random Vortex Transformation.
arXiv Detail & Related papers (2024-11-25T09:14:53Z)
IDPL-PFOD2: A New Large-Scale Dataset for Printed Farsi Optical Character Recognition [6.780778335996319]
This paper presents a novel large-scale dataset, IDPL-PFOD2, tailored for Farsi printed text recognition. The dataset comprises 2003541 images featuring a wide variety of fonts, styles, and sizes.
arXiv Detail & Related papers (2023-12-02T16:56:57Z)
CF-Font: Content Fusion for Few-shot Font Generation [63.79915037830131]
We propose a content fusion module (CFM) to project the content feature into a linear space defined by the content features of basis fonts. Our method also allows optimizing the style representation vector of reference images. We have evaluated our method on a dataset of 300 fonts with 6.5k characters each.
arXiv Detail & Related papers (2023-03-24T14:18:40Z)
Keypoint Message Passing for Video-based Person Re-Identification [106.41022426556776]
Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras. Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement. In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph
arXiv Detail & Related papers (2021-11-16T08:01:16Z)
Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified. The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z)
Font Completion and Manipulation by Cycling Between Multi-Modality Representations [113.26243126754704]
We innovate to explore the generation of font glyphs as 2D graphic objects with the graph as an intermediate representation. We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image. Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z)
A Multi-Implicit Neural Representation for Fonts [79.6123184198301]
Font-specific discontinuities like edges and corners are difficult to represent using neural networks. We introduce multi-implicits to represent fonts as a permutation-invariant set of learned implicit functions, without losing features.
arXiv Detail & Related papers (2021-06-12T21:40:11Z)
Iranis: A Large-scale Dataset of Farsi License Plate Characters [2.537406035246369]
This paper introduces a large-scale dataset that includes images of numbers and characters used in Iranian car license plates. The variety of instances in terms of camera shooting angle, illumination, resolution, and contrast makes the dataset a proper choice for training deep learning systems.
arXiv Detail & Related papers (2021-01-01T18:54:44Z)
An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document. This paper also proposes an improved font-independent character segmentation algorithm that outperforms the state-of-the-art segmentation algorithms.
arXiv Detail & Related papers (2020-09-18T22:57:03Z)
Handwritten Character Recognition from Wearable Passive RFID [1.3190581566723918]
We propose a preprocessing pipeline that fuses the sequence and bitmap representations together. The data is collected from ten subjects containing altogether 7500 characters. The proposed model reaches 72% accuracy in experimental tests, which can be considered good accuracy for this challenging dataset.
arXiv Detail & Related papers (2020-08-06T09:45:29Z)
Learning to map source code to software vulnerability using code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective. We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
Large Scale Font Independent Urdu Text Recognition System [1.5229257192293197]
There exists no automated system that can reliably recognize printed Urdu text in images and videos across different fonts. We have developed Qaida, a large scale data set with 256 fonts, and a complete Urdu lexicon. We have also developed a Convolutional Neural Network (CNN) based classification model which can recognize Urdu ligatures with 84.2% accuracy.
arXiv Detail & Related papers (2020-05-14T06:57:24Z)
