Persis: A Persian Font Recognition Pipeline Using Convolutional Neural
Networks
- URL: http://arxiv.org/abs/2310.05255v2
- Date: Tue, 10 Oct 2023 05:48:25 GMT
- Title: Persis: A Persian Font Recognition Pipeline Using Convolutional Neural
Networks
- Authors: Mehrdad Mohammadian, Neda Maleki, Tobias Olsson, Fredrik Ahlgren
- Abstract summary: We introduce the first publicly available datasets in the field of Persian font recognition.
We employ Convolutional Neural Networks (CNN) to address this problem.
We conclude that CNN methods can be used to recognize Persian fonts without the need for additional pre-processing steps.
- Score: 2.239394800147746
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: What happens if we encounter a suitable font for our design work but do not
know its name? Visual Font Recognition (VFR) systems are used to identify the
font typeface in an image. These systems can assist graphic designers in
identifying fonts used in images. A VFR system also aids in improving the speed
and accuracy of Optical Character Recognition (OCR) systems. In this paper, we
introduce the first publicly available datasets in the field of Persian font
recognition and employ Convolutional Neural Networks (CNN) to address this
problem. The results show that the proposed pipeline obtained 78.0% top-1
accuracy on our new datasets, 89.1% on the IDPL-PFOD dataset, and 94.5% on the
KAFD dataset. Furthermore, the average time spent in the entire pipeline for
one sample of our proposed datasets is 0.54 and 0.017 seconds for CPU and GPU,
respectively. We conclude that CNN methods can be used to recognize Persian
fonts without the need for additional pre-processing steps such as feature
extraction, binarization, normalization, etc.
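The figures quoted in the abstract can be made concrete with a minimal sketch. It assumes that "top-1 accuracy" means the share of samples whose highest-scoring class matches the true label. The function name `top1_accuracy` and the toy scores are hypothetical and are not taken from the paper. Only the two per-sample times (0.54 s on CPU, 0.017 s on GPU) come from the abstract.

```python
def top1_accuracy(scores, labels):
    """Return the fraction of samples whose highest-scoring class is the true label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


# Hypothetical class scores for 4 samples over 3 font classes (not paper data).
scores = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],
]
labels = [1, 0, 2, 1]
accuracy = top1_accuracy(scores, labels)

# Per-sample pipeline times as reported in the abstract.
cpu_seconds = 0.54
gpu_seconds = 0.017

gpu_speedup = cpu_seconds / gpu_seconds   # how many times faster the GPU run is
cpu_throughput = 1.0 / cpu_seconds        # samples per second on CPU
gpu_throughput = 1.0 / gpu_seconds        # samples per second on GPU

print(f"toy top-1 accuracy: {accuracy:.2f}")
print(f"GPU speedup over CPU: {gpu_speedup:.1f}x")
print(f"throughput: {cpu_throughput:.2f} (CPU) vs {gpu_throughput:.1f} (GPU) samples/s")
```

Read this way, the reported times correspond to roughly a thirtyfold speedup on GPU, assuming samples are processed one at a time with no batching.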
Related papers
- IDPL-PFOD2: A New Large-Scale Dataset for Printed Farsi Optical
Character Recognition [6.780778335996319]
This paper presents a novel large-scale dataset, IDPL-PFOD2, tailored for Farsi printed text recognition.
The dataset comprises 2,003,541 images featuring a wide variety of fonts, styles, and sizes.
arXiv Detail & Related papers (2023-12-02T16:56:57Z)
- CF-Font: Content Fusion for Few-shot Font Generation [63.79915037830131]
We propose a content fusion module (CFM) to project the content feature into a linear space defined by the content features of basis fonts.
Our method also allows optimizing the style representation vector of reference images.
We have evaluated our method on a dataset of 300 fonts with 6.5k characters each.
arXiv Detail & Related papers (2023-03-24T14:18:40Z) - Keypoint Message Passing for Video-based Person Re-Identification [106.41022426556776]
Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras.
Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement.
In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph.
arXiv Detail & Related papers (2021-11-16T08:01:16Z)
- Detecting Handwritten Mathematical Terms with Sensor Based Data [71.84852429039881]
We propose a solution to the UbiComp 2021 Challenge by Stabilo in which handwritten mathematical terms are supposed to be automatically classified.
The input data set contains data of different writers, with label strings constructed from a total of 15 different possible characters.
arXiv Detail & Related papers (2021-09-12T19:33:34Z)
- Font Completion and Manipulation by Cycling Between Multi-Modality
Representations [113.26243126754704]
We innovate to explore the generation of font glyphs as 2D graphic objects with the graph as an intermediate representation.
We formulate a cross-modality cycled image-to-image structure with a graph between an image encoder and an image.
Our model generates better results than both the image-to-image baseline and previous state-of-the-art methods for glyph completion.
arXiv Detail & Related papers (2021-08-30T02:43:29Z)
- A Multi-Implicit Neural Representation for Fonts [79.6123184198301]
Font-specific discontinuities like edges and corners are difficult to represent using neural networks.
We introduce multi-implicits to represent fonts as a permutation-invariant set of learned implicit functions, without losing features.
arXiv Detail & Related papers (2021-06-12T21:40:11Z)
- Iranis: A Large-scale Dataset of Farsi License Plate Characters [2.537406035246369]
This paper introduces a large-scale dataset that includes images of numbers and characters used in Iranian car license plates.
The variety of instances in terms of camera shooting angle, illumination, resolution, and contrast makes the dataset a proper choice for training deep learning systems.
arXiv Detail & Related papers (2021-01-01T18:54:44Z)
- An Efficient Language-Independent Multi-Font OCR for Arabic Script [0.0]
This paper proposes a complete Arabic OCR system that takes a scanned image of Arabic Naskh script as an input and generates a corresponding digital document.
This paper also proposes an improved font-independent character segmentation algorithm that outperforms the state-of-the-art segmentation algorithms.
arXiv Detail & Related papers (2020-09-18T22:57:03Z)
- Handwritten Character Recognition from Wearable Passive RFID [1.3190581566723918]
We propose a preprocessing pipeline that fuses the sequence and bitmap representations together.
The data is collected from ten subjects containing altogether 7500 characters.
The proposed model reaches 72% accuracy in experimental tests, which can be considered good accuracy for this challenging dataset.
arXiv Detail & Related papers (2020-08-06T09:45:29Z)
- Learning to map source code to software vulnerability using
code-as-a-graph [67.62847721118142]
We explore the applicability of Graph Neural Networks in learning the nuances of source code from a security perspective.
We show that a code-as-graph encoding is more meaningful for vulnerability detection than existing code-as-photo and linear sequence encoding approaches.
arXiv Detail & Related papers (2020-06-15T16:05:27Z)
- Large Scale Font Independent Urdu Text Recognition System [1.5229257192293197]
There exists no automated system that can reliably recognize printed Urdu text in images and videos across different fonts.
We have developed Qaida, a large scale data set with 256 fonts, and a complete Urdu lexicon.
We have also developed a Convolutional Neural Network (CNN) based classification model which can recognize Urdu ligatures with 84.2% accuracy.
arXiv Detail & Related papers (2020-05-14T06:57:24Z)