A CNN Based Framework for Unistroke Numeral Recognition in Air-Writing
- URL: http://arxiv.org/abs/2303.07989v1
- Date: Tue, 14 Mar 2023 15:44:45 GMT
- Title: A CNN Based Framework for Unistroke Numeral Recognition in Air-Writing
- Authors: Prasun Roy, Subhankar Ghosh, Umapada Pal
- Abstract summary: This paper proposes a generic video camera-aided convolutional neural network (CNN) based air-writing framework.
Gestures are performed using a marker of fixed color in front of a generic video camera, followed by color-based segmentation to identify the marker and track the trajectory of the marker tip.
- The proposed framework has achieved 97.7%, 95.4% and 93.7% recognition rates in person-independent evaluations on English, Bengali and Devanagari numerals, respectively.
- Score: 17.426389959819538
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Air-writing refers to virtually writing linguistic characters through hand
gestures in three-dimensional space with six degrees of freedom. This paper
proposes a generic video camera-aided convolutional neural network (CNN) based
air-writing framework. Gestures are performed using a marker of fixed color in
front of a generic video camera, followed by color-based segmentation to
identify the marker and track the trajectory of the marker tip. A pre-trained
CNN is then used to classify the gesture. The recognition accuracy is further
improved using transfer learning with the newly acquired data. The performance
of the system varies significantly with the illumination condition due to the
color-based segmentation. Under stable illumination conditions, the
system is able to recognize isolated unistroke numerals of multiple languages.
The proposed framework has achieved 97.7%, 95.4% and 93.7% recognition rates in
person-independent evaluations on English, Bengali and Devanagari numerals,
respectively.
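The pipeline described above (color-based segmentation of a fixed-color marker, followed by tracking the marker-tip trajectory) can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the marker color, distance threshold, centroid-based tip estimate, and frame layout are all assumptions.

```python
# Hypothetical sketch of the color-based segmentation step: threshold each
# pixel by its distance to a fixed marker color, then take the centroid of
# the resulting mask as the current marker-tip position. The color and
# threshold values below are illustrative, not taken from the paper.

MARKER_COLOR = (255, 0, 0)   # assumed fixed marker color (pure red)
THRESHOLD = 100              # assumed per-pixel color-distance threshold

def segment_marker(frame, color=MARKER_COLOR, thresh=THRESHOLD):
    """Return a binary mask of pixels close to the marker color.
    `frame` is a 2D grid (list of rows) of (r, g, b) tuples."""
    mask = []
    for row in frame:
        mask_row = []
        for (r, g, b) in row:
            dist = ((r - color[0]) ** 2 + (g - color[1]) ** 2
                    + (b - color[2]) ** 2) ** 0.5
            mask_row.append(1 if dist < thresh else 0)
        mask.append(mask_row)
    return mask

def marker_tip(mask):
    """Centroid (row, col) of the mask, or None if the marker is not visible."""
    pts = [(i, j) for i, row in enumerate(mask)
           for j, v in enumerate(row) if v]
    if not pts:
        return None
    return (sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts))

def track(frames):
    """Accumulate the marker-tip trajectory over a sequence of frames."""
    trajectory = []
    for frame in frames:
        tip = marker_tip(segment_marker(frame))
        if tip is not None:
            trajectory.append(tip)
    return trajectory
```

The accumulated trajectory would then be rendered as an image and passed to the pre-trained CNN classifier; that stage is omitted here.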
Related papers
- Color Equivariant Convolutional Networks [50.655443383582124]
CNNs struggle when the data is imbalanced across color variations introduced by incidental recording conditions.
We propose Color Equivariant Convolutions (CEConvs), a novel deep learning building block that enables shape feature sharing across the color spectrum.
We demonstrate the benefits of CEConvs in terms of downstream performance to various tasks and improved robustness to color changes, including train-test distribution shifts.
arXiv Detail & Related papers (2023-10-30T09:18:49Z)
- Does color modalities affect handwriting recognition? An empirical study on Persian handwritings using convolutional neural networks [7.965705015476877]
We investigate whether color modalities of handwritten digits and words affect their recognition accuracy or speed.
We selected 13,330 isolated digits and 62,500 words from a novel Persian handwritten database.
A CNN trained on the BW digit and word images achieves higher performance than on the other two color modalities.
arXiv Detail & Related papers (2023-07-22T19:47:52Z)
- Name Your Colour For the Task: Artificially Discover Colour Naming via Colour Quantisation Transformer [62.75343115345667]
We propose a novel colour quantisation transformer, CQFormer, that quantises colour space while maintaining machine recognition on the quantised images.
We observe the consistent evolution pattern between our artificial colour system and basic colour terms across human languages.
Our colour quantisation method also offers an efficient quantisation method that effectively compresses the image storage.
arXiv Detail & Related papers (2022-12-07T03:39:18Z)
- Siamese based Neural Network for Offline Writer Identification on word level data [7.747239584541488]
We propose a novel scheme to identify the author of a document based on the input word image.
Our method is text independent and does not impose any constraint on the size of the input image under examination.
arXiv Detail & Related papers (2022-11-17T10:01:46Z)
- Keypoint Message Passing for Video-based Person Re-Identification [106.41022426556776]
Video-based person re-identification (re-ID) is an important technique in visual surveillance systems which aims to match video snippets of people captured by different cameras.
Existing methods are mostly based on convolutional neural networks (CNNs), whose building blocks either process local neighbor pixels at a time, or, when 3D convolutions are used to model temporal information, suffer from the misalignment problem caused by person movement.
In this paper, we propose to overcome the limitations of normal convolutions with a human-oriented graph method. Specifically, features located at person joint keypoints are extracted and connected as a spatial-temporal graph.
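The spatial-temporal keypoint graph described in this entry can be illustrated with a small sketch (not the paper's implementation): nodes are (frame, joint) pairs, spatial edges follow an assumed skeleton connectivity, and temporal edges link the same joint across consecutive frames.

```python
# Illustrative construction of a spatial-temporal graph over person joint
# keypoints. The skeleton connectivity below is an assumed example, not the
# joint layout used by the paper.

SKELETON = [(0, 1), (1, 2), (1, 3)]  # assumed joint connectivity

def build_st_graph(num_frames, num_joints, skeleton=SKELETON):
    """Return the edge list of the spatial-temporal keypoint graph,
    where each node is a (frame, joint) pair."""
    edges = []
    for t in range(num_frames):
        # spatial edges: connect adjacent joints within the same frame
        for a, b in skeleton:
            edges.append(((t, a), (t, b)))
        # temporal edges: connect each joint to itself in the next frame
        if t + 1 < num_frames:
            for j in range(num_joints):
                edges.append(((t, j), (t + 1, j)))
    return edges
```

Message passing over such a graph would then aggregate features along both spatial and temporal edges; that stage is omitted here.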
arXiv Detail & Related papers (2021-11-16T08:01:16Z)
- Towards an IMU-based Pen Online Handwriting Recognizer [2.6707647984082357]
We present an online handwriting recognition system for word recognition based on inertial measurement units (IMUs).
This is obtained by means of a sensor-equipped pen that provides acceleration, angular velocity, and magnetic forces streamed via Bluetooth.
Our model combines convolutional and bidirectional LSTM networks, and is trained with the Connectionist Temporal Classification loss.
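A model trained with the Connectionist Temporal Classification loss, as in this entry, is typically decoded at inference time with greedy CTC decoding: take the best label per time step, collapse consecutive repeats, and drop the blank symbol. The sketch below illustrates that decoding step only; the blank index and score layout are assumptions, not details from the paper.

```python
# Minimal greedy CTC decoding sketch: argmax per time step, collapse
# consecutive repeated labels, remove the blank symbol. Blank index 0 is
# an assumed convention.

BLANK = 0  # assumed index of the CTC blank symbol

def ctc_greedy_decode(logits, blank=BLANK):
    """logits: list of per-timestep score lists.
    Returns the decoded label sequence."""
    # best label at each time step
    path = [max(range(len(step)), key=step.__getitem__) for step in logits]
    decoded, prev = [], None
    for label in path:
        # keep a label only if it differs from the previous step and is not blank
        if label != prev and label != blank:
            decoded.append(label)
        prev = label
    return decoded
```

Note that a blank between two identical labels keeps both, which is how CTC represents repeated characters in a word.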
arXiv Detail & Related papers (2021-05-26T09:47:19Z)
- Skeleton Based Sign Language Recognition Using Whole-body Keypoints [71.97020373520922]
Sign language is used by deaf or speech-impaired people to communicate.
Skeleton-based recognition is becoming popular because it can be further ensembled with RGB-D based methods to achieve state-of-the-art performance.
Inspired by the recent development of whole-body pose estimation (jin2020whole), we propose recognizing sign language based on the whole-body key points and features.
arXiv Detail & Related papers (2021-03-16T03:38:17Z)
- Self-supervised Video Representation Learning by Uncovering Spatio-temporal Statistics [74.6968179473212]
This paper proposes a novel pretext task to address the self-supervised learning problem.
We compute a series of spatio-temporal statistical summaries, such as the spatial location and dominant direction of the largest motion.
A neural network is built and trained to yield the statistical summaries given the video frames as inputs.
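One such statistical summary, the spatial location of the largest motion, can be sketched in plain Python. This is an illustrative computation under assumed conventions (intensity grids, a fixed block partition), not the paper's exact procedure.

```python
# Illustrative regression target for the pretext task: which block of the
# frame contains the largest frame-to-frame motion. Frames are plain 2D
# intensity grids; the grid x grid block partition is an assumption.

def largest_motion_block(prev, curr, grid=2):
    """Split the frame into grid x grid blocks and return the (row, col)
    index of the block with the largest summed absolute change."""
    h, w = len(prev), len(prev[0])
    bh, bw = h // grid, w // grid
    best, best_score = (0, 0), -1.0
    for bi in range(grid):
        for bj in range(grid):
            score = sum(
                abs(curr[i][j] - prev[i][j])
                for i in range(bi * bh, (bi + 1) * bh)
                for j in range(bj * bw, (bj + 1) * bw)
            )
            if score > best_score:
                best, best_score = (bi, bj), score
    return best
```

A network would be trained to regress such summaries from the raw frames, so that the learned features capture motion statistics without manual labels.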
arXiv Detail & Related papers (2020-08-31T08:31:56Z)
- Convolutional Neural Network Array for Sign Language Recognition using Wearable IMUs [0.0]
The proposed work presents a novel one-dimensional Convolutional Neural Network (CNN) array architecture for recognition of signs from the Indian sign language.
The signals recorded using the IMU device are segregated on the basis of their context, such as whether they correspond to signing for a general sentence or an interrogative sentence.
arXiv Detail & Related papers (2020-04-21T23:11:04Z)
- Transferring Cross-domain Knowledge for Video Sign Language Recognition [103.9216648495958]
Word-level sign language recognition (WSLR) is a fundamental task in sign language interpretation.
We propose a novel method that learns domain-invariant visual concepts and fertilizes WSLR models by transferring knowledge of subtitled news sign to them.
arXiv Detail & Related papers (2020-03-08T03:05:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.