Anchors Based Method for Fingertips Position Estimation from a Monocular RGB Image using Deep Neural Network
- URL: http://arxiv.org/abs/2005.01351v2
- Date: Thu, 14 May 2020 06:57:58 GMT
- Title: Anchors Based Method for Fingertips Position Estimation from a Monocular RGB Image using Deep Neural Network
- Authors: Purnendu Mishra and Kishor Sarawadekar
- Abstract summary: In this paper, we propose a deep neural network based methodology to estimate fingertip positions.
The proposed framework performs best with only limited dependence on hand detection results.
In experiments on the SCUT-Ego-Gesture dataset, we achieved a fingertip detection error of 2.3552 pixels on video frames with a resolution of $640 \times 480$.
- Score: 2.4366811507669124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In virtual, augmented, and mixed reality, hand gestures are becoming
increasingly popular as a way to narrow the gap between the virtual and the real
world. Precise fingertip localization is crucial for a seamless experience. Much
of the existing research relies on depth information to estimate fingertip
positions, while most work using RGB images is limited to detecting a single
finger. Detecting multiple fingertips from a single RGB image is very challenging
due to various factors. In this paper, we propose a deep neural network (DNN)
based methodology to estimate fingertip positions. We christened this methodology
Anchor-based Fingertips Position Estimation (ABFPE); it is a two-step process.
Each fingertip's location is estimated by regressing its offset from the nearest
anchor point. The proposed framework performs best with only limited dependence
on hand detection results. In our experiments on the SCUT-Ego-Gesture dataset, we
achieved a fingertip detection error of 2.3552 pixels on video frames with a
resolution of $640 \times 480$, and about $92.98\%$ of the test images had an
average pixel error of at most five pixels.
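To make the anchor-based regression above concrete, here is a minimal sketch of the encode/decode idea: a regular grid of anchor points covers the frame, a fingertip's training target is its offset from the nearest anchor, and at inference the prediction is the best-scoring anchor plus its regressed offset. The grid stride, function names, and toy scores are illustrative assumptions, not the paper's actual network design.

```python
import numpy as np

def make_anchor_grid(img_w, img_h, stride=32):
    """Regular grid of anchor points, one at the center of each stride x stride cell."""
    xs = np.arange(stride / 2, img_w, stride)
    ys = np.arange(stride / 2, img_h, stride)
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx.ravel(), gy.ravel()], axis=1)  # (N, 2) anchor coordinates

def encode_offset(fingertip_xy, anchors):
    """Training target: index of the nearest anchor and the fingertip's offset from it."""
    idx = int(np.argmin(np.linalg.norm(anchors - fingertip_xy, axis=1)))
    return idx, fingertip_xy - anchors[idx]

def decode_fingertip(anchors, scores, offsets):
    """Inference: pick the highest-scoring anchor and add its regressed offset."""
    idx = int(np.argmax(scores))
    return anchors[idx] + offsets[idx]

# Toy round trip on a 640 x 480 frame, the resolution used in the paper's experiments.
anchors = make_anchor_grid(640, 480)
idx, target = encode_offset(np.array([321.7, 205.2]), anchors)
scores = np.zeros(len(anchors)); scores[idx] = 1.0       # stand-in for network confidences
offsets = np.zeros_like(anchors); offsets[idx] = target  # stand-in for regressed offsets
print(decode_fingertip(anchors, scores, offsets))        # -> [321.7 205.2]
```

Regressing a small offset bounded by the anchor spacing, rather than an absolute image coordinate, keeps the regression target well conditioned wherever the hand lies in the frame; this is the usual motivation for anchor-based designs.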
Related papers
- Learning to Make Keypoints Sub-Pixel Accurate [80.55676599677824]
This work addresses the challenge of sub-pixel accuracy in detecting 2D local features.
We propose a novel network that enhances any detector with sub-pixel precision by learning an offset vector for detected features (a sketch of this style of offset refinement appears after this list).
arXiv Detail & Related papers (2024-07-16T12:39:56Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- Deep Convolutional Pooling Transformer for Deepfake Detection [54.10864860009834]
We propose a deep convolutional Transformer to incorporate decisive image features both locally and globally.
Specifically, we apply convolutional pooling and re-attention to enrich the extracted features and enhance efficacy.
The proposed solution consistently outperforms several state-of-the-art baselines on both within- and cross-dataset experiments.
arXiv Detail & Related papers (2022-09-12T15:05:41Z)
- SPSN: Superpixel Prototype Sampling Network for RGB-D Salient Object Detection [5.2134203335146925]
RGB-D salient object detection (SOD) has been in the spotlight recently because it is an important preprocessing operation for various vision tasks.
Despite advances in deep learning-based methods, RGB-D SOD remains challenging due to the large domain gap between RGB images and depth maps, as well as the low quality of many depth maps.
We propose a novel superpixel prototype sampling network architecture to solve this problem.
arXiv Detail & Related papers (2022-07-16T10:43:14Z)
- Learning Weighting Map for Bit-Depth Expansion within a Rational Range [64.15915577164894]
Bit-depth expansion (BDE) is one of the emerging technologies for displaying a high bit-depth (HBD) image from a low bit-depth (LBD) source.
Existing BDE methods have no unified solution for various BDE situations.
We design a bit restoration network (BRNet) to learn a weight for each pixel, which indicates the ratio of the replenished value within a rational range.
arXiv Detail & Related papers (2022-04-26T02:27:39Z)
- Single image deep defocus estimation and its applications [82.93345261434943]
We train a deep neural network to classify image patches into one of 20 levels of blurriness.
The trained model is used to determine the patch blurriness which is then refined by applying an iterative weighted guided filter.
The result is a defocus map that carries the information of the degree of blurriness for each pixel.
arXiv Detail & Related papers (2021-07-30T06:18:16Z)
- A deep-learning-based multimodal depth-aware dynamic hand gesture recognition system [5.458813674116228]
We focus on dynamic hand gesture (DHG) recognition using depth-quantized image hand skeleton joint points.
In particular, we explore the effect of using depth-quantized features in CNN and Recurrent Neural Network (RNN) based multi-modal fusion networks.
arXiv Detail & Related papers (2021-07-06T11:18:53Z)
- RGB Matters: Learning 7-DoF Grasp Poses on Monocular RGBD Images [42.68340286459079]
General object grasping is an important yet unsolved problem in the field of robotics.
We propose RGBD-Grasp, a pipeline that solves this problem by decoupling 7-DoF grasp detection into two sub-tasks.
We achieve state-of-the-art results on GraspNet-1Billion dataset.
arXiv Detail & Related papers (2021-03-03T05:12:20Z)
- A Unified Learning Approach for Hand Gesture Recognition and Fingertip Detection [3.145455301228176]
The proposed algorithm uses a single network to predict the probabilities of finger class and positions of fingertips.
The proposed method yields remarkably lower pixel error than the direct regression approach.
arXiv Detail & Related papers (2021-01-06T14:05:13Z)
- Towards Dense People Detection with Deep Learning and Depth images [9.376814409561726]
This paper proposes a DNN-based system that detects multiple people from a single depth image.
Our neural network processes a depth image and outputs a likelihood map in image coordinates.
We show this strategy to be effective, producing networks that generalize to work with scenes different from those used during training.
arXiv Detail & Related papers (2020-07-14T16:43:02Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, we rely, instead of different views, on depth from focus cues.
We present results that are on par with supervised methods on KITTI and Make3D datasets and outperform unsupervised learning approaches.
arXiv Detail & Related papers (2020-01-14T20:22:54Z)
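As flagged in the "Learning to Make Keypoints Sub-Pixel Accurate" entry above, a learned offset head can refine integer keypoint locations to sub-pixel precision. The sketch below illustrates the general idea only; the head architecture, patch size, and tensor shapes are assumptions for illustration, not that paper's published design.

```python
import torch
import torch.nn as nn

class OffsetHead(nn.Module):
    """Hypothetical refinement head: regress a bounded sub-pixel offset for each
    detected keypoint from the feature patch around its integer location."""
    def __init__(self, channels: int, patch: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * patch * patch, 2),
            nn.Tanh(),  # constrain the offset to (-1, 1) pixels
        )

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (K, C, patch, patch) features cropped around K keypoints
        return self.net(patches)  # (K, 2) sub-pixel offsets

# Refined coordinates = integer detections + learned offsets (untrained weights here).
keypoints = torch.tensor([[120.0, 64.0], [300.0, 211.0]])
patches = torch.randn(2, 64, 5, 5)
refined = keypoints + OffsetHead(64)(patches)
```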
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.