FastHand: Fast Hand Pose Estimation From A Monocular Camera
- URL: http://arxiv.org/abs/2102.07067v1
- Date: Sun, 14 Feb 2021 04:12:41 GMT
- Title: FastHand: Fast Hand Pose Estimation From A Monocular Camera
- Authors: Shan An, Xiajie Zhang, Dong Wei, Haogang Zhu, Jianyu Yang, and
Konstantinos A. Tsintotas
- Abstract summary: We propose a fast and accurate framework for hand pose estimation, dubbed "FastHand".
FastHand offers high accuracy scores while reaching a speed of 25 frames per second on an NVIDIA Jetson TX2 graphics processing unit.
- Score: 12.790733588554588
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hand gesture recognition constitutes the initial step in most methods
related to human-robot interaction. There are two key challenges in this task.
The first is achieving stable and accurate hand landmark predictions in
real-world scenarios, while the second is reducing the forward-inference time.
In this paper, we propose a fast and accurate framework for hand pose
estimation, dubbed "FastHand". Using a lightweight encoder-decoder network
architecture, we fulfil the requirements of practical applications running on
embedded devices. The encoder consists of deep layers with a small number of
parameters, while the decoder makes use of spatial location information to
obtain more accurate results. The evaluation took place on two publicly
available datasets, demonstrating the improved performance of the proposed
pipeline compared to other state-of-the-art approaches. FastHand offers high
accuracy while reaching a speed of 25 frames per second on an NVIDIA Jetson
TX2 graphics processing unit.
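As a rough illustration of this design, the sketch below (assuming PyTorch;
all layer sizes are illustrative choices, not taken from the paper) shows a
deep but parameter-light encoder built from depthwise-separable convolutions
and a decoder that upsamples while reusing spatial detail through a skip
connection, predicting heatmaps for 21 hand landmarks.

    # Minimal sketch, not the authors' released code; layer sizes are assumptions.
    import torch
    import torch.nn as nn

    def sep_conv(cin, cout, stride=1):
        # Depthwise-separable convolution: cheap in parameters and FLOPs.
        return nn.Sequential(
            nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),
            nn.BatchNorm2d(cin), nn.ReLU(inplace=True),
            nn.Conv2d(cin, cout, 1, bias=False),
            nn.BatchNorm2d(cout), nn.ReLU(inplace=True),
        )

    class TinyHandNet(nn.Module):
        def __init__(self, n_landmarks=21):
            super().__init__()
            self.stem = sep_conv(3, 32, stride=2)    # 1/2 resolution
            self.enc1 = sep_conv(32, 64, stride=2)   # 1/4
            self.enc2 = sep_conv(64, 128, stride=2)  # 1/8
            self.dec = nn.Sequential(                # back to 1/4
                nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
                sep_conv(128, 64),
            )
            self.head = nn.Conv2d(64, n_landmarks, 1)

        def forward(self, x):
            s = self.stem(x)
            e1 = self.enc1(s)
            e2 = self.enc2(e1)
            # Skip connection lets the decoder reuse higher-resolution spatial cues.
            d = self.dec(e2) + e1
            return self.head(d)                      # per-landmark heatmaps

    heatmaps = TinyHandNet()(torch.randn(1, 3, 256, 256))
    print(heatmaps.shape)  # torch.Size([1, 21, 64, 64])

Depthwise-separable convolutions are a standard way to keep a deep encoder
cheap enough for embedded GPUs such as the Jetson TX2.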
Related papers
- Combining Efficient and Precise Sign Language Recognition: Good pose
estimation library is all you need [2.9005223064604078]
Sign language recognition could significantly improve the user experience for d/Deaf people with general consumer technology.
Current sign language recognition architectures are usually computationally heavy and require robust GPU-equipped hardware to run in real-time.
We build upon the SPOTER architecture, which comes close to the performance of large models employed for this task.
arXiv Detail & Related papers (2022-09-30T17:30:32Z)
- Hierarchical Temporal Transformer for 3D Hand Pose Estimation and Action Recognition from Egocentric RGB Videos [50.74218823358754]
We develop a transformer-based framework to exploit temporal information for robust estimation.
We build a network hierarchy with two cascaded transformer encoders, where the first one exploits the short-term temporal cue for hand pose estimation.
Our approach achieves competitive results on two first-person hand action benchmarks, namely FPHA and H2O.
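A hedged sketch of the cascaded idea, with dimensions and the per-frame
feature extractor assumed rather than taken from the paper: a first
transformer encoder refines per-frame tokens with short-term temporal context
for pose estimation, and a second encoder reads the full sequence for action
recognition.

    # Illustrative sketch only; d=256, 8 heads, and the heads are assumptions.
    import torch
    import torch.nn as nn

    class CascadedTemporalNet(nn.Module):
        def __init__(self, d=256, n_joints=21, n_actions=10):
            super().__init__()
            layer = lambda: nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
            self.short_enc = nn.TransformerEncoder(layer(), num_layers=2)
            self.long_enc = nn.TransformerEncoder(layer(), num_layers=2)
            self.pose_head = nn.Linear(d, n_joints * 3)  # 3D joints per frame
            self.action_head = nn.Linear(d, n_actions)

        def forward(self, feats):              # feats: (batch, T, d) frame features
            # Stage 1: short-term temporal cues refine the per-frame tokens.
            tokens = self.short_enc(feats)
            poses = self.pose_head(tokens)     # (batch, T, n_joints*3)
            # Stage 2: re-encode the refined tokens over the whole sequence
            # and pool them to classify the action.
            action = self.action_head(self.long_enc(tokens).mean(dim=1))
            return poses, action

    poses, action = CascadedTemporalNet()(torch.randn(2, 16, 256))
    print(poses.shape, action.shape)  # (2, 16, 63) (2, 10)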
arXiv Detail & Related papers (2022-09-20T05:52:54Z)
- SwiftLane: Towards Fast and Efficient Lane Detection [0.8972186395640678]
We propose SwiftLane: a light-weight, end-to-end deep learning based framework, coupled with the row-wise classification formulation for fast and efficient lane detection.
Our method achieves an inference speed of 411 frames per second, surpassing state-of-the-art in terms of speed while achieving comparable results in terms of accuracy on the popular CULane benchmark dataset.
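The row-wise classification formulation can be sketched as follows, with grid
sizes and the feature dimension as illustrative assumptions: for each lane and
each of a fixed set of image rows, a classifier picks the horizontal grid cell
the lane passes through, with one extra class meaning the lane is absent on
that row. This turns dense segmentation into a handful of small classification
problems, which is where the speed comes from.

    # Sketch of a row-wise classification head; sizes are assumptions.
    import torch
    import torch.nn as nn

    class RowWiseLaneHead(nn.Module):
        def __init__(self, feat_dim=512, n_rows=18, n_cols=100, n_lanes=4):
            super().__init__()
            self.shape = (n_cols + 1, n_rows, n_lanes)  # +1 = "no lane" class
            self.fc = nn.Linear(feat_dim, (n_cols + 1) * n_rows * n_lanes)

        def forward(self, feat):                        # pooled feature (B, feat_dim)
            return self.fc(feat).view(-1, *self.shape)  # (B, cols+1, rows, lanes)

    logits = RowWiseLaneHead()(torch.randn(2, 512))
    # Per row and lane, an argmax over the column axis gives the lane position.
    cols = logits.argmax(dim=1)                         # (2, 18, 4)
    print(logits.shape, cols.shape)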
arXiv Detail & Related papers (2021-10-22T13:35:05Z)
- Real-Time Monocular Human Depth Estimation and Segmentation on Embedded Systems [13.490605853268837]
Estimating a scene's depth to achieve collision avoidance against moving pedestrians is a crucial and fundamental problem in the robotic field.
This paper proposes a novel, low complexity network architecture for fast and accurate human depth estimation and segmentation in indoor environments.
arXiv Detail & Related papers (2021-08-24T03:26:08Z)
- Real-time Pose and Shape Reconstruction of Two Interacting Hands With a Single Depth Camera [79.41374930171469]
We present a novel method for real-time pose and shape reconstruction of two strongly interacting hands.
Our approach combines an extensive list of favorable properties; notably, it is marker-less.
We show state-of-the-art results in scenes that exceed the complexity level demonstrated by previous work.
arXiv Detail & Related papers (2021-06-15T11:39:49Z)
- Learning Spatio-Temporal Transformer for Visual Tracking [108.11680070733598]
We present a new tracking architecture with an encoder-decoder transformer as the key component.
The whole method is end-to-end and needs no postprocessing steps such as cosine windowing or bounding-box smoothing.
The proposed tracker achieves state-of-the-art performance on five challenging short-term and long-term benchmarks, while running real-time speed, being 6x faster than Siam R-CNN.
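A rough sketch of a tracker in this style, under assumed dimensions rather
than the paper's actual architecture: template and search-region tokens are
encoded jointly, and a single learned target query is decoded directly into
normalized box coordinates, with no windowing or smoothing applied afterwards.

    # Illustrative sketch; token counts, d=256, and the box head are assumptions.
    import torch
    import torch.nn as nn

    class TransformerTracker(nn.Module):
        def __init__(self, d=256):
            super().__init__()
            self.transformer = nn.Transformer(d_model=d, nhead=8,
                                              num_encoder_layers=2,
                                              num_decoder_layers=2,
                                              batch_first=True)
            self.query = nn.Parameter(torch.randn(1, 1, d))  # learned target query
            self.box_head = nn.Linear(d, 4)                  # (cx, cy, w, h)

        def forward(self, template_tokens, search_tokens):
            # (B, Nt, d) and (B, Ns, d) token sequences from a CNN backbone.
            memory = torch.cat([template_tokens, search_tokens], dim=1)
            out = self.transformer(memory, self.query.expand(memory.size(0), 1, -1))
            return self.box_head(out).sigmoid().squeeze(1)   # normalized box

    box = TransformerTracker()(torch.randn(2, 64, 256), torch.randn(2, 256, 256))
    print(box.shape)  # torch.Size([2, 4])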
arXiv Detail & Related papers (2021-03-31T15:19:19Z)
- Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers [115.90778814368703]
Our objective is language-based search of large-scale image and video datasets.
For this task, the approach of independently mapping text and vision to a joint embedding space, a.k.a. dual encoders, is attractive because retrieval scales well to large galleries.
An alternative approach of using vision-text transformers with cross-attention gives considerable improvements in accuracy over the joint embeddings.
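A minimal sketch of the dual-encoder setup; the encoders here are stand-in
MLPs over precomputed features, an assumption made for brevity. Because the
two modalities are embedded independently, gallery embeddings can be
precomputed once and retrieval reduces to a similarity lookup.

    # Sketch of dual-encoder retrieval; feature dims and MLPs are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DualEncoder(nn.Module):
        def __init__(self, text_dim=300, image_dim=2048, d=512):
            super().__init__()
            self.text_proj = nn.Sequential(nn.Linear(text_dim, d), nn.ReLU(), nn.Linear(d, d))
            self.image_proj = nn.Sequential(nn.Linear(image_dim, d), nn.ReLU(), nn.Linear(d, d))

        def forward(self, text_feat, image_feat):
            # L2-normalize so the dot product equals cosine similarity.
            t = F.normalize(self.text_proj(text_feat), dim=-1)
            v = F.normalize(self.image_proj(image_feat), dim=-1)
            return t @ v.T                    # (n_queries, n_gallery) scores

    scores = DualEncoder()(torch.randn(5, 300), torch.randn(1000, 2048))
    ranked = scores.argsort(dim=1, descending=True)  # per-query ranking
    print(scores.shape, ranked[:, :3])

In practice the two approaches are often combined: the fast dual encoder
retrieves a candidate short-list, and a slower cross-attention model reranks
it.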
arXiv Detail & Related papers (2021-03-30T17:57:08Z)
- Real-time Semantic Segmentation with Fast Attention [94.88466483540692]
We propose a novel architecture for semantic segmentation of high-resolution images and videos in real-time.
The proposed architecture relies on our fast spatial attention, which is a simple yet efficient modification of the popular self-attention mechanism.
Results on multiple datasets demonstrate superior accuracy and speed compared to existing approaches.
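One way to see why such a modification can be fast, sketched below under
assumed normalization details: if the softmax is replaced by L2 normalization
of queries and keys, attention can be computed as Q (K^T V) instead of
(Q K^T) V, so the cost grows linearly with the number of pixels rather than
quadratically, and the N x N attention matrix is never formed.

    # Sketch of linear-complexity attention; normalization details are assumptions.
    import torch
    import torch.nn.functional as F

    def fast_attention(q, k, v):
        # q, k: (B, N, d); v: (B, N, dv); N = H*W spatial positions.
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        context = k.transpose(1, 2) @ v       # (B, d, dv): cost O(N*d*dv)
        return (q @ context) / q.size(1)      # (B, N, dv)

    q, k, v = (torch.randn(2, 64 * 64, 32) for _ in range(3))
    out = fast_attention(q, k, v)
    print(out.shape)  # torch.Size([2, 4096, 32])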
arXiv Detail & Related papers (2020-07-07T22:37:16Z)
- DeepMark++: Real-time Clothing Detection at the Edge [55.41644538483948]
We propose a single-stage approach to deliver rapid clothing detection and keypoint estimation.
Our solution is based on a multi-target network CenterNet, and we introduce several powerful post-processing techniques to enhance performance.
Our most accurate model achieves results comparable to state-of-the-art solutions on the DeepFashion2 dataset.
arXiv Detail & Related papers (2020-06-01T04:36:57Z)