HMD-EgoPose: Head-Mounted Display-Based Egocentric Marker-Less Tool and
Hand Pose Estimation for Augmented Surgical Guidance
- URL: http://arxiv.org/abs/2202.11891v1
- Date: Thu, 24 Feb 2022 04:07:34 GMT
- Title: HMD-EgoPose: Head-Mounted Display-Based Egocentric Marker-Less Tool and
Hand Pose Estimation for Augmented Surgical Guidance
- Authors: Mitchell Doughty and Nilesh R. Ghugre
- Abstract summary: We present HMD-EgoPose, a single-shot learning-based approach to hand and object pose estimation.
We demonstrate state-of-the-art performance on a benchmark dataset for marker-less hand and surgical instrument pose tracking.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success or failure of modern computer-assisted surgery procedures hinges
on the precise six-degree-of-freedom (6DoF) position and orientation (pose)
estimation of tracked instruments and tissue. In this paper, we present
HMD-EgoPose, a single-shot learning-based approach to hand and object pose
estimation and demonstrate state-of-the-art performance on a benchmark dataset
for monocular red-green-blue (RGB) 6DoF marker-less hand and surgical
instrument pose tracking. Further, we reveal the capacity of our HMD-EgoPose
framework for 6DoF near real-time pose estimation on a commercially available
optical see-through head-mounted display (OST-HMD) through a low-latency
streaming approach. Our framework utilized an efficient convolutional neural
network (CNN) backbone for multi-scale feature extraction and a set of
subnetworks to jointly learn the 6DoF pose representation of the rigid surgical
drill instrument and the grasping orientation of the hand of a user. To make
our approach accessible to a commercially available OST-HMD, the Microsoft
HoloLens 2, we created a pipeline for low-latency video and data communication
with a high-performance computing workstation capable of optimized network
inference. HMD-EgoPose outperformed current state-of-the-art approaches on a
benchmark dataset for surgical tool pose estimation, achieving an average tool
3D vertex error of 11.0 mm on real data and furthering the progress towards a
clinically viable marker-free tracking strategy. Through our low-latency
streaming approach, we achieved a round trip latency of 202.5 ms for pose
estimation and augmented visualization of the tracked model when integrated
with the OST-HMD. Our single-shot learned approach was robust to occlusion and
complex surfaces and improved on current state-of-the-art approaches to
marker-less tool and hand pose estimation.
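
The abstract describes the network as a shared, efficient CNN backbone with a set of subnetworks that jointly regress the tool's 6DoF pose and the user's hand grasp orientation. A minimal PyTorch sketch of that structure follows; the EfficientNet-B0 backbone, head sizes, and axis-angle rotation outputs are illustrative assumptions, not the published architecture.

```python
# Minimal sketch of a single-shot pose network in the spirit of the abstract:
# a shared CNN backbone feeding small regression subnetworks. EfficientNet-B0,
# the head dimensions, and the axis-angle parameterization are assumptions
# for illustration, not the authors' exact design.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class SingleShotPoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = efficientnet_b0(weights=None).features  # multi-scale conv features
        self.pool = nn.AdaptiveAvgPool2d(1)
        feat_dim = 1280  # channel width of the final EfficientNet-B0 stage
        # Jointly learned subnetworks: tool rotation (axis-angle), tool
        # translation, and the grasping orientation of the user's hand.
        self.tool_rot = nn.Linear(feat_dim, 3)
        self.tool_trans = nn.Linear(feat_dim, 3)
        self.hand_rot = nn.Linear(feat_dim, 3)

    def forward(self, rgb: torch.Tensor):  # rgb: (B, 3, H, W)
        f = self.pool(self.features(rgb)).flatten(1)
        return self.tool_rot(f), self.tool_trans(f), self.hand_rot(f)
```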
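
The low-latency streaming pipeline is only outlined in the abstract: the OST-HMD streams video to a workstation, which runs network inference and returns poses for augmented visualization. The client-side sketch below assumes JPEG-compressed frames over a length-prefixed TCP stream and a fixed six-float pose reply; the actual transport and message format are not specified in the abstract.

```python
# Minimal sketch of a low-latency frame-streaming client. The protocol here
# (JPEG frames over TCP with a 4-byte length prefix, six-float pose replies)
# is an assumption for illustration, not the authors' implementation.
import socket
import struct
import time

import cv2  # OpenCV, assumed available, for camera capture and JPEG encoding

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed")
        data += chunk
    return data

def stream_frames(host: str, port: int, camera_index: int = 0) -> None:
    cap = cv2.VideoCapture(camera_index)
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # reduce latency
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            t0 = time.perf_counter()
            _, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
            payload = buf.tobytes()
            sock.sendall(struct.pack("!I", len(payload)) + payload)
            # Server replies with a 6DoF pose: 3 rotation + 3 translation floats.
            pose = struct.unpack("!6f", _recv_exact(sock, 24))
            rtt_ms = (time.perf_counter() - t0) * 1000.0  # round-trip latency
            print(f"pose={pose} rtt={rtt_ms:.1f} ms")
    cap.release()
```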
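
The reported 11.0 mm result is an average 3D vertex error, commonly computed (as in the standard ADD metric) as the mean distance between the tool model's vertices transformed by the estimated pose and by the ground-truth pose. A short sketch of that computation:

```python
# Average 3D vertex error (ADD-style): mean distance between model vertices
# under the estimated and ground-truth rigid poses.
import numpy as np

def average_vertex_error(vertices: np.ndarray,
                         R_est: np.ndarray, t_est: np.ndarray,
                         R_gt: np.ndarray, t_gt: np.ndarray) -> float:
    """vertices: (N, 3) model points; R: (3, 3) rotation; t: (3,) translation."""
    pred = vertices @ R_est.T + t_est
    gt = vertices @ R_gt.T + t_gt
    return float(np.linalg.norm(pred - gt, axis=1).mean())
```

With the drill model's vertices expressed in millimetres, the returned value is directly comparable to the 11.0 mm figure quoted above.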
Related papers
- WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild [53.288327629960364]
We present a data-driven pipeline for efficient multi-hand reconstruction in the wild.
The proposed pipeline is composed of two components: a real-time fully convolutional hand localization network and a high-fidelity transformer-based 3D hand reconstruction model.
Our approach outperforms previous methods in both efficiency and accuracy on popular 2D and 3D benchmarks.
arXiv Detail & Related papers (2024-09-18T18:46:51Z)
- Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries [7.630289691590948]
We propose a general-purpose approach of data acquisition for 6-DoF pose estimation tasks in X-ray systems.
The proposed YOLOv5-6D pose model achieves competitive results on public benchmarks whilst being considerably faster, running at 42 FPS on a GPU.
The model achieves 92.41% on the 0.1 ADD-S metric, demonstrating a promising approach for enhancing surgical precision and patient outcomes.
arXiv Detail & Related papers (2024-05-19T21:35:12Z)
- In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition [1.4732811715354455]
Action recognition is essential for egocentric video understanding, allowing automatic and continuous monitoring of Activities of Daily Living (ADLs) without user effort.
Existing literature focuses on 3D hand pose input, which requires either computationally intensive depth-estimation networks or an uncomfortable wearable depth sensor.
We introduce two novel approaches for 2D hand pose estimation, namely EffHandNet for single-hand estimation and EffHandEgoNet, tailored for an egocentric perspective.
arXiv Detail & Related papers (2024-04-14T17:33:33Z)
- Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling [13.284947022380404]
We propose a two-stage framework that can obtain accurate and smooth full-body motions with three tracking signals of head and hands only.
Our framework explicitly models the joint-level features in the first stage and utilizes them as temporal tokens for alternating spatial and temporal transformer blocks to capture joint-level correlations in the second stage.
With extensive experiments on the AMASS motion dataset and real-captured data, we show our proposed method can achieve more accurate and smooth motion compared to existing approaches.
arXiv Detail & Related papers (2023-08-17T08:27:55Z)
- Learned Vertex Descent: A New Direction for 3D Human Model Fitting [64.04726230507258]
We propose a novel optimization-based paradigm for 3D human model fitting on images and scans.
Our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement over the state of the art.
LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement over the SOTA with a much simpler and faster method.
arXiv Detail & Related papers (2022-05-12T17:55:51Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation [88.8963330073454]
We propose a novel monocular 6D pose estimation approach by means of self-supervised learning.
We leverage current trends in noisy student training and differentiable rendering to further self-supervise the model.
Our proposed self-supervision outperforms all other methods relying on synthetic data.
arXiv Detail & Related papers (2022-03-19T15:12:06Z)
- Occlusion-robust Visual Markerless Bone Tracking for Computer-Assisted Orthopaedic Surgery [41.681134859412246]
We propose an RGB-D sensing-based markerless tracking method that is robust against occlusion.
By using a high-quality commercial RGB-D camera, our proposed visual tracking method achieves an accuracy of 1-2 degrees and 2-4 mm on a model knee.
arXiv Detail & Related papers (2021-08-24T09:49:08Z)
- SurgeonAssist-Net: Towards Context-Aware Head-Mounted Display-Based Augmented Reality for Surgical Guidance [18.060445966264727]
SurgeonAssist-Net is a framework that makes action-and-workflow-driven virtual assistance accessible on commercially available optical see-through head-mounted displays (OST-HMDs).
Our implementation competes with state-of-the-art approaches in prediction accuracy for automated task recognition.
It is capable of near real-time performance on the Microsoft HoloLens 2 OST-HMD.
arXiv Detail & Related papers (2021-07-13T21:12:34Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Volumetric Attention for 3D Medical Image Segmentation and Detection [53.041572035020344]
A volumetric attention (VA) module for 3D medical image segmentation and detection is proposed.
Inspired by recent advances in video processing, VA enables 2.5D networks to leverage context information along the z direction.
Its integration into Mask R-CNN is shown to enable state-of-the-art performance on the Liver Tumor Segmentation (LiTS) Challenge.
arXiv Detail & Related papers (2020-04-04T18:55:06Z)