HMD-EgoPose: Head-Mounted Display-Based Egocentric Marker-Less Tool and
Hand Pose Estimation for Augmented Surgical Guidance
- URL: http://arxiv.org/abs/2202.11891v1
- Date: Thu, 24 Feb 2022 04:07:34 GMT
- Title: HMD-EgoPose: Head-Mounted Display-Based Egocentric Marker-Less Tool and
Hand Pose Estimation for Augmented Surgical Guidance
- Authors: Mitchell Doughty and Nilesh R. Ghugre
- Abstract summary: We present HMD-EgoPose, a single-shot learning-based approach to hand and object pose estimation.
We demonstrate state-of-the-art performance on a benchmark dataset for marker-less hand and surgical instrument pose tracking.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success or failure of modern computer-assisted surgery procedures hinges
on the precise six-degree-of-freedom (6DoF) position and orientation (pose)
estimation of tracked instruments and tissue. In this paper, we present
HMD-EgoPose, a single-shot learning-based approach to hand and object pose
estimation and demonstrate state-of-the-art performance on a benchmark dataset
for monocular red-green-blue (RGB) 6DoF marker-less hand and surgical
instrument pose tracking. Further, we reveal the capacity of our HMD-EgoPose
framework for 6DoF near real-time pose estimation on a commercially available
optical see-through head-mounted display (OST-HMD) through a low-latency
streaming approach. Our framework utilized an efficient convolutional neural
network (CNN) backbone for multi-scale feature extraction and a set of
subnetworks to jointly learn the 6DoF pose representation of the rigid surgical
drill instrument and the grasping orientation of the hand of a user. To make
our approach accessible to a commercially available OST-HMD, the Microsoft
HoloLens 2, we created a pipeline for low-latency video and data communication
with a high-performance computing workstation capable of optimized network
inference. HMD-EgoPose outperformed current state-of-the-art approaches on a
benchmark dataset for surgical tool pose estimation, achieving an average tool
3D vertex error of 11.0 mm on real data and furthering the progress towards a
clinically viable marker-free tracking strategy. Through our low-latency
streaming approach, we achieved a round trip latency of 202.5 ms for pose
estimation and augmented visualization of the tracked model when integrated
with the OST-HMD. Our single-shot learned approach was robust to occlusion and
complex surfaces and improved on current state-of-the-art approaches to
marker-less tool and hand pose estimation.
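
The abstract describes the network as a shared, efficient CNN backbone with a set of subnetworks that jointly regress the tool's 6DoF pose and the user's hand grasp orientation. A minimal PyTorch sketch of that structure follows; the EfficientNet-B0 backbone, head sizes, and axis-angle rotation outputs are illustrative assumptions, not the published architecture.

```python
# Minimal sketch of a single-shot pose network in the spirit of the abstract:
# a shared CNN backbone feeding small regression subnetworks. EfficientNet-B0,
# the head dimensions, and the axis-angle parameterization are assumptions
# for illustration, not the authors' exact design.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0

class SingleShotPoseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = efficientnet_b0(weights=None).features  # multi-scale conv features
        self.pool = nn.AdaptiveAvgPool2d(1)
        feat_dim = 1280  # channel width of the final EfficientNet-B0 stage
        # Jointly learned subnetworks: tool rotation (axis-angle), tool
        # translation, and the grasping orientation of the user's hand.
        self.tool_rot = nn.Linear(feat_dim, 3)
        self.tool_trans = nn.Linear(feat_dim, 3)
        self.hand_rot = nn.Linear(feat_dim, 3)

    def forward(self, rgb: torch.Tensor):  # rgb: (B, 3, H, W)
        f = self.pool(self.features(rgb)).flatten(1)
        return self.tool_rot(f), self.tool_trans(f), self.hand_rot(f)
```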
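
The low-latency streaming pipeline is only outlined in the abstract: the OST-HMD streams video to a workstation, which runs network inference and returns poses for augmented visualization. The client-side sketch below assumes JPEG-compressed frames over a length-prefixed TCP stream and a fixed six-float pose reply; the actual transport and message format are not specified in the abstract.

```python
# Minimal sketch of a low-latency frame-streaming client. The protocol here
# (JPEG frames over TCP with a 4-byte length prefix, six-float pose replies)
# is an assumption for illustration, not the authors' implementation.
import socket
import struct
import time

import cv2  # OpenCV, assumed available, for camera capture and JPEG encoding

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("socket closed")
        data += chunk
    return data

def stream_frames(host: str, port: int, camera_index: int = 0) -> None:
    cap = cv2.VideoCapture(camera_index)
    with socket.create_connection((host, port)) as sock:
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # reduce latency
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            t0 = time.perf_counter()
            _, buf = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
            payload = buf.tobytes()
            sock.sendall(struct.pack("!I", len(payload)) + payload)
            # Server replies with a 6DoF pose: 3 rotation + 3 translation floats.
            pose = struct.unpack("!6f", _recv_exact(sock, 24))
            rtt_ms = (time.perf_counter() - t0) * 1000.0  # round-trip latency
            print(f"pose={pose} rtt={rtt_ms:.1f} ms")
    cap.release()
```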
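
The reported 11.0 mm result is an average 3D vertex error, commonly computed (as in the standard ADD metric) as the mean distance between the tool model's vertices transformed by the estimated pose and by the ground-truth pose. A short sketch of that computation:

```python
# Average 3D vertex error (ADD-style): mean distance between model vertices
# under the estimated and ground-truth rigid poses.
import numpy as np

def average_vertex_error(vertices: np.ndarray,
                         R_est: np.ndarray, t_est: np.ndarray,
                         R_gt: np.ndarray, t_gt: np.ndarray) -> float:
    """vertices: (N, 3) model points; R: (3, 3) rotation; t: (3,) translation."""
    pred = vertices @ R_est.T + t_est
    gt = vertices @ R_gt.T + t_gt
    return float(np.linalg.norm(pred - gt, axis=1).mean())
```

With the drill model's vertices expressed in millimetres, the returned value is directly comparable to the 11.0 mm figure quoted above.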
Related papers
- WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild [53.288327629960364]
We present a data-driven pipeline for efficient multi-hand reconstruction in the wild.
The proposed pipeline is composed of two components: a real-time fully convolutional hand localization network and a high-fidelity transformer-based 3D hand reconstruction model.
Our approach outperforms previous methods in both efficiency and accuracy on popular 2D and 3D benchmarks.
arXiv Detail & Related papers (2024-09-18T18:46:51Z)
- Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries [7.630289691590948]
We propose a general-purpose approach of data acquisition for 6-DoF pose estimation tasks in X-ray systems.
The proposed YOLOv5-6D pose model achieves competitive results on public benchmarks whilst being considerably faster, running at 42 FPS on a GPU.
The model achieves 92.41% on the 0.1 ADD-S metric, demonstrating a promising approach for enhancing surgical precision and patient outcomes.
arXiv Detail & Related papers (2024-05-19T21:35:12Z)
- In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition [1.4732811715354455]
Action recognition is essential for egocentric video understanding, allowing automatic and continuous monitoring of Activities of Daily Living (ADLs) without user effort.
Existing literature focuses on 3D hand pose input, which requires either computationally intensive depth-estimation networks or an uncomfortable wearable depth sensor.
We introduce two novel approaches for 2D hand pose estimation, namely EffHandNet for single-hand estimation and EffHandEgoNet, tailored for an egocentric perspective.
arXiv Detail & Related papers (2024-04-14T17:33:33Z)
- Realistic Full-Body Tracking from Sparse Observations via Joint-Level Modeling [13.284947022380404]
We propose a two-stage framework that can obtain accurate and smooth full-body motions with three tracking signals of head and hands only.
Our framework explicitly models the joint-level features in the first stage and utilizes them as temporal tokens for alternating spatial and temporal transformer blocks to capture joint-level correlations in the second stage.
With extensive experiments on the AMASS motion dataset and real-captured data, we show our proposed method can achieve more accurate and smooth motion compared to existing approaches.
arXiv Detail & Related papers (2023-08-17T08:27:55Z)
- Learned Vertex Descent: A New Direction for 3D Human Model Fitting [64.04726230507258]
We propose a novel optimization-based paradigm for 3D human model fitting on images and scans.
Our approach is able to capture the underlying body of clothed people with very different body shapes, achieving a significant improvement over the state of the art.
LVD is also applicable to 3D model fitting of humans and hands, for which we show a significant improvement over the SOTA with a much simpler and faster method.
arXiv Detail & Related papers (2022-05-12T17:55:51Z)
- Semantic keypoint-based pose estimation from single RGB frames [64.80395521735463]
We present an approach to estimating the continuous 6-DoF pose of an object from a single RGB image.
The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model.
We show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios.
arXiv Detail & Related papers (2022-04-12T15:03:51Z)
- Occlusion-Aware Self-Supervised Monocular 6D Object Pose Estimation [88.8963330073454]
We propose a novel monocular 6D pose estimation approach by means of self-supervised learning.
We leverage current trends in noisy student training and differentiable rendering to further self-supervise the model.
Our proposed self-supervision outperforms all other methods relying on synthetic data.
arXiv Detail & Related papers (2022-03-19T15:12:06Z)
- Occlusion-robust Visual Markerless Bone Tracking for Computer-Assisted Orthopaedic Surgery [41.681134859412246]
We propose an RGB-D sensing-based markerless tracking method that is robust against occlusion.
By using a high-quality commercial RGB-D camera, our proposed visual tracking method achieves an accuracy of 1-2 degrees and 2-4 mm on a model knee.
arXiv Detail & Related papers (2021-08-24T09:49:08Z)
- SurgeonAssist-Net: Towards Context-Aware Head-Mounted Display-Based Augmented Reality for Surgical Guidance [18.060445966264727]
SurgeonAssist-Net is a framework that makes action-and-workflow-driven virtual assistance accessible on commercially available optical see-through head-mounted displays (OST-HMDs).
Our implementation competes with state-of-the-art approaches in prediction accuracy for automated task recognition.
It is capable of near real-time performance on the Microsoft HoloLens 2 OST-HMD.
arXiv Detail & Related papers (2021-07-13T21:12:34Z)
- Learnable Online Graph Representations for 3D Multi-Object Tracking [156.58876381318402]
We propose a unified, learning-based approach to the 3D MOT problem.
We employ a Neural Message Passing network for data association that is fully trainable.
We show the merit of the proposed approach on the publicly available nuScenes dataset by achieving state-of-the-art performance of 65.6% AMOTA and 58% fewer ID-switches.
arXiv Detail & Related papers (2021-04-23T17:59:28Z)
- Volumetric Attention for 3D Medical Image Segmentation and Detection [53.041572035020344]
A volumetric attention (VA) module for 3D medical image segmentation and detection is proposed.
Inspired by recent advances in video processing, VA enables 2.5D networks to leverage context information along the z direction.
Its integration into Mask R-CNN is shown to enable state-of-the-art performance on the Liver Tumor Segmentation (LiTS) Challenge.
arXiv Detail & Related papers (2020-04-04T18:55:06Z)