POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation
During Surgical Activities
- URL: http://arxiv.org/abs/2307.10387v1
- Date: Wed, 19 Jul 2023 18:00:32 GMT
- Title: POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation
During Surgical Activities
- Authors: Rui Wang, Sophokles Ktistakis, Siwei Zhang, Mirko Meboldt, and Quentin
Lohmeyer
- Abstract summary: POV-Surgery is a large-scale, synthetic, egocentric dataset focusing on pose estimation for hands with different surgical gloves and three orthopedic surgical instruments.
Our dataset consists of 53 sequences and 88,329 frames, featuring high-resolution RGB-D video streams with activity annotations.
We fine-tune the current SOTA methods on POV-Surgery and further show the generalizability when applying to real-life cases with surgical gloves and tools.
- Score: 4.989930168854209
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The surgical usage of Mixed Reality (MR) has received growing attention in
areas such as surgical navigation systems, skill assessment, and robot-assisted
surgeries. For such applications, pose estimation for hand and surgical
instruments from an egocentric perspective is a fundamental task and has been
studied extensively in the computer vision field in recent years. However, the
development of this field has been impeded by a lack of datasets, especially in
the surgical field, where bloody gloves and reflective metallic tools make it
hard to obtain 3D pose annotations for hands and objects using conventional
methods. To address this issue, we propose POV-Surgery, a large-scale,
synthetic, egocentric dataset focusing on pose estimation for hands with
different surgical gloves and three orthopedic surgical instruments, namely
scalpel, friem, and diskplacer. Our dataset consists of 53 sequences and 88,329
frames, featuring high-resolution RGB-D video streams with activity
annotations, accurate 3D and 2D annotations for hand-object pose, and 2D
hand-object segmentation masks. We fine-tune the current SOTA methods on
POV-Surgery and further show the generalizability when applying to real-life
cases with surgical gloves and tools by extensive evaluations. The code and the
dataset are publicly available at batfacewayne.github.io/POV_Surgery_io/.
Related papers
- Realistic Data Generation for 6D Pose Estimation of Surgical Instruments [4.226502078427161]
6D pose estimation of surgical instruments is critical to enable the automatic execution of surgical maneuvers.
In household and industrial settings, synthetic data, generated with 3D computer graphics software, has been shown as an alternative to minimize annotation costs.
We propose an improved simulation environment for surgical robotics that enables the automatic generation of large and diverse datasets.
arXiv Detail & Related papers (2024-06-11T14:59:29Z) - EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos [8.134387035379879]
We introduce EgoSurgery-Tool, an extension of the EgoSurgery-Phase dataset, which contains real open surgery videos.
EgoSurgery-Tool comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset.
We conduct a comprehensive analysis of EgoSurgery-Tool using nine popular object detectors to assess their effectiveness in both surgical tool and hand detection.
arXiv Detail & Related papers (2024-06-05T09:36:15Z) - Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery.
We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery.
We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z) - CholecTrack20: A Dataset for Multi-Class Multiple Tool Tracking in
Laparoscopic Surgery [1.8076340162131013]
CholecTrack20 is an extensive dataset meticulously annotated for multi-class multi-tool tracking across three perspectives.
The dataset comprises 20 laparoscopic videos with over 35,000 frames and 65,000 annotated tool instances.
arXiv Detail & Related papers (2023-12-12T15:18:15Z) - Surgical tool classification and localization: results and methods from
the MICCAI 2022 SurgToolLoc challenge [69.91670788430162]
We present the results of the SurgLoc 2022 challenge.
The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools.
We conclude by discussing these results in the broader context of machine learning and surgical data science.
arXiv Detail & Related papers (2023-05-11T21:44:39Z) - Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose
Estimation of Surgical Instruments [66.74633676595889]
We present a multi-camera capture setup consisting of static and head-mounted cameras.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre.
Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - Using Human Gaze For Surgical Activity Recognition [0.40611352512781856]
We propose to use human gaze with a spatial temporal attention mechanism for activity recognition in surgical videos.
Our model consists of an I3D-based architecture, learns temporal features using 3D convolutions, as well as learning an attention map using human gaze.
arXiv Detail & Related papers (2022-03-09T14:28:00Z) - hSDB-instrument: Instrument Localization Database for Laparoscopic and
Robotic Surgeries [3.3340414770046856]
The hSDB-instrument dataset consists of instrument localization information from 24 cases of laparoscopic cholecystecomy and 24 cases of robotic gastrectomy.
To reflect the kinematic characteristics of all instruments, they are annotated with head and body parts for laparoscopic instruments, and with head, wrist, and body parts for robotic instruments separately.
arXiv Detail & Related papers (2021-10-24T23:35:37Z) - Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds a great potential for robotics and learning from human demonstrations.
We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z) - Measuring Generalisation to Unseen Viewpoints, Articulations, Shapes and
Objects for 3D Hand Pose Estimation under Hand-Object Interaction [137.28465645405655]
HANDS'19 is a challenge to evaluate the abilities of current 3D hand pose estimators (HPEs) to interpolate and extrapolate the poses of a training set.
We show that the accuracy of state-of-the-art methods can drop, and that they fail mostly on poses absent from the training set.
arXiv Detail & Related papers (2020-03-30T19:28:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.