Temporally Guided Articulated Hand Pose Tracking in Surgical Videos
- URL: http://arxiv.org/abs/2101.04281v1
- Date: Tue, 12 Jan 2021 03:44:04 GMT
- Title: Temporally Guided Articulated Hand Pose Tracking in Surgical Videos
- Authors: Nathan Louis, Luowei Zhou, Steven J. Yule, Roger D. Dias, Milisa
Manojlovich, Francis D. Pagani, Donald S. Likosky, Jason J. Corso
- Abstract summary: Articulated hand pose tracking is an underexplored problem that carries the potential for use in an extensive number of applications.
We propose a novel hand pose estimation model, Res152-CondPose, which improves tracking accuracy by incorporating a hand pose prior into its pose prediction.
Our dataset contains 76 video clips from 28 publicly available surgical videos and over 8.1k annotated hand pose instances.
- Score: 27.525545343598527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Articulated hand pose tracking is an underexplored problem that carries the
potential for use in an extensive number of applications, especially in the
medical domain. With a robust and accurate tracking system on in-vivo surgical
videos, the motion dynamics and movement patterns of the hands can be captured
and analyzed for rich tasks including skills assessment, training surgical
residents, and temporal action recognition. In this work, we propose a novel
hand pose estimation model, Res152-CondPose, which improves tracking accuracy
by incorporating a hand pose prior into its pose prediction. We show
improvements over state-of-the-art methods, which provide frame-wise independent
predictions, by following a temporally guided approach that effectively
leverages past predictions. Additionally, we collect the first dataset,
Surgical Hands, that provides multi-instance articulated hand pose annotations
for in-vivo videos. Our dataset contains 76 video clips from 28 publicly
available surgical videos and over 8.1k annotated hand pose instances. We
provide bounding boxes, articulated hand pose annotations, and tracking IDs to
enable multi-instance area-based and articulated tracking. When evaluated on
Surgical Hands, we show our method outperforms the state-of-the-art method
using mean Average Precision (mAP) to measure pose estimation accuracy and
Multiple Object Tracking Accuracy (MOTA) to assess pose tracking performance.
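The conditioning idea can be pictured concretely: rather than predicting each frame in isolation, the pose head also receives the heatmaps predicted for the previous frame, so past predictions guide the current one. Below is a minimal sketch of that loop, not the authors' code; the names CondPoseHead and track_clip, the 21-joint count, the channel sizes, and the stand-in backbone (the "Res152" in the model name suggests a ResNet-152 trunk) are all illustrative assumptions. A small helper at the end states the standard MOTA formula used for tracking evaluation.

```python
# Minimal sketch of temporally guided, heatmap-conditioned pose prediction.
# Illustrative only: module names, channel sizes, and the 21-joint count are
# assumptions, not taken from the paper.
import torch
import torch.nn as nn

NUM_JOINTS = 21  # common hand-keypoint count; the paper's value may differ

class CondPoseHead(nn.Module):
    """Predicts per-joint heatmaps from image features fused with a pose prior."""
    def __init__(self, feat_channels: int = 256):
        super().__init__()
        # Fuse backbone features with the previous frame's heatmaps channel-wise.
        self.fuse = nn.Conv2d(feat_channels + NUM_JOINTS, feat_channels, 3, padding=1)
        self.head = nn.Conv2d(feat_channels, NUM_JOINTS, 1)

    def forward(self, feats: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) features for the current hand crop
        # prior: (B, NUM_JOINTS, H, W) heatmaps predicted for the previous frame
        #        (all zeros when no past prediction exists, e.g., the first frame)
        x = torch.relu(self.fuse(torch.cat([feats, prior], dim=1)))
        return self.head(x)

def track_clip(backbone: nn.Module, head: CondPoseHead, frames: list) -> list:
    """Runs frame by frame, feeding each prediction back as the next prior."""
    preds, prior = [], None
    for frame in frames:                      # frame: (B, 3, H_img, W_img)
        feats = backbone(frame)
        if prior is None:
            prior = feats.new_zeros(feats.size(0), NUM_JOINTS,
                                    feats.size(2), feats.size(3))
        heatmaps = head(feats, prior)
        preds.append(heatmaps)
        prior = heatmaps.detach()             # temporal guidance for the next frame
    return preds

def mota(fn: int, fp: int, id_switches: int, num_gt: int) -> float:
    """Standard Multiple Object Tracking Accuracy: 1 - (FN + FP + IDSW) / GT."""
    return 1.0 - (fn + fp + id_switches) / num_gt

# Toy usage with a stand-in trunk in place of a real backbone:
backbone = nn.Conv2d(3, 256, 7, stride=4, padding=3)
frames = [torch.randn(1, 3, 256, 256) for _ in range(4)]
heatmaps = track_clip(backbone, CondPoseHead(), frames)
```

Feeding back detached heatmaps keeps the recurrence cheap at inference time; associating predictions across frames to maintain tracking IDs is a separate matching step not shown here.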
Related papers
- HMP: Hand Motion Priors for Pose and Shape Estimation from Video [52.39020275278984]
We develop a generative motion prior specific for hands, trained on the AMASS dataset which features diverse and high-quality hand motions.
Our integration of a robust motion prior significantly enhances performance, especially in occluded scenarios.
We demonstrate our method's efficacy via qualitative and quantitative evaluations on the HO3D and DexYCB datasets.
arXiv Detail & Related papers (2023-12-27T22:35:33Z)
- POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities [4.989930168854209]
POV-Surgery is a large-scale, synthetic, egocentric dataset focusing on pose estimation for hands with different surgical gloves and three orthopedic surgical instruments.
Our dataset consists of 53 sequences and 88,329 frames, featuring high-resolution RGB-D video streams with activity annotations.
We fine-tune the current SOTA methods on POV-Surgery and further show their generalizability when applied to real-life cases with surgical gloves and tools.
arXiv Detail & Related papers (2023-07-19T18:00:32Z)
- Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
First, we present a multi-camera capture setup consisting of static and head-mounted cameras.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre.
Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z)
- ShaRPy: Shape Reconstruction and Hand Pose Estimation from RGB-D with Uncertainty [6.559796851992517]
We propose ShaRPy, the first RGB-D Shape Reconstruction and hand Pose tracking system.
ShaRPy approximates a personalized hand shape, promoting a more realistic and intuitive understanding of its digital twin.
We evaluate ShaRPy on a keypoint detection benchmark and show qualitative results of hand function assessments for activity monitoring of musculoskeletal diseases.
arXiv Detail & Related papers (2023-03-17T15:12:25Z)
- Using Hand Pose Estimation To Automate Open Surgery Training Feedback [0.0]
This research aims to facilitate the use of state-of-the-art computer vision algorithms for the automated training of surgeons.
By estimating 2D hand poses, we model the movement of the practitioner's hands, and their interaction with surgical instruments.
arXiv Detail & Related papers (2022-11-13T21:47:31Z)
- Learning Dynamics via Graph Neural Networks for Human Pose Estimation and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame.
Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z)
- One-shot action recognition towards novel assistive therapies [63.23654147345168]
This work is motivated by the automated analysis of medical therapies that involve action imitation games.
The presented approach incorporates a pre-processing step that standardizes heterogeneous motion data conditions.
We evaluate the approach on a real use-case of automated video analysis for therapy support with autistic people.
arXiv Detail & Related papers (2021-02-17T19:41:37Z)
- Using Computer Vision to Automate Hand Detection and Tracking of Surgeon Movements in Videos of Open Surgery [8.095095522269352]
We leverage advances in computer vision to introduce an automated approach to video analysis of surgical execution.
A state-of-the-art convolutional neural network architecture for object detection was used to detect operating hands in open surgery videos.
Our model's spatial detections of operating hands significantly outperform the detections achieved using pre-existing hand-detection datasets.
arXiv Detail & Related papers (2020-12-13T03:10:09Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal graph network that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- AutoTrajectory: Label-free Trajectory Extraction and Prediction from Videos using Dynamic Points [92.91569287889203]
We present a novel, label-free algorithm, AutoTrajectory, for trajectory extraction and prediction.
To better capture the moving objects in videos, we introduce dynamic points.
We aggregate dynamic points into instance points, which stand for moving objects such as pedestrians in videos (a toy sketch of this grouping follows this entry).
arXiv Detail & Related papers (2020-07-11T08:43:34Z)
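To make the dynamic-point idea in the AutoTrajectory entry concrete, here is a deliberately naive sketch. The paper learns both the points and their aggregation end-to-end; the frame differencing, the greedy grouping, and every name and threshold below are illustrative assumptions, not the paper's method.

```python
# Toy illustration of "dynamic points" aggregated into "instance points".
# AutoTrajectory learns this end-to-end; frame differencing and greedy
# grouping here are stand-ins, not the paper's method.
import numpy as np

def dynamic_points(prev_frame: np.ndarray, frame: np.ndarray,
                   thresh: float = 25.0) -> np.ndarray:
    """Pixels whose grayscale intensity changed between frames act as crude
    dynamic points, returned as (N, 2) [x, y] coordinates."""
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
    ys, xs = np.nonzero(diff > thresh)
    if len(xs) == 0:
        return np.empty((0, 2), dtype=np.float32)
    return np.stack([xs, ys], axis=1).astype(np.float32)

def instance_points(points: np.ndarray, radius: float = 20.0) -> list:
    """Greedily groups nearby dynamic points; each group's centroid stands in
    for one moving object (an instance point)."""
    centroids, remaining = [], points.copy()
    while len(remaining):
        seed = remaining[0]
        near = np.linalg.norm(remaining - seed, axis=1) < radius
        centroids.append(remaining[near].mean(axis=0))
        remaining = remaining[~near]
    return centroids

# Toy usage on two random grayscale frames:
prev, cur = np.random.rand(2, 120, 160) * 255
objs = instance_points(dynamic_points(prev, cur))
```

Swapping the greedy grouping for a proper clustering algorithm (e.g., DBSCAN) would behave similarly; the point is only that per-pixel motion evidence gets summarized into one point per moving object, which can then be linked over time into trajectories.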
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.