EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos
- URL: http://arxiv.org/abs/2503.18755v1
- Date: Mon, 24 Mar 2025 15:04:32 GMT
- Title: EgoSurgery-HTS: A Dataset for Egocentric Hand-Tool Segmentation in Open Surgery Videos
- Authors: Nathan Darjana, Ryo Fujii, Hideo Saito, Hiroki Kajita
- Abstract summary: EgoSurgery-HTS is a new dataset with pixel-wise annotations and a benchmark suite for segmenting surgical tools, hands, and interacting tools in egocentric open-surgery videos. We conduct extensive evaluations of state-of-the-art segmentation methods and demonstrate significant improvements in the accuracy of hand and hand-tool segmentation in egocentric open-surgery videos.
- Score: 7.446152826866544
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Egocentric open-surgery videos capture rich, fine-grained details essential for accurately modeling surgical procedures and human behavior in the operating room. A detailed, pixel-level understanding of hands and surgical tools is crucial for interpreting a surgeon's actions and intentions. We introduce EgoSurgery-HTS, a new dataset with pixel-wise annotations and a benchmark suite for segmenting surgical tools, hands, and interacting tools in egocentric open-surgery videos. Specifically, we provide a labeled dataset for (1) tool instance segmentation of 14 distinct surgical tools, (2) hand instance segmentation, and (3) hand-tool segmentation to label hands and the tools they manipulate. Using EgoSurgery-HTS, we conduct extensive evaluations of state-of-the-art segmentation methods and demonstrate significant improvements in the accuracy of hand and hand-tool segmentation in egocentric open-surgery videos compared to existing datasets. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.
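The paper's exact annotation format and evaluation code are not described in this listing, but benchmarks like the one above are typically scored with per-mask Intersection-over-Union (IoU). A minimal, self-contained sketch of that metric on toy boolean masks (all names and array shapes here are illustrative, not taken from the dataset):

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection-over-Union between two boolean segmentation masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention: two empty masks count as a perfect match.
    return float(inter) / float(union) if union else 1.0

# Toy 4x4 frame: the predicted tool mask covers 2 of the 3
# ground-truth pixels and adds one false positive.
gt = np.zeros((4, 4), dtype=bool)
gt[1, 1:4] = True          # ground-truth tool pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1, 2:4] = True        # correctly predicted pixels
pred[2, 2] = True          # false positive

print(round(mask_iou(pred, gt), 3))  # intersection=2, union=4 -> 0.5
```

Instance-segmentation benchmarks usually aggregate this per-mask IoU into AP at fixed thresholds (e.g. IoU >= 0.5), but the thresholds used by EgoSurgery-HTS would need to be confirmed against the released code.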
Related papers
- Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning [71.02843679746563]
In egocentric video understanding, the motion of hands and objects as well as their interactions play a significant role by nature.
In this work, we aim to integrate the modeling of fine-grained hand-object dynamics into the video representation learning process.
We propose EgoVideo, a model with a new lightweight motion adapter to capture fine-grained hand-object motion information.
arXiv Detail & Related papers (2025-03-02T18:49:48Z) - EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos [8.134387035379879]
We introduce EgoSurgery-Tool, an extension of the EgoSurgery-Phase dataset. EgoSurgery-Tool comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset. We conduct a comprehensive analysis of EgoSurgery-Tool using nine popular object detectors to assess their effectiveness in both surgical tool and hand detection.
arXiv Detail & Related papers (2024-06-05T09:36:15Z) - SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z) - Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery [29.72271827272853]
This work explores a new task of Referring Surgical Video Instrument (RSVIS)
It aims to automatically identify and segment the corresponding surgical instruments based on the given language expression.
We devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both video-level and instrument-level knowledge to boost performance.
arXiv Detail & Related papers (2023-08-18T11:24:06Z) - Hierarchical Semi-Supervised Learning Framework for Surgical Gesture Segmentation and Recognition Based on Multi-Modality Data [2.8770761243361593]
We develop a hierarchical semi-supervised learning framework for surgical gesture segmentation using multi-modality data.
A Transformer-based network with a pre-trained ResNet-18 backbone is used to extract visual features from the surgical operation videos.
The proposed approach has been evaluated using data from the publicly available JIGSAWS database, including Suturing, Needle Passing, and Knot Tying tasks.
arXiv Detail & Related papers (2023-07-31T21:17:59Z) - POV-Surgery: A Dataset for Egocentric Hand and Tool Pose Estimation During Surgical Activities [4.989930168854209]
POV-Surgery is a large-scale, synthetic, egocentric dataset focusing on pose estimation for hands with different surgical gloves and three orthopedic surgical instruments.
Our dataset consists of 53 sequences and 88,329 frames, featuring high-resolution RGB-D video streams with activity annotations.
We fine-tune the current SOTA methods on POV-Surgery and further show their generalizability when applied to real-life cases with surgical gloves and tools.
arXiv Detail & Related papers (2023-07-19T18:00:32Z) - Intuitive Surgical SurgToolLoc Challenge Results: 2022-2023 [55.40111320730479]
We have challenged the surgical data science community to solve difficult machine learning problems in the context of advanced RA applications. Here we document the results of these challenges, focusing on surgical tool localization (SurgToolLoc). The publicly released dataset that accompanies these challenges is detailed in a separate paper, arXiv:2501.09209.
arXiv Detail & Related papers (2023-05-11T21:44:39Z) - Self-Supervised Correction Learning for Semi-Supervised Biomedical Image Segmentation [84.58210297703714]
We propose a self-supervised correction learning paradigm for semi-supervised biomedical image segmentation.
We design a dual-task network, including a shared encoder and two independent decoders for segmentation and lesion region inpainting.
Experiments on three medical image segmentation datasets for different tasks demonstrate the outstanding performance of our method.
arXiv Detail & Related papers (2023-01-12T08:19:46Z) - Fine-Grained Egocentric Hand-Object Segmentation: Dataset, Model, and Applications [20.571026014771828]
We provide a labeled dataset consisting of 11,243 egocentric images with per-pixel segmentation labels of hands and objects being interacted with.
Our dataset is the first to label detailed hand-object contact boundaries.
We show that our robust hand-object segmentation model and dataset can serve as a foundational tool to boost or enable several downstream vision applications.
arXiv Detail & Related papers (2022-08-07T21:43:40Z) - Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrastive learning method to boost scene segmentation.
We extensively evaluate our method on the public robotic surgery dataset EndoVis18 and the public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z) - Co-Generation and Segmentation for Generalized Surgical Instrument Segmentation on Unlabelled Data [49.419268399590045]
Surgical instrument segmentation for robot-assisted surgery is needed for accurate instrument tracking and augmented reality overlays.
Deep learning-based methods have shown state-of-the-art performance for surgical instrument segmentation, but their results depend on labelled data.
In this paper, we demonstrate the limited generalizability of these methods on different datasets, including human robot-assisted surgeries.
arXiv Detail & Related papers (2021-03-16T18:41:18Z) - Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation. Our approach allows training image segmentation models without the need to acquire expensive annotations.
We test our proposed method on the EndoVis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv Detail & Related papers (2020-07-09T01:39:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.