Video Dataset for Surgical Phase, Keypoint, and Instrument Recognition in Laparoscopic Surgery (PhaKIR)
- URL: http://arxiv.org/abs/2511.06549v1
- Date: Sun, 09 Nov 2025 21:29:33 GMT
- Title: Video Dataset for Surgical Phase, Keypoint, and Instrument Recognition in Laparoscopic Surgery (PhaKIR)
- Authors: Tobias Rueckert, Raphaela Maerkl, David Rauber, Leonard Klausmann, Max Gutbrod, Daniel Rueckert, Hubertus Feussner, Dirk Wilhelm, Christoph Palm
- Abstract summary: We present the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) dataset. PhaKIR is the first multi-institutional dataset to jointly provide phase labels, instrument pose information, and pixel-accurate instrument segmentations. The dataset is publicly available upon request via the Zenodo platform.
- Score: 17.067466198535246
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robotic- and computer-assisted minimally invasive surgery (RAMIS) is increasingly relying on computer vision methods for reliable instrument recognition and surgical workflow understanding. Developing such systems often requires large, well-annotated datasets, but existing resources often address isolated tasks, neglect temporal dependencies, or lack multi-center variability. We present the Surgical Procedure Phase, Keypoint, and Instrument Recognition (PhaKIR) dataset, comprising eight complete laparoscopic cholecystectomy videos recorded at three medical centers. The dataset provides frame-level annotations for three interconnected tasks: surgical phase recognition (485,875 frames), instrument keypoint estimation (19,435 frames), and instrument instance segmentation (19,435 frames). PhaKIR is, to our knowledge, the first multi-institutional dataset to jointly provide phase labels, instrument pose information, and pixel-accurate instrument segmentations, while also enabling the exploitation of temporal context since full surgical procedure sequences are available. It served as the basis for the PhaKIR Challenge as part of the Endoscopic Vision (EndoVis) Challenge at MICCAI 2024 to benchmark methods in surgical scene understanding, thereby further validating the dataset's quality and relevance. The dataset is publicly available upon request via the Zenodo platform.
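The abstract describes three interconnected, frame-level annotation types on the same videos. As a rough illustration of how they fit together, below is a minimal Python sketch of a joint per-frame record; all field names, label strings, and the video identifier are hypothetical and do not reflect the official Zenodo file layout.

```python
# Minimal sketch of a PhaKIR-style joint frame annotation (phase label,
# instrument keypoints, instance masks). Field names and label values are
# hypothetical -- the official Zenodo release may be organized differently.
from dataclasses import dataclass, field
import numpy as np


@dataclass
class InstrumentInstance:
    instrument_class: str                         # e.g. "grasper" (hypothetical label set)
    keypoints: dict[str, tuple[float, float]]     # named 2D keypoints in pixel coordinates
    mask: np.ndarray                              # boolean instance mask, shape (H, W)


@dataclass
class FrameAnnotation:
    video_id: str
    frame_index: int
    phase: str                                    # surgical phase label for this frame
    instruments: list[InstrumentInstance] = field(default_factory=list)


# Tiny in-memory example for one annotated frame of a hypothetical video.
frame = FrameAnnotation(
    video_id="center1_video01",
    frame_index=1200,
    phase="calot_triangle_dissection",
    instruments=[
        InstrumentInstance(
            instrument_class="grasper",
            keypoints={"tip": (412.0, 237.5), "shaft_end": (530.0, 310.0)},
            mask=np.zeros((540, 960), dtype=bool),
        )
    ],
)
print(frame.video_id, frame.phase, len(frame.instruments))
```

Keeping the phase label, keypoints, and masks in a single per-frame record is one simple way to exploit the dataset's joint annotations and temporal ordering.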
Related papers
- Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge [27.48982385201173]
We introduce a novel dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three medical institutions. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data. We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges.
arXiv Detail & Related papers (2025-07-22T13:10:42Z) - ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection [54.270188252068145]
ProstaTD is a large-scale dataset for surgical triplet detection developed from the technically demanding domain of robot-assisted prostatectomy. The dataset comprises 71,775 video frames and 196,490 annotated triplet instances, collected from 21 surgeries performed across multiple institutions. ProstaTD is the largest and most diverse surgical triplet dataset to date, moving the field from simple classification to full detection with precise spatial and temporal boundaries.
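To make the step from triplet classification to detection concrete, here is a small, hypothetical sketch of what a spatially grounded triplet instance could look like; the field names and label values are illustrative only and are not the ProstaTD release schema.

```python
# Hypothetical record for one surgical triplet *detection*: the (instrument,
# verb, target) triplet plus a spatial box and a frame index. Illustrative
# schema only, not the ProstaTD annotation format.
from dataclasses import dataclass


@dataclass
class TripletDetection:
    frame_index: int
    instrument: str                                  # e.g. "needle_driver"
    verb: str                                        # e.g. "suture"
    target: str                                      # e.g. "prostate"
    bbox_xyxy: tuple[float, float, float, float]     # spatial extent in pixels


example = TripletDetection(
    frame_index=4821,
    instrument="needle_driver",
    verb="suture",
    target="prostate",
    bbox_xyxy=(220.0, 180.0, 410.0, 335.0),
)
print(example)
```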
arXiv Detail & Related papers (2025-06-01T19:29:39Z) - TEMSET-24K: Densely Annotated Dataset for Indexing Multipart Endoscopic Videos using Surgical Timeline Segmentation [2.9776992449863613]
Current video analytics rely on manual indexing, a time-consuming process. We present TEMSET-24K, an open-source dataset comprising 24,306 trans-anal endoscopic microsurgery (TEMS) video microclips. Each clip is meticulously annotated by clinical experts using a novel hierarchical labeling taxonomy.
arXiv Detail & Related papers (2025-02-10T17:37:34Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - Pixel-Wise Recognition for Holistic Surgical Scene Understanding [33.40319680006502]
This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies dataset. Our benchmark models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity. To exploit our proposed benchmark, we introduce the Transformers for Actions, Phases, Steps, and Instrument Segmentation (TAPIS) model.
arXiv Detail & Related papers (2024-01-20T09:09:52Z) - SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z) - CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools [1.7059333957102913]
Existing datasets rely on overly generic tracking formalizations that fail to capture surgical-specific dynamics. We introduce CholecTrack20, a specialized dataset for multi-class, multi-tool tracking in surgical procedures. The dataset comprises 20 full-length surgical videos, annotated at 1 fps, yielding over 35K frames and 65K labeled tool instances.
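As a loose illustration of multi-class, multi-tool tracking annotations of this kind, the sketch below links per-frame bounding boxes to a persistent track identity; the schema is hypothetical and not the CholecTrack20 format.

```python
# Hypothetical per-tool track: one track id carries a tool class and its
# per-frame bounding boxes (frame index in the 1 fps annotation stream).
# Illustrative schema only, not the CholecTrack20 release format.
from dataclasses import dataclass, field


@dataclass
class ToolTrack:
    track_id: int
    tool_class: str                                                   # e.g. "clipper"
    boxes: dict[int, tuple[float, float, float, float]] = field(default_factory=dict)
    # annotated frame index -> (x1, y1, x2, y2) in pixels


track = ToolTrack(track_id=3, tool_class="clipper")
track.boxes[120] = (300.0, 210.0, 420.0, 330.0)
track.boxes[121] = (305.0, 215.0, 425.0, 335.0)
print(track.track_id, track.tool_class, len(track.boxes))
```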
arXiv Detail & Related papers (2023-12-12T15:18:15Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
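For reference, the mAP figures above are frame-level, multi-label averages of per-class average precision over the triplet classes. The sketch below shows a generic computation on toy data with scikit-learn; it is not the challenge's official evaluation code, and the toy sizes and scores are made up.

```python
# Generic sketch of mean average precision (mAP) for multi-label triplet
# recognition: average precision per triplet class, then the mean over classes.
# Toy data only; not the challenge's official evaluation protocol.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
num_frames, num_triplet_classes = 200, 10                 # toy sizes

y_true = rng.integers(0, 2, size=(num_frames, num_triplet_classes))   # ground-truth labels
y_score = rng.random(size=(num_frames, num_triplet_classes))          # model confidences

per_class_ap = [
    average_precision_score(y_true[:, c], y_score[:, c])
    for c in range(num_triplet_classes)
    if y_true[:, c].any()                                 # AP is undefined for empty classes
]
print(f"mAP = {np.mean(per_class_ap):.3f}")
```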
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines FL and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
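As background for the federated side of such approaches, below is a minimal federated-averaging (FedAvg) sketch; FedCy's self-supervised objectives on unlabeled videos are not reproduced here, and the parameter layout and client sizes are toy assumptions.

```python
# Minimal FedAvg sketch: weighted average of per-client model parameters by
# local dataset size. Toy illustration of the federated aggregation step only.
import numpy as np


def fedavg(client_weights: list[dict[str, np.ndarray]],
           client_sizes: list[int]) -> dict[str, np.ndarray]:
    """Return the size-weighted average of per-client parameter dictionaries."""
    total = sum(client_sizes)
    return {
        key: sum(w[key] * (n / total) for w, n in zip(client_weights, client_sizes))
        for key in client_weights[0]
    }


# Toy example: three hospitals sharing one 2x2 parameter matrix each.
clients = [{"layer0": np.full((2, 2), float(i))} for i in range(3)]
sizes = [100, 50, 50]
print(fedavg(clients, sizes)["layer0"])
```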
arXiv Detail & Related papers (2022-03-14T17:44:53Z) - Heidelberg Colorectal Data Set for Surgical Data Science in the Sensor Operating Room [1.6276355161958829]
This paper introduces the Heidelberg Colorectal (HeiCo) data set - the first publicly available data set enabling comprehensive benchmarking of medical instrument detection and segmentation algorithms.
Our data set comprises 30 laparoscopic videos and corresponding sensor data from medical devices in the operating room for three different types of laparoscopic surgery.
arXiv Detail & Related papers (2020-05-07T14:04:29Z) - Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
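To make "algorithm performance" on the segmentation task concrete, here is a sketch of the Dice similarity coefficient, a standard overlap metric for binary instrument masks; it is shown as a generic example and toy data, not the challenge's exact evaluation protocol.

```python
# Dice similarity coefficient between two boolean segmentation masks.
# Generic metric sketch on toy masks, not the challenge's official evaluation.
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice overlap between two boolean masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + gt.sum() + eps))


# Toy masks: the prediction covers half of the ground-truth instrument region.
gt = np.zeros((64, 64), dtype=bool)
gt[16:48, 16:48] = True
pred = np.zeros((64, 64), dtype=bool)
pred[16:48, 16:32] = True
print(f"Dice = {dice(pred, gt):.3f}")   # expected: 0.667
```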
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or summaries (including all information) and is not responsible for any consequences arising from their use.