ProstaTD: A Large-scale Multi-source Dataset for Structured Surgical Triplet Detection
- URL: http://arxiv.org/abs/2506.01130v1
- Date: Sun, 01 Jun 2025 19:29:39 GMT
- Title: ProstaTD: A Large-scale Multi-source Dataset for Structured Surgical Triplet Detection
- Authors: Yiliang Chen, Zhixi Li, Cheng Xu, Alex Qinyang Liu, Xuemiao Xu, Jeremy Yuen-Chun Teoh, Shengfeng He, Jing Qin
- Abstract summary: ProstaTD is a large-scale, multi-institutional dataset for surgical triplet detection. It was developed from the technically demanding domain of robot-assisted prostatectomy. The dataset comprises 60,529 video frames and 165,567 annotated triplet instances.
- Score: 34.96818119277855
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Surgical triplet detection has emerged as a pivotal task in surgical video analysis, with significant implications for performance assessment and the training of novice surgeons. However, existing datasets such as CholecT50 exhibit critical limitations: they lack precise spatial bounding box annotations, provide inconsistent and clinically ungrounded temporal labels, and rely on a single data source, which limits model generalizability. To address these shortcomings, we introduce ProstaTD, a large-scale, multi-institutional dataset for surgical triplet detection, developed from the technically demanding domain of robot-assisted prostatectomy. ProstaTD offers clinically defined temporal boundaries and high-precision bounding box annotations for each structured triplet action. The dataset comprises 60,529 video frames and 165,567 annotated triplet instances, collected from 21 surgeries performed across multiple institutions, reflecting a broad range of surgical practices and intraoperative conditions. The annotation process was conducted under rigorous medical supervision and involved more than 50 contributors, including practicing surgeons and medically trained annotators, through multiple iterative phases of labeling and verification. ProstaTD is the largest and most diverse surgical triplet dataset to date, providing a robust foundation for fair benchmarking, the development of reliable surgical AI systems, and scalable tools for procedural training.
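To make the structure of such annotations concrete, the sketch below parses one hypothetical per-frame record pairing each instrument-verb-target triplet with the bounding box of the acting instrument. The JSON layout, field names, and label values are illustrative assumptions, not the released ProstaTD format.

```python
# Minimal sketch of reading a structured-triplet detection label.
# The schema is an ASSUMED illustration, not the official ProstaTD format.
example_frame = {
    "video_id": "case_07",          # hypothetical surgery identifier
    "frame_index": 1532,
    "triplets": [
        {
            "instrument": "needle_driver",   # hypothetical label values
            "verb": "suture",
            "target": "urethra",
            "bbox_xyxy": [412.0, 188.5, 560.0, 305.0],  # pixels: x1, y1, x2, y2
        }
    ],
}

def iter_triplets(frame):
    """Yield (instrument, verb, target, bbox) tuples from one frame record."""
    for t in frame["triplets"]:
        yield t["instrument"], t["verb"], t["target"], t["bbox_xyxy"]

for instrument, verb, target, bbox in iter_triplets(example_frame):
    print(f"{instrument} --{verb}--> {target} @ {bbox}")
```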
Related papers
- Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge [27.48982385201173]
We introduce a novel dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three medical institutions. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data. We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges.
arXiv Detail & Related papers (2025-07-22T13:10:42Z)
- Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos [0.1053373860696675]
Capturing and analyzing long sequences of actions in surgical settings is challenging due to the inherent variability in individual surgeon approaches. This variability complicates the identification and segmentation of distinct actions with ambiguous start and end boundaries. We propose the Multi-Stage Boundary-Aware Transformer Network (MSBATN) with hierarchical sliding window attention to improve action segmentation.
arXiv Detail & Related papers (2025-04-26T01:07:56Z)
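For readers unfamiliar with the attention variant named above, here is a minimal sketch of plain sliding-window self-attention over a sequence of frame features. It is a generic, single-stage illustration with arbitrary window size and dimensions, not the authors' hierarchical MSBATN.

```python
import torch
import torch.nn.functional as F

def sliding_window_attention(x, window=32):
    """Self-attention where each frame attends only to a local window.

    x: (T, D) per-frame features; returns a (T, D) contextualized sequence.
    """
    T, D = x.shape
    half = window // 2
    out = torch.empty_like(x)
    for t in range(T):
        lo, hi = max(0, t - half), min(T, t + half + 1)
        q = x[t : t + 1]                                # (1, D) query
        k = v = x[lo:hi]                                # (W, D) local keys/values
        attn = F.softmax(q @ k.T / D ** 0.5, dim=-1)    # (1, W) weights
        out[t] = (attn @ v).squeeze(0)                  # weighted local context
    return out

features = torch.randn(300, 128)                 # e.g. 300 frames, 128-dim features
print(sliding_window_attention(features).shape)  # torch.Size([300, 128])
```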
- A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT [67.34586036959793]
There is no fully annotated CT dataset with all anatomies delineated for training. We propose a novel continual learning-driven CT model that can segment complete anatomies. Our single unified CT segmentation model, CL-Net, segments a clinically comprehensive set of 235 fine-grained whole-body anatomies with high accuracy.
arXiv Detail & Related papers (2025-03-16T23:55:02Z)
- Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery [5.346116837157231]
Cataract surgery is the most common surgical procedure globally, with a disproportionately higher burden in developing countries. Existing works primarily focus on Phaco cataract surgery, an expensive technique not accessible in regions where cataract treatment is most needed. Manual Small-Incision Cataract Surgery (MSICS) is the preferred low-cost, faster alternative in high-volume settings and for challenging cases. We introduce Sankara-MSICS, the first comprehensive dataset containing 53 surgical videos annotated for 18 surgical phases and 3,527 frames with 13 surgical tools at the pixel level.
arXiv Detail & Related papers (2024-11-25T09:22:42Z)
- Surgical Triplet Recognition via Diffusion Model [59.50938852117371]
Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms.
We propose Difft, a new generative framework for surgical triplet recognition employing the diffusion model.
Experiments on the CholecT45 and CholecT50 datasets show the superiority of the proposed method in achieving a new state-of-the-art performance for surgical triplet recognition.
arXiv Detail & Related papers (2024-06-19T04:43:41Z)
- CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools [1.7059333957102913]
Existing datasets rely on overly generic tracking formalizations that fail to capture surgical-specific dynamics. We introduce CholecTrack20, a specialized dataset for multi-class, multi-tool tracking in surgical procedures. The dataset comprises 20 full-length surgical videos, annotated at 1 fps, yielding over 35K frames and 65K labeled tool instances.
arXiv Detail & Related papers (2023-12-12T15:18:15Z)
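As a concrete reference for tracking-by-detection of this kind, the sketch below performs one greedy IoU-based association step between previously tracked tool boxes and new detections. It is a generic illustration under assumed (x1, y1, x2, y2) box coordinates, not the CholecTrack20 baseline tracker.

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(prev_tracks, detections, thresh=0.3):
    """Greedily match current detections to previous tracks by IoU.

    prev_tracks: dict track_id -> box; detections: list of boxes.
    Returns dict track_id -> matched detection index (unmatched omitted).
    """
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in prev_tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches, used_t, used_d = {}, set(), set()
    for score, tid, j in pairs:
        if score < thresh:
            break                      # remaining pairs are below threshold
        if tid in used_t or j in used_d:
            continue                   # track or detection already claimed
        matches[tid] = j
        used_t.add(tid); used_d.add(j)
    return matches

# Toy example: two tracked tools, two new detections.
tracks = {1: (10, 10, 50, 50), 2: (100, 100, 160, 160)}
dets = [(102, 98, 158, 162), (12, 8, 52, 48)]
print(associate(tracks, dets))  # expect {2: 0, 1: 1}
```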
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
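Since mAP is the headline metric here and in several entries above, the following sketch shows one standard way to compute non-interpolated average precision per class and average it. It assumes flat arrays of confidence scores with binary ground-truth labels, not any challenge's official evaluation protocol.

```python
import numpy as np

def average_precision(scores, labels):
    """Non-interpolated AP for one class from confidences and 0/1 labels."""
    order = np.argsort(-scores)          # rank predictions by confidence
    labels = labels[order]
    hits = np.cumsum(labels)             # true positives among top-k
    precision = hits / np.arange(1, len(labels) + 1)
    n_pos = max(labels.sum(), 1)         # guard against classes with no positives
    return float(np.sum(precision * labels) / n_pos)

def mean_average_precision(per_class):
    """mAP: unweighted mean of per-class AP values."""
    return float(np.mean([average_precision(s, l) for s, l in per_class]))

# Toy example: two classes with random scores and labels.
rng = np.random.default_rng(0)
data = [(rng.random(200), rng.integers(0, 2, 200)) for _ in range(2)]
print(f"mAP = {mean_average_precision(data):.3f}")
```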
- A unified 3D framework for Organs at Risk Localization and Segmentation for Radiation Therapy Planning [56.52933974838905]
Current medical workflows require manual delineation of organs-at-risk (OAR).
In this work, we aim to introduce a unified 3D pipeline for OAR localization-segmentation.
Our proposed framework fully enables the exploitation of 3D context information inherent in medical imaging.
arXiv Detail & Related papers (2022-03-01T17:08:41Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical dataset comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)