ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection
- URL: http://arxiv.org/abs/2506.01130v3
- Date: Sat, 27 Sep 2025 03:37:02 GMT
- Title: ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection
- Authors: Yiliang Chen, Zhixi Li, Cheng Xu, Alex Qinyang Liu, Ruize Cui, Xuemiao Xu, Jeremy Yuen-Chun Teoh, Shengfeng He, Jing Qin
- Abstract summary: ProstaTD is a large-scale dataset for surgical triplet detection developed from the technically demanding domain of robot-assisted prostatectomy. The dataset comprises 71,775 video frames and 196,490 annotated triplet instances, collected from 21 surgeries performed across multiple institutions. ProstaTD is the largest and most diverse surgical triplet dataset to date, moving the field from simple classification to full detection with precise spatial and temporal boundaries.
- Score: 54.270188252068145
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Surgical triplet detection is a critical task in surgical video analysis. However, existing datasets like CholecT50 lack precise spatial bounding box annotations, rendering triplet classification at the image level insufficient for practical applications. The inclusion of bounding box annotations is essential to make this task meaningful, as they provide the spatial context necessary for accurate analysis and improved model generalizability. To address these shortcomings, we introduce ProstaTD, a large-scale, multi-institutional dataset for surgical triplet detection, developed from the technically demanding domain of robot-assisted prostatectomy. ProstaTD offers clinically defined temporal boundaries and high-precision bounding box annotations for each structured triplet activity. The dataset comprises 71,775 video frames and 196,490 annotated triplet instances, collected from 21 surgeries performed across multiple institutions, reflecting a broad range of surgical practices and intraoperative conditions. The annotation process was conducted under rigorous medical supervision and involved more than 60 contributors, including practicing surgeons and medically trained annotators, through multiple iterative phases of labeling and verification. To further facilitate future general-purpose surgical annotation, we developed two tailored labeling tools to improve efficiency and scalability in our annotation workflows. In addition, we created a surgical triplet detection evaluation toolkit that enables standardized and reproducible performance assessment across studies. ProstaTD is the largest and most diverse surgical triplet dataset to date, moving the field from simple classification to full detection with precise spatial and temporal boundaries and thereby providing a robust foundation for fair benchmarking.
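A triplet detection instance pairs an ⟨instrument, verb, target⟩ label with a bounding box, and a prediction is typically scored as a true positive only when all three classes match and the boxes overlap sufficiently. The sketch below is illustrative only: the class names, fields, and the IoU threshold of 0.5 are assumptions, not the actual API of the ProstaTD evaluation toolkit.

```python
from dataclasses import dataclass


@dataclass
class TripletInstance:
    """One annotated triplet with its spatial extent (hypothetical schema)."""
    instrument: str
    verb: str
    target: str
    box: tuple  # (x1, y1, x2, y2) in pixel coordinates


def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0


def is_true_positive(pred: TripletInstance, gt: TripletInstance,
                     iou_thresh: float = 0.5) -> bool:
    """A detection counts only if all three classes and the box agree."""
    same_triplet = (pred.instrument, pred.verb, pred.target) == \
                   (gt.instrument, gt.verb, gt.target)
    return same_triplet and iou(pred.box, gt.box) >= iou_thresh
```

This joint criterion is what separates triplet detection from image-level triplet classification: a correctly named activity localized on the wrong structure is still an error.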
Related papers
- Grounding Surgical Action Triplets with Instrument Instance Segmentation: A Dataset and Target-Aware Fusion Approach [16.569535111037315]
We present CholecTriplet-Seg, a large-scale dataset containing over 30,000 annotated frames, linking instrument instance masks with action verb and anatomical target annotations, and establishing the first benchmark for strongly supervised, instance-level triplet grounding and evaluation. We also propose TargetFusionNet, a novel architecture that extends Mask2Former with a target-aware fusion mechanism to address the challenge of accurate anatomical target prediction by fusing weak anatomy priors with instrument instance queries.
arXiv Detail & Related papers (2025-11-01T17:45:40Z) - Comparative validation of surgical phase recognition, instrument keypoint estimation, and instrument instance segmentation in endoscopy: Results of the PhaKIR 2024 challenge [27.48982385201173]
We introduce a novel dataset comprising thirteen full-length laparoscopic cholecystectomy videos collected from three medical institutions. Unlike existing datasets, ours enables joint investigation of instrument localization and procedural context within the same data. We report results and findings in accordance with the BIAS guidelines for biomedical image analysis challenges.
arXiv Detail & Related papers (2025-07-22T13:10:42Z) - Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos [0.1053373860696675]
Capturing and analyzing long sequences of actions in surgical settings is challenging due to the inherent variability in individual surgeon approaches. This variability complicates the identification and segmentation of distinct actions with ambiguous start and end points. We propose the Multi-Stage Boundary-Aware Transformer Network (MSBATN) with hierarchical sliding window attention to improve action segmentation.
arXiv Detail & Related papers (2025-04-26T01:07:56Z) - Landmark-Free Preoperative-to-Intraoperative Registration in Laparoscopic Liver Resection [50.388465935739376]
Liver registration by overlaying preoperative 3D models onto intraoperative 2D frames can help surgeons perceive the spatial anatomy of the liver clearly, improving the surgical success rate. Existing registration methods rely heavily on anatomical landmarks and encounter two major limitations. We propose a landmark-free preoperative-to-intraoperative registration framework utilizing effective self-supervised learning.
arXiv Detail & Related papers (2025-04-21T14:55:57Z) - A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT [67.34586036959793]
There is no fully annotated CT dataset with all anatomies delineated for training. We propose a novel continual learning-driven CT model that can segment complete anatomies. Our single unified CT segmentation model, CL-Net, segments a clinically comprehensive set of 235 fine-grained whole-body anatomies with high accuracy.
arXiv Detail & Related papers (2025-03-16T23:55:02Z) - Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery [5.346116837157231]
Cataract surgery is the most common surgical procedure globally, with a disproportionately higher burden in developing countries. Existing works primarily focus on Phaco cataract surgery, an expensive technique not accessible in regions where cataract treatment is most needed. Manual Small-Incision Cataract Surgery (MSICS) is the preferred low-cost, faster alternative in high-volume settings and for challenging cases. We introduce Sankara-MSICS, the first comprehensive dataset containing 53 surgical videos annotated for 18 surgical phases and 3,527 frames with 13 surgical tools at the pixel level.
arXiv Detail & Related papers (2024-11-25T09:22:42Z) - Surgical Triplet Recognition via Diffusion Model [59.50938852117371]
Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms.
We propose Difft, a new generative framework for surgical triplet recognition employing the diffusion model.
Experiments on the CholecT45 and CholecT50 datasets show the superiority of the proposed method in achieving a new state-of-the-art performance for surgical triplet recognition.
arXiv Detail & Related papers (2024-06-19T04:43:41Z) - CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools [1.7059333957102913]
Existing datasets rely on overly generic tracking formalizations that fail to capture surgical-specific dynamics. We introduce CholecTrack20, a specialized dataset for multi-class, multi-tool tracking in surgical procedures. The dataset comprises 20 full-length surgical videos, annotated at 1 fps, yielding over 35K frames and 65K labeled tool instances.
arXiv Detail & Related papers (2023-12-12T15:18:15Z) - CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection [41.66666272822756]
This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection.
It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool) as the key actors, and the modeling of each tool activity in the form of ⟨instrument, verb, target⟩ triplets.
arXiv Detail & Related papers (2023-02-13T11:53:14Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
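The mAP figures reported above are means of per-class average precision. As a minimal sketch (non-interpolated AP over a ranked prediction list; illustrative only, not the challenge's actual evaluation code):

```python
def average_precision(confidences, is_tp, num_gt):
    """Area under the precision-recall curve, no interpolation.

    confidences: per-prediction scores; is_tp: whether each prediction
    matched a ground-truth instance; num_gt: number of ground-truth instances.
    """
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    tp = fp = 0
    ap = prev_recall = 0.0
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recall = tp / num_gt
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision  # rectangle under the PR curve
        prev_recall = recall
    return ap


def mean_average_precision(per_class_results):
    """per_class_results: list of (confidences, is_tp, num_gt) per triplet class."""
    aps = [average_precision(c, t, n) for c, t, n in per_class_results]
    return sum(aps) / len(aps)
```

Averaging over every valid triplet class is what makes the metric sensitive to rare tool-verb-target combinations, which helps explain the wide 4.2% to 38.1% spread among challenge entries.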
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - A unified 3D framework for Organs at Risk Localization and Segmentation for Radiation Therapy Planning [56.52933974838905]
Current medical workflows require manual delineation of organs-at-risk (OARs).
In this work, we aim to introduce a unified 3D pipeline for OAR localization-segmentation.
Our proposed framework fully enables the exploitation of 3D context information inherent in medical imaging.
arXiv Detail & Related papers (2022-03-01T17:08:41Z) - Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.