Deep learning approaches to surgical video segmentation and object detection: A Scoping Review
- URL: http://arxiv.org/abs/2502.16459v1
- Date: Sun, 23 Feb 2025 06:31:23 GMT
- Title: Deep learning approaches to surgical video segmentation and object detection: A Scoping Review
- Authors: Devanish N. Kamtam, Joseph B. Shrager, Satya Deepya Malla, Nicole Lin, Juan J. Cardona, Jake J. Kim, Clarence Hu,
- Abstract summary: We conducted a scoping review of studies on semantic segmentation and object detection of anatomical structures published between 2014 and 2024.<n>The primary objective was to evaluate the state-of-the-art performance of semantic segmentation in surgical videos.<n>The secondary objectives included examining DL models, progress toward clinical applications, and the specific challenges with segmentation of organs/tissues in surgical videos.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Introduction: Computer vision (CV) has had a transformative impact in biomedical fields such as radiology, dermatology, and pathology. Its real-world adoption in surgical applications, however, remains limited. We review the current state-of-the-art performance of deep learning (DL)-based CV models for segmentation and object detection of anatomical structures in videos obtained during surgical procedures. Methods: We conducted a scoping review of studies on semantic segmentation and object detection of anatomical structures published between 2014 and 2024 from 3 major databases - PubMed, Embase, and IEEE Xplore. The primary objective was to evaluate the state-of-the-art performance of semantic segmentation in surgical videos. Secondary objectives included examining DL models, progress toward clinical applications, and the specific challenges with segmentation of organs/tissues in surgical videos. Results: We identified 58 relevant published studies. These focused predominantly on procedures from general surgery [20(34.4%)], colorectal surgery [9(15.5%)], and neurosurgery [8(13.8%)]. Cholecystectomy [14(24.1%)] and low anterior rectal resection [5(8.6%)] were the most common procedures addressed. Semantic segmentation [47(81%)] was the primary CV task. U-Net [14(24.1%)] and DeepLab [13(22.4%)] were the most widely used models. Larger organs such as the liver (Dice score: 0.88) had higher accuracy compared to smaller structures such as nerves (Dice score: 0.49). Models demonstrated real-time inference potential ranging from 5-298 frames-per-second (fps). Conclusion: This review highlights the significant progress made in DL-based semantic segmentation for surgical videos with real-time applicability, particularly for larger organs. Addressing challenges with smaller structures, data availability, and generalizability remains crucial for future advancements.
Related papers
- Surgeons vs. Computer Vision: A comparative analysis on surgical phase recognition capabilities [65.66373425605278]
Automated Surgical Phase Recognition (SPR) uses Artificial Intelligence (AI) to segment the surgical workflow into its key events.
Previous research has focused on short and linear surgical procedures and has not explored if temporal context influences experts' ability to better classify surgical phases.
This research addresses these gaps, focusing on Robot-Assisted Partial Nephrectomy (RAPN) as a highly non-linear procedure.
arXiv Detail & Related papers (2025-04-26T15:37:22Z) - A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT [67.34586036959793]
There is no fully annotated CT dataset with all anatomies delineated for training.
We propose a novel continual learning-driven CT model that can segment complete anatomies.
Our single unified CT segmentation model, CL-Net, can highly accurately segment a clinically comprehensive set of 235 fine-grained whole-body anatomies.
arXiv Detail & Related papers (2025-03-16T23:55:02Z) - Is Segment Anything Model 2 All You Need for Surgery Video Segmentation? A Systematic Evaluation [25.459372606957736]
In this paper, we systematically evaluate the performance of SAM2 model in zero-shot surgery video segmentation task.<n>We conducted experiments under different configurations, including different prompting strategies, robustness, etc.
arXiv Detail & Related papers (2024-12-31T16:20:05Z) - Phase-Informed Tool Segmentation for Manual Small-Incision Cataract Surgery [5.346116837157231]
Cataract surgery is the most common surgical procedure globally, with a disproportionately higher burden in developing countries.<n>Existing works primarily focus on Phaco cataract surgery, an expensive technique not accessible in regions where cataract treatment is most needed.<n>Manual Small-Incision Cataract Surgery (MSICS) is the preferred low-cost, faster alternative in high-volume settings and for challenging cases.<n>We introduce Sankara-MSICS, the first comprehensive dataset containing 53 surgical videos annotated for 18 surgical phases and 3,527 frames with 13 surgical tools at the pixel level.
arXiv Detail & Related papers (2024-11-25T09:22:42Z) - A novel open-source ultrasound dataset with deep learning benchmarks for
spinal cord injury localization and anatomical segmentation [1.02101998415327]
We present an ultrasound dataset of 10,223-mode (B-mode) images consisting of sagittal slices of porcine spinal cords.
We benchmark the performance metrics of several state-of-the-art object detection algorithms to localize the site of injury.
We evaluate the zero-shot generalization capabilities of the segmentation models on human ultrasound spinal cord images.
arXiv Detail & Related papers (2024-09-24T20:22:59Z) - SURGIVID: Annotation-Efficient Surgical Video Object Discovery [42.16556256395392]
We propose an annotation-efficient framework for the semantic segmentation of surgical scenes.
We employ image-based self-supervised object discovery to identify the most salient tools and anatomical structures in surgical videos.
Our unsupervised setup reinforced with only 36 annotation labels indicates comparable localization performance with fully-supervised segmentation models.
arXiv Detail & Related papers (2024-09-12T07:12:20Z) - TotalSegmentator MRI: Sequence-Independent Segmentation of 59 Anatomical Structures in MR images [62.53931644063323]
In this study we extended the capabilities of TotalSegmentator to MR images.
We trained an nnU-Net segmentation algorithm on this dataset and calculated similarity coefficients (Dice) to evaluate the model's performance.
The model significantly outperformed two other publicly available segmentation models (Dice score 0.824 versus 0.762; p0.001 and 0.762 versus 0.542; p)
arXiv Detail & Related papers (2024-05-29T20:15:54Z) - Semantic segmentation of surgical hyperspectral images under geometric
domain shifts [69.91792194237212]
We present the first analysis of state-of-the-art semantic segmentation networks in the presence of geometric out-of-distribution (OOD) data.
We also address generalizability with a dedicated augmentation technique termed "Organ Transplantation"
Our scheme improves on the SOA DSC by up to 67 % (RGB) and 90 % (HSI) and renders performance on par with in-distribution performance on real OOD test data.
arXiv Detail & Related papers (2023-03-20T09:50:07Z) - TotalSegmentator: robust segmentation of 104 anatomical structures in CT
images [48.50994220135258]
We present a deep learning segmentation model for body CT images.
The model can segment 104 anatomical structures relevant for use cases such as organ volumetry, disease characterization, and surgical or radiotherapy planning.
arXiv Detail & Related papers (2022-08-11T15:16:40Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - Quantification of Robotic Surgeries with Vision-Based Deep Learning [45.165919577877695]
We propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery.
We validated our framework on four video-based datasets of two commonly-encountered types of steps within minimally-invasive robotic surgeries.
arXiv Detail & Related papers (2022-05-06T06:08:35Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - A Global Benchmark of Algorithms for Segmenting Late Gadolinium-Enhanced
Cardiac Magnetic Resonance Imaging [90.29017019187282]
" 2018 Left Atrium Challenge" using 154 3D LGE-MRIs, currently the world's largest cardiac LGE-MRI dataset.
Analyse of the submitted algorithms using technical and biological metrics was performed.
Results show the top method achieved a dice score of 93.2% and a mean surface to a surface distance of 0.7 mm.
arXiv Detail & Related papers (2020-04-26T08:49:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.