CataractCompDetect: Intraoperative Complication Detection in Cataract Surgery
- URL: http://arxiv.org/abs/2511.18968v1
- Date: Mon, 24 Nov 2025 10:34:12 GMT
- Title: CataractCompDetect: Intraoperative Complication Detection in Cataract Surgery
- Authors: Bhuvan Sachdeva, Sneha Kumari, Rudransh Agarwal, Shalaka Kumaraswamy, Niharika Singri Prasad, Simon Mueller, Raphael Lechtenboehmer, Maximilian W. M. Wintergerst, Thomas Schultz, Kaushik Murali, Mohit Jain,
- Abstract summary: CataractCompDetect combines phase-aware localization, SAM 2-based tracking, complication-specific risk scoring, and vision-language reasoning for final classification. On CataComp, CataractCompDetect achieves an average F1 score of 70.63%, with per-complication performance of 81.8% (Iris Prolapse), 60.87% (PCR), and 69.23% (Vitreous Loss).
- Score: 3.3884925376993347
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cataract surgery is one of the most commonly performed surgeries worldwide, yet intraoperative complications such as iris prolapse, posterior capsule rupture (PCR), and vitreous loss remain major causes of adverse outcomes. Automated detection of such events could enable early warning systems and objective training feedback. In this work, we propose CataractCompDetect, a complication detection framework that combines phase-aware localization, SAM 2-based tracking, complication-specific risk scoring, and vision-language reasoning for final classification. To validate CataractCompDetect, we curate CataComp, the first cataract surgery video dataset annotated for intraoperative complications, comprising 53 surgeries, including 23 with clinical complications. On CataComp, CataractCompDetect achieves an average F1 score of 70.63%, with per-complication performance of 81.8% (Iris Prolapse), 60.87% (PCR), and 69.23% (Vitreous Loss). These results highlight the value of combining structured surgical priors with vision-language reasoning for recognizing rare but high-impact intraoperative events. Our dataset and code will be publicly released upon acceptance.
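The abstract describes a four-stage pipeline: phase-aware localization, tracking, complication-specific risk scoring, and a final vision-language classification. As an illustration only, the sketch below mocks that control flow with toy heuristics; the phase boundaries, risk thresholds, and all function names are assumptions for illustration, not the paper's actual method, and the SAM 2 tracking and vision-language stages are stubbed out.

```python
from dataclasses import dataclass

# Hypothetical labels matching the three complications evaluated in the paper.
COMPLICATIONS = ["iris_prolapse", "pcr", "vitreous_loss"]

@dataclass
class Candidate:
    complication: str
    start_frame: int
    end_frame: int
    risk_score: float

def localize_phases(num_frames: int) -> dict:
    """Stage 1 (sketch): map frame ranges to surgical phases.
    A real system would use a trained phase-recognition model;
    here we just split the video into equal thirds."""
    third = num_frames // 3
    return {
        "incision": (0, third),
        "phacoemulsification": (third, 2 * third),
        "iol_insertion": (2 * third, num_frames),
    }

def score_risk(phase: str, motion_energy: float) -> dict:
    """Stage 3 (sketch): complication-specific risk scoring.
    The phase-to-complication mapping and the 0.1 discount are
    illustrative placeholders, not values from the paper."""
    risky_phase = {
        "iris_prolapse": "incision",
        "pcr": "phacoemulsification",
        "vitreous_loss": "phacoemulsification",
    }
    return {
        c: (motion_energy if risky_phase[c] == phase else 0.1 * motion_energy)
        for c in COMPLICATIONS
    }

def detect(num_frames: int, motion_energy_by_phase: dict,
           threshold: float = 0.5) -> list:
    """End-to-end sketch: localize phases, score risk per phase, and keep
    candidates above threshold. The tracking (SAM 2) and vision-language
    verification stages would filter these candidates in the real system."""
    candidates = []
    for phase, (start, end) in localize_phases(num_frames).items():
        scores = score_risk(phase, motion_energy_by_phase.get(phase, 0.0))
        for comp, s in scores.items():
            if s >= threshold:
                candidates.append(Candidate(comp, start, end, s))
    return candidates
```

For example, a 300-frame video with high motion energy during phacoemulsification would surface PCR and vitreous-loss candidates in frames 100-200, which the downstream stages would then confirm or reject.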
Related papers
- End to End AI System for Surgical Gesture Sequence Recognition and Clinical Outcome Prediction [5.409483209009106]
We present Frame-to-Outcome (F2O), an end-to-end system that translates tissue dissection videos into gesture sequences. F2O robustly detects consecutive short (2-second) gestures in the nerve-sparing step of robot-assisted radical prostatectomy.
arXiv Detail & Related papers (2025-11-14T22:02:46Z)
- Improving Surgical Risk Prediction Through Integrating Automated Body Composition Analysis: a Retrospective Trial on Colectomy Surgery [3.424374887940227]
The primary outcome was the predictive performance for 1-year all-cause mortality following colectomy. Secondary outcomes included postoperative complications, unplanned readmission, blood transfusion, and severe infection.
arXiv Detail & Related papers (2025-06-13T17:51:14Z)
- ProstaTD: Bridging Surgical Triplet from Classification to Fully Supervised Detection [54.270188252068145]
ProstaTD is a large-scale dataset for surgical triplet detection developed from the technically demanding domain of robot-assisted prostatectomy. The dataset comprises 71,775 video frames and 196,490 annotated triplet instances, collected from 21 surgeries performed across multiple institutions. ProstaTD is the largest and most diverse surgical triplet dataset to date, moving the field from simple classification to full detection with precise spatial and temporal boundaries.
arXiv Detail & Related papers (2025-06-01T19:29:39Z)
- Benchmarking Laparoscopic Surgical Image Restoration and Beyond [54.28852320829451]
In laparoscopic surgery, a clear and high-quality visual field is critical for surgeons to make accurate decisions. Persistent visual degradation, including smoke generated by energy devices, lens fogging from thermal gradients, and lens contamination, poses risks to patient safety. We introduce SurgClean, a real-world open-source surgical image restoration dataset covering laparoscopic environments.
arXiv Detail & Related papers (2025-05-25T14:17:56Z)
- Landmark-Free Preoperative-to-Intraoperative Registration in Laparoscopic Liver Resection [50.388465935739376]
Liver registration by overlaying preoperative 3D models onto intraoperative 2D frames can help surgeons perceive the spatial anatomy of the liver clearly, improving the surgical success rate. Existing registration methods rely heavily on anatomical landmarks and encounter two major limitations. We propose a landmark-free preoperative-to-intraoperative registration framework utilizing effective self-supervised learning.
arXiv Detail & Related papers (2025-04-21T14:55:57Z)
- Predicting Postoperative Intraocular Lens Dislocation in Cataract Surgery via Deep Learning [5.40411016117853]
A critical yet unpredictable complication following cataract surgery is intraocular lens dislocation.
We develop and evaluate the first fully-automatic framework for the computation of lens unfolding delay, rotation, and instability during surgery.
We exploit a large-scale dataset of cataract surgery videos featuring four intraocular lens brands.
arXiv Detail & Related papers (2023-12-06T10:27:15Z)
- CoRe: An Automated Pipeline for The Prediction of Liver Resection Complexity from Preoperative CT Scans [53.561797148529664]
Tumors located in critical positions are known to complicate liver resections.
CoRe is an automated medical image processing pipeline for the prediction of postoperative LR complexity.
arXiv Detail & Related papers (2022-10-15T15:29:24Z)
- Learning-Based Keypoint Registration for Fetoscopic Mosaicking [65.02392513942533]
In Twin-to-Twin Transfusion Syndrome (TTTS), abnormal vascular anastomoses in the monochorionic placenta can produce uneven blood flow between the two fetuses.
We propose a learning-based framework for in-vivo fetoscopy frame registration for field-of-view expansion.
arXiv Detail & Related papers (2022-07-26T21:21:12Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- Automatic Detection and Segmentation of Postoperative Cerebellar Damage Based on Normalization [1.1470070927586016]
A reliable localization and measure of cerebellar damage is fundamental to study the relationship between the damaged cerebellar regions and postoperative neurological outcomes.
Existing cerebellum normalization methods are not reliable on postoperative scans, therefore current approaches to measure surgical damage rely on manual labelling.
We develop a robust algorithm to automatically detect and measure cerebellum damage due to surgery using postoperative 3D T1 magnetic resonance imaging.
arXiv Detail & Related papers (2022-03-03T22:26:59Z)
- LensID: A CNN-RNN-Based Framework Towards Lens Irregularity Detection in Cataract Surgery Videos [6.743968799949719]
A critical complication after cataract surgery is the dislocation of the lens implant leading to vision deterioration and eye trauma.
We propose an end-to-end recurrent neural network to recognize the lens-implantation phase and a novel semantic segmentation network to segment the lens and pupil after the implantation phase.
arXiv Detail & Related papers (2021-07-02T07:27:29Z)