Surgment: Segmentation-enabled Semantic Search and Creation of Visual
Question and Feedback to Support Video-Based Surgery Learning
- URL: http://arxiv.org/abs/2402.17903v1
- Date: Tue, 27 Feb 2024 21:42:23 GMT
- Title: Surgment: Segmentation-enabled Semantic Search and Creation of Visual
Question and Feedback to Support Video-Based Surgery Learning
- Authors: Jingying Wang, Haoran Tang, Taylor Kantor, Tandis Soltani, Vitaliy
Popov and Xu Wang
- Abstract summary: Surgment is a system that helps expert surgeons create exercises with feedback based on surgery recordings.
The segmentation pipeline enables functionalities to create visual questions and feedback desired by surgeons.
In an evaluation study with 11 surgeons, participants applauded the search-by-sketch approach for identifying frames of interest and found the resulting image-based questions and feedback to be of high educational value.
- Score: 4.509082876666929
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Videos are prominent learning materials to prepare surgical trainees before
they enter the operating room (OR). In this work, we explore techniques to
enrich the video-based surgery learning experience. We propose Surgment, a
system that helps expert surgeons create exercises with feedback based on
surgery recordings. Surgment is powered by a few-shot-learning-based pipeline
(SegGPT+SAM) to segment surgery scenes, achieving an accuracy of 92%. The
segmentation pipeline enables functionalities to create visual questions and
feedback desired by surgeons from a formative study. Surgment enables surgeons
to 1) retrieve frames of interest through sketches, and 2) design exercises
that target specific anatomical components and offer visual feedback. In an
evaluation study with 11 surgeons, participants applauded the search-by-sketch
approach for identifying frames of interest and found the resulting image-based
questions and feedback to be of high educational value.
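The abstract reports the SegGPT+SAM pipeline's accuracy but not how the two models are wired together. Below is a minimal sketch of one plausible arrangement, assuming SegGPT produces a coarse mask from a single annotated exemplar frame and SAM then refines it; `run_seggpt` is a hypothetical wrapper around the SegGPT research code, while the SAM calls use the real `segment_anything` package API.

```python
# A sketch of a few-shot SegGPT+SAM segmentation pipeline (assumed
# arrangement, not the authors' confirmed implementation).
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def run_seggpt(exemplar_img, exemplar_mask, query_img):
    """Hypothetical wrapper: given one annotated exemplar frame,
    SegGPT returns a coarse binary mask for the same anatomical
    structure in the query frame (in-context segmentation)."""
    raise NotImplementedError("wrap the SegGPT inference code here")

def segment_frame(exemplar_img, exemplar_mask, query_img, predictor):
    # 1) Few-shot step: coarse mask from a single annotated exemplar.
    coarse = run_seggpt(exemplar_img, exemplar_mask, query_img)

    # 2) Refinement step: sample foreground points from the coarse
    #    mask and let SAM sharpen the boundary on the query frame.
    ys, xs = np.nonzero(coarse)
    idx = np.random.choice(len(xs), size=min(5, len(xs)), replace=False)
    points = np.stack([xs[idx], ys[idx]], axis=1)   # (N, 2), x-y order
    labels = np.ones(len(points), dtype=int)        # 1 = foreground

    predictor.set_image(query_img)                  # HxWx3 uint8 RGB
    masks, scores, _ = predictor.predict(
        point_coords=points, point_labels=labels, multimask_output=True
    )
    return masks[scores.argmax()]                   # best refined mask

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)
```

Prompting SAM with points sampled from a coarse mask is a common way to combine a generalist in-context segmenter with SAM's boundary quality.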
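The abstract likewise does not say how sketch queries are matched against frames. One plausible stand-in, sketched below under that assumption, is to rasterize the surgeon's sketch into per-class region masks and rank pre-segmented frames by mask overlap; all names here are illustrative.

```python
# Illustrative search-by-sketch over pre-segmented frames: rank frames
# by mean per-class IoU between the sketched regions and each frame's
# segmentation masks. (Assumed scoring, not Surgment's confirmed one.)
import numpy as np

def iou(a: np.ndarray, b: np.ndarray) -> float:
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def rank_frames(sketch_masks, frame_index):
    """sketch_masks: {class_name: HxW bool mask drawn by the surgeon}
    frame_index: iterable of (frame_id, {class_name: HxW bool mask})."""
    scored = []
    for frame_id, masks in frame_index:
        overlaps = [iou(sketch_masks[c], masks[c])
                    for c in sketch_masks if c in masks]
        if overlaps:
            scored.append((frame_id, sum(overlaps) / len(overlaps)))
    return sorted(scored, key=lambda t: t[1], reverse=True)
```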
Related papers
- Anatomy Might Be All You Need: Forecasting What to Do During Surgery [41.91807060434709]
There has been growing interest in providing live guidance by analyzing video feeds from tools such as endoscopes.
This work aims to provide guidance on a finer scale by forecasting the trajectory of the surgical instrument.
arXiv Detail & Related papers (2025-01-29T21:54:31Z)
- EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery [52.992415247012296]
We introduce EndoChat to address various dialogue paradigms and subtasks in surgical scene understanding.
Our model achieves state-of-the-art performance across five dialogue paradigms and eight surgical scene understanding tasks.
arXiv Detail & Related papers (2025-01-20T09:12:06Z)
- Automating Feedback Analysis in Surgical Training: Detection, Categorization, and Assessment [65.70317151363204]
In surgical training, the formative verbal feedback that trainers provide to trainees during live surgeries is crucial for ensuring safety, correcting behavior immediately, and facilitating long-term skill acquisition.
This work introduces the first framework for reconstructing surgical dialogue from unstructured real-world recordings.
Our framework integrates voice activity detection, speaker diarization, and automatic speech recognition, with a novel enhancement that removes hallucinations (a minimal pipeline sketch follows after this list).
arXiv Detail & Related papers (2024-12-01T10:35:12Z)
- OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining [60.75854609803651]
OphCLIP is a hierarchical retrieval-augmented vision-language pretraining framework for ophthalmic surgical workflow understanding.
OphCLIP learns both fine-grained and long-term visual representations by aligning short video clips with detailed narrative descriptions and full videos with structured titles.
OphCLIP also introduces a retrieval-augmented pretraining scheme to leverage large-scale but underexplored silent surgical procedure videos.
arXiv Detail & Related papers (2024-11-23T02:53:08Z)
- Thoracic Surgery Video Analysis for Surgical Phase Recognition [0.08706730566331035]
We analyse and evaluate both frame-based and video clipping-based phase recognition on a thoracic surgery dataset consisting of 11 phase classes.
We show that Masked Video Distillation (MVD) exhibits superior performance, achieving a top-1 accuracy of 72.9%, compared to 52.31% achieved by ImageNet ViT.
arXiv Detail & Related papers (2024-06-13T14:47:57Z)
- Deep Multimodal Fusion for Surgical Feedback Classification [70.53297887843802]
We leverage a clinically-validated five-category classification of surgical feedback.
We then develop a multi-label machine learning model to classify these five categories of surgical feedback from inputs of text, audio, and video modalities.
The ultimate goal of our work is to help automate the annotation of real-time contextual surgical feedback at scale.
arXiv Detail & Related papers (2023-12-06T01:59:47Z)
- Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [51.78027546947034]
Recent advancements in surgical computer vision have been driven by vision-only models, which lack language semantics.
We propose leveraging surgical video lectures from e-learning platforms to provide effective vision and language supervisory signals.
We address surgery-specific linguistic challenges using multiple automatic speech recognition systems for text transcriptions.
arXiv Detail & Related papers (2023-07-27T22:38:12Z)
- Using Hand Pose Estimation To Automate Open Surgery Training Feedback [0.0]
This research aims to facilitate the use of state-of-the-art computer vision algorithms for the automated training of surgeons.
By estimating 2D hand poses, we model the movement of the practitioner's hands, and their interaction with surgical instruments.
arXiv Detail & Related papers (2022-11-13T21:47:31Z)
- Quantification of Robotic Surgeries with Vision-Based Deep Learning [45.165919577877695]
We propose a unified deep learning framework, entitled Roboformer, which operates exclusively on videos recorded during surgery.
We validated our framework on four video-based datasets of two commonly-encountered types of steps within minimally-invasive robotic surgeries.
arXiv Detail & Related papers (2022-05-06T06:08:35Z)
- ESAD: Endoscopic Surgeon Action Detection Dataset [10.531648619593572]
We aim to make surgical assistant robots safer by making them aware of the surgeon's actions, so that they can take appropriate assisting actions.
We introduce a challenging dataset for surgeon action detection in real-world endoscopic videos.
arXiv Detail & Related papers (2020-06-12T13:22:41Z)
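For the "Automating Feedback Analysis in Surgical Training" entry above, here is a minimal sketch of the described voice-activity-detection, diarization, and transcription chain using off-the-shelf components (pyannote.audio and OpenAI Whisper). The paper's hallucination-removal step is not specified in the summary, so the filter below is a simple placeholder; pyannote's diarization pipeline performs voice activity detection internally, which covers the VAD stage.

```python
# Assumed VAD -> diarization -> ASR chain; not the paper's exact method.
import whisper
from pyannote.audio import Pipeline

SR = 16_000  # whisper.load_audio resamples to 16 kHz

def transcribe_turns(wav_path: str, hf_token: str) -> list:
    dia = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )
    asr = whisper.load_model("small")
    audio = whisper.load_audio(wav_path)

    turns = []
    for turn, _, speaker in dia(wav_path).itertracks(yield_label=True):
        clip = audio[int(turn.start * SR):int(turn.end * SR)]
        result = asr.transcribe(clip)
        text = result["text"].strip()
        # Crude hallucination filter (placeholder for the paper's novel
        # enhancement): drop empty turns and segments Whisper itself
        # marks as likely non-speech.
        if text and all(s["no_speech_prob"] < 0.6
                        for s in result["segments"]):
            turns.append((speaker, turn.start, turn.end, text))
    return turns
```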