A benchmark for video-based laparoscopic skill analysis and assessment
- URL: http://arxiv.org/abs/2602.09927v1
- Date: Tue, 10 Feb 2026 15:59:19 GMT
- Title: A benchmark for video-based laparoscopic skill analysis and assessment
- Authors: Isabel Funke, Sebastian Bodenstedt, Felix von Bechtolsheim, Florian Oehme, Michael Maruschke, Stefanie Herrlich, Jürgen Weitz, Marius Distler, Sören Torge Mees, Stefanie Speidel
- Abstract summary: We introduce the Laparoscopic Skill Analysis and Assessment dataset, comprising 1270 stereo video recordings of four basic laparoscopic training tasks. Each recording is annotated with a structured skill rating, aggregated from three independent raters, as well as binary labels indicating the presence or absence of task-specific errors. To facilitate benchmarking of both existing and novel approaches for video-based skill assessment and error recognition, we provide predefined data splits for each task.
- Score: 1.5734501497837607
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Laparoscopic surgery is a complex surgical technique that requires extensive training. Recent advances in deep learning have shown promise in supporting this training by enabling automatic video-based assessment of surgical skills. However, the development and evaluation of deep learning models is currently hindered by the limited size of available annotated datasets. To address this gap, we introduce the Laparoscopic Skill Analysis and Assessment (LASANA) dataset, comprising 1270 stereo video recordings of four basic laparoscopic training tasks. Each recording is annotated with a structured skill rating, aggregated from three independent raters, as well as binary labels indicating the presence or absence of task-specific errors. The majority of recordings originate from a laparoscopic training course, thereby reflecting a natural variation in the skill of participants. To facilitate benchmarking of both existing and novel approaches for video-based skill assessment and error recognition, we provide predefined data splits for each task. Furthermore, we present baseline results from a deep learning model as a reference point for future comparisons.
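As a concrete illustration of how the annotations map onto an evaluation, the sketch below scores a hypothetical model on both tasks: skill assessment against the rating aggregated from the three raters, and error recognition as binary classification. The metric choices (Spearman correlation, F1) and all values are illustrative assumptions, not the benchmark's official protocol.

```python
# Hypothetical sketch of evaluating on LASANA-style annotations.
# Array layouts, values, and metrics are assumptions, not the dataset's format.
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import f1_score

# Each recording: skill ratings from three independent raters (e.g., 1-5 scale)
# and a binary label per task-specific error, as described in the abstract.
rater_scores = np.array([[3, 4, 3],   # recording 0
                         [5, 5, 4],   # recording 1
                         [2, 2, 3]])  # recording 2
aggregated = rater_scores.mean(axis=1)            # aggregate the three raters

predicted_scores = np.array([3.1, 4.6, 2.4])      # model outputs (dummy values)
rho, _ = spearmanr(aggregated, predicted_scores)  # rank correlation for skill assessment

true_errors = np.array([0, 0, 1])                 # presence/absence of one error type
pred_errors = np.array([0, 1, 1])
f1 = f1_score(true_errors, pred_errors)           # error recognition as binary classification

print(f"Spearman rho: {rho:.3f}, error F1: {f1:.3f}")
```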
Related papers
- Clinical-Prior Guided Multi-Modal Learning with Latent Attention Pooling for Gait-Based Scoliosis Screening [8.010714901985898]
Adolescent Idiopathic Scoliosis (AIS) is a prevalent spinal deformity whose progression can be mitigated through early detection. Current screening methods are subjective, difficult to scale, and reliant on specialized clinical expertise. Video-based gait analysis offers a promising alternative, but current datasets and methods frequently suffer from data leakage. ScoliGait is a new benchmark dataset comprising 1,572 gait video clips for training and 300 fully independent clips for testing.
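A subject-disjoint split is the standard guard against the kind of data leakage noted above; below is a minimal sketch with invented subject IDs:

```python
# Minimal sketch of a subject-disjoint split; clip counts and subject IDs
# are invented for illustration.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

clips = np.arange(10)                                 # indices of 10 hypothetical gait clips
subjects = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])   # subject ID per clip

# GroupShuffleSplit keeps all clips of a subject on the same side of the split,
# so a model cannot score well just by re-identifying training subjects.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(clips, groups=subjects))
assert not set(subjects[train_idx]) & set(subjects[test_idx])  # no subject overlap
print("train clips:", train_idx, "test clips:", test_idx)
```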
arXiv Detail & Related papers (2026-02-06T14:44:22Z)
- Cataract-LMM: Large-Scale, Multi-Source, Multi-Task Benchmark for Deep Learning in Surgical Video Analysis [4.318540086708654]
We present a dataset of 3,000 cataract surgery videos from two surgical centers, performed by surgeons with a range of experience levels. This resource is enriched with four annotation layers: temporal surgical phases, instance segmentation of instruments and anatomical structures, instrument-tissue interaction tracking, and quantitative skill scores. The technical quality of the dataset is supported by a series of benchmarking experiments for key surgical AI tasks.
arXiv Detail & Related papers (2025-10-18T06:48:29Z)
- SurgXBench: Explainable Vision-Language Model Benchmark for Surgery [4.068223793121694]
Vision-Language Models (VLMs) have brought transformative advances in reasoning across visual and textual modalities. However, existing models show limited performance on surgical data, highlighting the need for benchmark studies to assess their capabilities and limitations. We benchmark the zero-shot performance of several advanced VLMs on two public robotic-assisted laparoscopic datasets for instrument and action classification.
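A minimal sketch of such zero-shot classification with a generic vision-language model follows; the CLIP checkpoint, prompt template, class names, and file path are illustrative assumptions, not the benchmark's actual setup:

```python
# Zero-shot instrument classification with a generic VLM (CLIP); labels,
# prompts, and the input frame path are placeholders, not SurgXBench's setup.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["grasper", "scissors", "hook", "clip applier"]  # example instrument classes
prompts = [f"a laparoscopic image showing a {c}" for c in labels]
image = Image.open("frame.png")                           # a surgical video frame (placeholder path)

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image             # image-text similarity scores
probs = logits.softmax(dim=-1)
print(labels[probs.argmax().item()], probs.max().item())
```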
arXiv Detail & Related papers (2025-05-16T00:42:18Z)
- Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding [1.024113475677323]
The lack of datasets hinders the development of accurate and comprehensive workflow analysis solutions. We introduce a novel approach for addressing the sparsity and heterogeneity of data, inspired by the human learning procedure of watching experts and understanding their explanations. We present the first comprehensive solution for dense video captioning (DVC) of surgical videos, addressing this task despite the absence of existing datasets in the surgical domain.
arXiv Detail & Related papers (2025-03-14T13:36:13Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
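For orientation, the symmetric InfoNCE objective below is the standard backbone of such image-text alignment; how EGMA folds eye-gaze data into it is not reproduced here, so this is only a generic sketch:

```python
# Generic symmetric InfoNCE image-text alignment loss; EGMA's eye-gaze
# guidance is NOT modeled here, this is only the standard backbone.
import torch
import torch.nn.functional as F

def alignment_loss(img_emb, txt_emb, temperature=0.07):
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # pairwise similarities
    targets = torch.arange(len(img_emb))           # matched pairs sit on the diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Random placeholder embeddings standing in for encoder outputs.
loss = alignment_loss(torch.randn(8, 256), torch.randn(8, 256))
```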
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- RIDE: Self-Supervised Learning of Rotation-Equivariant Keypoint Detection and Invariant Description for Endoscopy [83.4885991036141]
RIDE is a learning-based method for rotation-equivariant detection and invariant description.
It is trained in a self-supervised manner on a large curated set of endoscopic images.
It sets a new state-of-the-art performance on matching and relative pose estimation tasks.
arXiv Detail & Related papers (2023-09-18T08:16:30Z)
- A Survey of the Impact of Self-Supervised Pretraining for Diagnostic Tasks with Radiological Images [71.26717896083433]
Self-supervised pretraining has been observed to be effective at improving feature representations for transfer learning.
This review summarizes recent research into its usage in X-ray, computed tomography, magnetic resonance, and ultrasound imaging.
arXiv Detail & Related papers (2023-09-05T19:45:09Z)
- One-shot skill assessment in high-stakes domains with limited data via meta learning [0.0]
A-VBANet is a novel meta-learning model capable of delivering domain-agnostic skill assessment via one-shot learning.
Our model successfully adapted with accuracies up to 99.5% in one-shot and 99.9% in few-shot settings for simulated tasks and 89.7% for laparoscopic cholecystectomy.
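As a sketch of the general one-shot recipe behind such meta-learning models (a prototypical-network-style classifier, not A-VBANet's actual architecture):

```python
# Prototypical-network-style one-shot classification; embeddings are random
# placeholders standing in for a video encoder's outputs.
import torch

def one_shot_predict(support, support_labels, queries):
    """support: one embedded example per skill class; queries: embeddings to rate."""
    classes = support_labels.unique()
    # With one shot per class the prototype is just that example's embedding.
    prototypes = torch.stack([support[support_labels == c].mean(0) for c in classes])
    dists = torch.cdist(queries, prototypes)   # Euclidean distance to each prototype
    return classes[dists.argmin(dim=1)]        # nearest prototype wins

support = torch.randn(3, 128)                  # e.g., novice / intermediate / expert
labels = torch.tensor([0, 1, 2])
preds = one_shot_predict(support, labels, torch.randn(5, 128))
```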
arXiv Detail & Related papers (2022-12-16T01:04:52Z)
- AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy [42.20922574566824]
We present and release the first integrated dataset with multiple image-based perception tasks to facilitate learning-based automation in hysterectomy surgery.
Our AutoLaparo dataset is developed based on full-length videos of entire hysterectomy procedures.
Specifically, three different yet highly correlated tasks are formulated in the dataset, including surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation.
arXiv Detail & Related papers (2022-08-03T13:17:23Z)
- LifeLonger: A Benchmark for Continual Disease Classification [59.13735398630546]
We introduce LifeLonger, a benchmark for continual disease classification on the MedMNIST collection.
Task- and class-incremental learning of diseases addresses the problem of classifying new samples without re-training the models from scratch. Cross-domain incremental learning addresses datasets originating from different institutions while retaining previously obtained knowledge.
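One standard baseline for these incremental settings is experience replay, sketched below with a stand-in classifier; this is illustrative only and not LifeLonger's evaluation protocol:

```python
# Experience-replay baseline for incremental learning; model, sizes, and
# data are stand-ins, not LifeLonger's protocol.
import random
import torch
import torch.nn.functional as F

buffer, BUFFER_SIZE = [], 200   # small memory of past (x, y) pairs

def train_increment(model, optimizer, loader):
    for x, y in loader:
        for xi, yi in zip(x, y):             # remember a few current samples
            if len(buffer) < BUFFER_SIZE:
                buffer.append((xi.clone(), yi.clone()))
        if buffer:                            # replay earlier tasks/institutions
            bx, by = zip(*random.sample(buffer, min(len(buffer), len(x))))
            x = torch.cat([x, torch.stack(bx)])
            y = torch.cat([y, torch.stack(by)])
        optimizer.zero_grad()
        F.cross_entropy(model(x), y).backward()
        optimizer.step()

model = torch.nn.Linear(16, 4)                # stand-in disease classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loader = [(torch.randn(8, 16), torch.randint(0, 4, (8,)))]  # one dummy increment
train_increment(model, opt, loader)
```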
arXiv Detail & Related papers (2022-04-12T12:25:05Z)
- Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines FL and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
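The federated half of such a pipeline typically reduces to weighted parameter averaging across clients (FedAvg-style); the sketch below shows that step only and omits the self-supervised objective on unlabeled videos:

```python
# FedAvg-style aggregation step; client counts and model are placeholders,
# and FedCy's self-supervised training on unlabeled videos is omitted.
import copy
import torch
import torch.nn as nn

def federated_average(global_model, client_models, client_sizes):
    """Average client weights, weighted by each client's share of the data."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(global_model.state_dict())
    for key in avg:
        avg[key] = sum(m.state_dict()[key].float() * (n / total)
                       for m, n in zip(client_models, client_sizes))
    global_model.load_state_dict(avg)
    return global_model

# Three hospitals with differently sized video collections.
clients = [nn.Linear(8, 2) for _ in range(3)]
global_model = federated_average(nn.Linear(8, 2), clients, [100, 40, 60])
```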
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
- SSLM: Self-Supervised Learning for Medical Diagnosis from MR Video [19.5917119072985]
In this paper, we propose a self-supervised learning approach to learn the spatial anatomical representations from magnetic resonance (MR) video clips.
The proposed pretext model learns meaningful spatial context-invariant representations.
Different experiments show that the features learnt by the pretext model provide explainable performance in the downstream task.
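For illustration, one common video pretext task is frame-order verification, sketched below; the paper's actual pretext objective may differ:

```python
# Frame-order verification pretext task, a generic illustration only;
# SSLM's actual pretext objective may differ.
import random
import torch

def order_pretext_batch(clips):
    """clips: (B, T, C, H, W) MR video clips. Returns shuffled clips + labels."""
    x, y = [], []
    for clip in clips:
        if random.random() < 0.5:
            perm = torch.randperm(clip.shape[0])  # scramble temporal order
            x.append(clip[perm]); y.append(0)     # label 0: out of order
        else:
            x.append(clip); y.append(1)           # label 1: correct order
    return torch.stack(x), torch.tensor(y)

clips = torch.randn(4, 8, 1, 32, 32)              # 4 dummy clips of 8 frames each
x, y = order_pretext_batch(clips)                 # train a classifier on (x, y)
```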
arXiv Detail & Related papers (2021-04-21T12:01:49Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependencies.
We validate our approach on a large surgical video dataset (Cholec80) by performing surgical workflow recognition task.
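A non-local block computes attention between every pair of time steps, so each frame can aggregate evidence from arbitrarily distant frames. Below is a sketch of the standard embedded-Gaussian formulation over temporal features; the surrounding recurrent-convolutional architecture of NL-RCNet is not reproduced:

```python
# Embedded-Gaussian non-local block over time (Wang et al., 2018); the rest
# of NL-RCNet's architecture is not reproduced here.
import torch
import torch.nn as nn

class NonLocalBlock1D(nn.Module):
    def __init__(self, channels):
        super().__init__()
        inner = channels // 2
        self.theta = nn.Conv1d(channels, inner, 1)  # query projection
        self.phi = nn.Conv1d(channels, inner, 1)    # key projection
        self.g = nn.Conv1d(channels, inner, 1)      # value projection
        self.out = nn.Conv1d(inner, channels, 1)

    def forward(self, x):                           # x: (batch, channels, time)
        q = self.theta(x).transpose(1, 2)           # (B, T, C')
        k = self.phi(x)                             # (B, C', T)
        attn = torch.softmax(q @ k, dim=-1)         # every frame attends to every frame
        v = self.g(x).transpose(1, 2)               # (B, T, C')
        y = (attn @ v).transpose(1, 2)              # (B, C', T)
        return x + self.out(y)                      # residual connection

feats = torch.randn(2, 64, 100)                     # 100 time steps of 64-d clip features
out = NonLocalBlock1D(64)(feats)
```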
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
- Confident Coreset for Active Learning in Medical Image Analysis [57.436224561482966]
We propose a novel active learning method, confident coreset, which considers both uncertainty and distribution for effectively selecting informative samples.
By comparative experiments on two medical image analysis tasks, we show that our method outperforms other active learning methods.
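A sketch of one way to combine the two criteria, greedy k-center coverage scaled by per-sample uncertainty, follows; the exact confident-coreset scoring rule is an assumption here:

```python
# Greedy selection mixing uncertainty with k-center coverage; the exact
# confident-coreset scoring rule is an assumption, not the paper's formula.
import numpy as np

def select_batch(features, uncertainty, labeled_idx, budget):
    chosen = list(labeled_idx)
    for _ in range(budget):
        # Distance of every sample to its nearest already-selected sample.
        d = np.linalg.norm(features[:, None] - features[chosen][None], axis=-1).min(1)
        score = d * uncertainty        # prefer uncertain points far from coverage
        score[chosen] = -np.inf        # never re-pick selected samples
        chosen.append(int(score.argmax()))
    return chosen[len(labeled_idx):]

feats = np.random.rand(100, 16)        # embeddings of an unlabeled pool
unc = np.random.rand(100)              # e.g., predictive entropy per sample
query = select_batch(feats, unc, labeled_idx=[0, 1], budget=5)
```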
arXiv Detail & Related papers (2020-04-05T13:46:16Z)
This list is automatically generated from the titles and abstracts of the papers on this site.