ROBUST-MIPS: A Combined Skeletal Pose and Instance Segmentation Dataset for Laparoscopic Surgical Instruments
- URL: http://arxiv.org/abs/2508.21096v1
- Date: Wed, 27 Aug 2025 22:23:47 GMT
- Title: ROBUST-MIPS: A Combined Skeletal Pose and Instance Segmentation Dataset for Laparoscopic Surgical Instruments
- Authors: Zhe Han, Charlie Budd, Gongyu Zhang, Huanyu Tian, Christos Bergeles, Tom Vercauteren
- Abstract summary: Localisation of surgical tools is a building block for computer-assisted interventional technologies. We argue that skeletal pose annotations are a more efficient annotation approach for surgical tools. We present ROBUST-MIPS, a combined tool pose and tool instance segmentation dataset.
- Score: 8.024055417738227
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Localisation of surgical tools constitutes a foundational building block for computer-assisted interventional technologies. Works in this field typically focus on training deep learning models to perform segmentation tasks. The performance of learning-based approaches is limited by the availability of diverse annotated data. We argue that skeletal pose annotations are a more efficient annotation approach for surgical tools, striking a balance between richness of semantic information and ease of annotation, thus allowing for accelerated growth of available annotated data. To encourage adoption of this annotation style, we present ROBUST-MIPS, a combined tool pose and tool instance segmentation dataset derived from the existing ROBUST-MIS dataset. Our enriched dataset facilitates the joint study of these two annotation styles and allows head-to-head comparison on various downstream tasks. To demonstrate the adequacy of pose annotations for surgical tool localisation, we set up a simple benchmark using popular pose estimation methods and observe high-quality results. To ease adoption, we release the dataset together with our benchmark models and custom tool pose annotation software.
Related papers
- In search of truth: Evaluating concordance of AI-based anatomy segmentation models [3.740726797046942]
AI-based methods for anatomy segmentation can help automate characterization of large imaging datasets. We introduce a practical framework to assist in evaluating them on datasets that do not contain ground truth annotations.
arXiv Detail & Related papers (2025-12-17T19:33:56Z)
- Foundation X: Integrating Classification, Localization, and Segmentation through Lock-Release Pretraining Strategy for Chest X-ray Analysis [3.874753046352665]
Foundation X is an end-to-end framework that utilizes diverse expert-level annotations to train a foundation model. We trained a model using 11 chest X-ray datasets, covering annotations for classification, localization, and segmentation tasks. Our experimental results show that Foundation X achieves notable performance gains through extensive annotation utilization.
arXiv Detail & Related papers (2025-03-12T21:45:13Z)
- Learning to Extract Structured Entities Using Language Models [52.281701191329]
Recent advances in machine learning have significantly impacted the field of information extraction.
We reformulate the task to be entity-centric, enabling the use of diverse metrics.
We contribute to the field by introducing Structured Entity Extraction and proposing the Approximate Entity Set OverlaP metric.
arXiv Detail & Related papers (2024-02-06T22:15:09Z)
- Surgical Phase and Instrument Recognition: How to identify appropriate Dataset Splits [2.045596350476764]
This work presents a publicly available data visualization tool that enables interactive exploration of dataset splits.
It focuses on the visualization of the occurrence of phases, phase transitions, instruments, and instrument combinations across sets.
Results: We performed an analysis of common Cholec80 dataset splits and were able to uncover phase transitions and combinations of instruments that were not represented in one of the sets.
arXiv Detail & Related papers (2023-06-29T12:02:16Z)
- ALP: Action-Aware Embodied Learning for Perception [60.64801970249279]
We introduce Action-Aware Embodied Learning for Perception (ALP)
ALP incorporates action information into representation learning through a combination of optimizing a reinforcement learning policy and an inverse dynamics prediction objective.
We show that ALP outperforms existing baselines in several downstream perception tasks.
arXiv Detail & Related papers (2023-06-16T21:51:04Z)
- Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z)
- Learning from Temporal Spatial Cubism for Cross-Dataset Skeleton-based Action Recognition [88.34182299496074]
Action labels are only available on a source dataset, but unavailable on a target dataset in the training stage.
We utilize a self-supervision scheme to reduce the domain shift between two skeleton-based action datasets.
By segmenting and permuting temporal segments or human body parts, we design two self-supervised learning classification tasks.
arXiv Detail & Related papers (2022-07-17T07:05:39Z)
- Self-Attention Neural Bag-of-Features [103.70855797025689]
We build on the recently introduced 2D-Attention and reformulate the attention learning methodology.
We propose a joint feature-temporal attention mechanism that learns a joint 2D attention mask highlighting relevant information.
arXiv Detail & Related papers (2022-01-26T17:54:14Z)
- Clustering augmented Self-Supervised Learning: An application to Land Cover Mapping [10.720852987343896]
We introduce a new method for land cover mapping by using a clustering based pretext task for self-supervised learning.
We demonstrate the effectiveness of the method on two societally relevant applications.
arXiv Detail & Related papers (2021-08-16T19:35:43Z)
- Simulation-to-Real domain adaptation with teacher-student learning for endoscopic instrument segmentation [1.1047993346634768]
We introduce a teacher-student learning approach that learns jointly from annotated simulation data and unlabeled real data.
Empirical results on three datasets highlight the effectiveness of the proposed framework.
arXiv Detail & Related papers (2021-03-02T09:30:28Z)
- ISINet: An Instance-Based Approach for Surgical Instrument Segmentation [0.0]
We study the task of semantic segmentation of surgical instruments in robotic-assisted surgery scenes.
We propose ISINet, a method that addresses this task from an instance-based segmentation perspective.
Our results show that ISINet significantly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-07-10T16:20:56Z)
- A Trainable Optimal Transport Embedding for Feature Aggregation and its Relationship to Attention [96.77554122595578]
We introduce a parametrized representation of fixed size, which embeds and then aggregates elements from a given input set according to the optimal transport plan between the set and a trainable reference.
Our approach scales to large datasets and allows end-to-end training of the reference, while also providing a simple unsupervised learning mechanism with small computational cost.
arXiv Detail & Related papers (2020-06-22T08:35:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.