Effective semantic segmentation in Cataract Surgery: What matters most?
- URL: http://arxiv.org/abs/2108.06119v1
- Date: Fri, 13 Aug 2021 08:27:54 GMT
- Title: Effective semantic segmentation in Cataract Surgery: What matters most?
- Authors: Theodoros Pissas, Claudio Ravasio, Lyndon Da Cruz, Christos Bergeles
- Abstract summary: Our work proposes neural network design choices that set the state-of-the-art on a challenging public benchmark on cataract surgery, CaDIS.
Our methodology achieves strong performance across three semantic segmentation tasks with increasingly granular surgical tool class sets.
- Score: 5.1151054398496685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Our work proposes neural network design choices that set the state-of-the-art
on a challenging public benchmark on cataract surgery, CaDIS. Our methodology
achieves strong performance across three semantic segmentation tasks with
increasingly granular surgical tool class sets by effectively handling class
imbalance, an inherent challenge in any surgical video. We consider and
evaluate two conceptually simple data oversampling methods as well as different
loss functions. We show significant performance gains across network
architectures and tasks especially on the rarest tool classes, thereby
presenting an approach for achieving high performance when imbalanced granular
datasets are considered. Our code and trained models are available at
https://github.com/RViMLab/MICCAI2021_Cataract_semantic_segmentation and
qualitative results on unseen surgical video can be found at
https://youtu.be/twVIPUj1WZM.
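The abstract highlights data oversampling as a conceptually simple way to handle class imbalance in granular tool class sets. One common instance of this idea is repeat-factor sampling, sketched below; the function name, the frequency threshold, and the square-root heuristic are illustrative assumptions, not the paper's exact method:

```python
import math
from collections import Counter

def repeat_factors(frame_classes, threshold=0.1):
    """Compute a per-frame repeat factor for oversampling rare classes.

    frame_classes: list of sets, the classes present in each frame.
    threshold: target class frequency; classes rarer than this get
    repeat factors above 1 so they appear more often per epoch.
    Returns a list of floats >= 1.0, one per frame.
    """
    n = len(frame_classes)
    # Fraction of frames in which each class appears.
    counts = Counter()
    for classes in frame_classes:
        for c in classes:
            counts[c] += 1
    class_freq = {c: k / n for c, k in counts.items()}
    # Per-class repeat factor: rare classes get factors > 1.
    class_rf = {c: max(1.0, math.sqrt(threshold / f))
                for c, f in class_freq.items()}
    # A frame is repeated according to its rarest class.
    return [max(class_rf[c] for c in classes) for classes in frame_classes]
```

A dataloader can then duplicate (or sample) frames in proportion to these factors, so frames containing rare surgical tools are seen more often during training.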
Related papers
- Revisiting Surgical Instrument Segmentation Without Human Intervention: A Graph Partitioning View [7.594796294925481]
We propose an unsupervised method that reframes video frame segmentation as a graph partitioning problem.
A self-supervised pre-trained model is first leveraged as a feature extractor to capture high-level semantic features.
On the "deep" eigenvectors, a surgical video frame is meaningfully segmented into distinct modules such as tools and tissues, providing distinguishable semantic information.
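Partitioning on "deep" eigenvectors typically means spectral clustering over an affinity graph built from the extracted features. A minimal normalized-cuts-style sketch, assuming cosine affinities and a two-way split via the Fiedler vector (the function and inputs are illustrative, not the paper's pipeline):

```python
import numpy as np

def spectral_bipartition(features):
    """Split image patches into two groups via the Fiedler vector.

    features: (N, D) array of per-patch deep features.
    Returns a boolean mask of length N (True = one partition).
    """
    # Cosine-similarity affinity, clipped to be non-negative.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    W = np.clip(f @ f.T, 0.0, None)
    # Normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    # The second-smallest eigenvector (Fiedler vector) encodes the cut;
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return fiedler > 0
```

Recursing on each partition, or taking more eigenvectors, yields multiple segments (e.g. tools vs. tissue vs. background).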
arXiv Detail & Related papers (2024-08-27T05:31:30Z)
- Hierarchical Semi-Supervised Learning Framework for Surgical Gesture Segmentation and Recognition Based on Multi-Modality Data [2.8770761243361593]
We develop a hierarchical semi-supervised learning framework for surgical gesture segmentation using multi-modality data.
A Transformer-based network with a pre-trained ResNet-18 backbone is used to extract visual features from the surgical operation videos.
The proposed approach has been evaluated using data from the publicly available JIGSAWS database, including Suturing, Needle Passing, and Knot Tying tasks.
arXiv Detail & Related papers (2023-07-31T21:17:59Z)
- SurgMAE: Masked Autoencoders for Long Surgical Video Analysis [4.866110274299399]
Masked autoencoders (MAE) have gained attention in the self-supervised learning paradigm for Vision Transformers (ViTs).
In this paper, we first investigate whether MAE can learn transferable representations in the surgical video domain.
We propose SurgMAE, a novel architecture with a masking strategy that samples high-information temporal tokens for MAE.
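The idea of masking biased toward informative tokens can be sketched generically: score each spatial token by its temporal variance and sample mask positions in proportion to that score. This is a hypothetical illustration of information-biased masking, not the exact SurgMAE sampling procedure:

```python
import numpy as np

def mask_high_info_tokens(tokens, mask_ratio=0.75, rng=None):
    """Pick spatial tokens to mask, biased toward high temporal variance.

    tokens: (T, N, D) array -- T frames, N spatial tokens, D feature dims.
    Returns indices of the tokens selected for masking.
    """
    rng = np.random.default_rng(rng)
    T, N, D = tokens.shape
    # Temporal variance of each spatial token as an "information" score.
    scores = tokens.var(axis=0).mean(axis=-1)          # shape (N,)
    probs = scores / scores.sum()
    n_mask = int(mask_ratio * N)
    # Sample without replacement, proportional to the score, so static
    # (low-information) tokens are rarely hidden from the encoder.
    return rng.choice(N, size=n_mask, replace=False, p=probs)
```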
arXiv Detail & Related papers (2023-05-19T06:12:50Z) - Learning Context-aware Classifier for Semantic Segmentation [88.88198210948426]
In this paper, contextual hints are exploited by learning a context-aware classifier.
Our method is model-agnostic and can be easily applied to generic segmentation models.
With only negligible additional parameters and about 2% extra inference time, decent performance gains are achieved on both small and large models.
arXiv Detail & Related papers (2023-03-21T07:00:35Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and largely unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding: phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - Min-Max Similarity: A Contrastive Learning Based Semi-Supervised
Learning Network for Surgical Tools Segmentation [0.0]
We propose a semi-supervised segmentation network based on contrastive learning.
In contrast to the previous state-of-the-art, we introduce a dual-view training scheme based on contrastive learning.
Our proposed method outperforms state-of-the-art semi-supervised and fully supervised segmentation algorithms consistently.
arXiv Detail & Related papers (2022-03-29T01:40:26Z) - CvS: Classification via Segmentation For Small Datasets [52.821178654631254]
This paper presents CvS, a cost-effective classifier for small datasets that derives the classification labels from predicting the segmentation maps.
We evaluate the effectiveness of our framework on diverse problems, showing that CvS achieves much higher classification accuracy than previous methods when given only a handful of examples.
arXiv Detail & Related papers (2021-10-29T18:41:15Z) - Revisiting Contrastive Methods for Unsupervised Learning of Visual
Representations [78.12377360145078]
Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection.
In this paper, we first study how biases in the dataset affect existing methods.
We show that current contrastive approaches work surprisingly well across: (i) object- versus scene-centric, (ii) uniform versus long-tailed and (iii) general versus domain-specific datasets.
arXiv Detail & Related papers (2021-06-10T17:59:13Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependency.
We validate our approach on a large surgical video dataset (Cholec80) by performing the surgical workflow recognition task.
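A non-local block, in its generic form, lets each position attend to every other position in the sequence. A minimal dot-product sketch over per-frame features is shown below; NL-RCNet's exact block (learned projections, recurrent components) differs, so this is only an illustration of the operation:

```python
import numpy as np

def non_local_block(x):
    """Generic non-local operation: each position attends to all others.

    x: (T, D) sequence of per-frame features.
    Returns (T, D) features augmented with long-range temporal context.
    """
    # Pairwise affinities via scaled dot product, softmax-normalized per row.
    scores = x @ x.T / np.sqrt(x.shape[1])
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    # Residual connection keeps the original per-frame signal.
    return x + attn @ x
```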
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
- Learning Fast and Robust Target Models for Video Object Segmentation [83.3382606349118]
Video object segmentation (VOS) is a highly challenging problem since the initial mask, defining the target object, is only given at test-time.
Most previous approaches fine-tune segmentation networks on the first frame, resulting in impractical frame-rates and risk of overfitting.
We propose a novel VOS architecture consisting of two network components.
arXiv Detail & Related papers (2020-02-27T21:58:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.