Real-time Instance Segmentation of Surgical Instruments using Attention
and Multi-scale Feature Fusion
- URL: http://arxiv.org/abs/2111.04911v2
- Date: Wed, 10 Nov 2021 03:08:59 GMT
- Authors: Juan Carlos Angeles-Ceron, Gilberto Ochoa-Ruiz, Leonardo Chang, Sharib
Ali
- Abstract summary: Deep learning offers the opportunity to learn complex surgical environments from large-scale surgical scene data.
In this paper, we use a light-weight single-stage segmentation model complemented with a convolutional block attention module.
Our approach outperforms the top team performances in the ROBUST-MIS challenge, with over a 44% improvement on both the area-based metric MI_DSC and the distance-based metric MI_NSD.
- Score: 0.5735035463793008
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Precise instrument segmentation aids surgeons in navigating the body more easily
and increases patient safety. While accurate tracking of surgical instruments in
real-time plays a crucial role in minimally invasive computer-assisted
surgeries, it is a challenging task to achieve, mainly due to 1) the complex
surgical environment and 2) the difficulty of designing a model with both optimal
accuracy and speed. Deep learning offers the opportunity to learn such complex
environments from large-scale surgical scene data, including the placement of
instruments in real-world scenarios. The Robust Medical Instrument Segmentation 2019 challenge
(ROBUST-MIS) provides more than 10,000 frames with surgical tools in different
clinical settings. In this paper, we use a light-weight single-stage instance
segmentation model complemented with a convolutional block attention module (CBAM)
to achieve both fast and accurate inference. We further improve accuracy
through data augmentation and an optimal anchor localisation strategy. To our
knowledge, this is the first work that explicitly focuses on both real-time
performance and improved accuracy. Our approach outperforms the top team
performances in the ROBUST-MIS challenge, with over a 44% improvement on both
the area-based metric MI_DSC and the distance-based metric MI_NSD. We also demonstrate
real-time performance (> 60 frames per second) with different but competitive
variants of our final approach.
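The convolutional block attention module mentioned in the abstract can be illustrated with a minimal sketch. The NumPy snippet below shows only the channel-attention half of a CBAM-style block (the actual module also applies spatial attention and sits inside a full segmentation network); the function and weight names here are hypothetical, not the paper's code.

```python
import numpy as np

def channel_attention(feat, w1, w2):
    """CBAM-style channel attention (illustrative sketch).

    feat: feature map of shape (C, H, W)
    w1:   (C // r, C) weights of the shared MLP's hidden layer
    w2:   (C, C // r) weights of the shared MLP's output layer
    """
    avg = feat.mean(axis=(1, 2))   # global average pooling -> (C,)
    mx = feat.max(axis=(1, 2))     # global max pooling     -> (C,)

    def mlp(v):
        # shared two-layer MLP with ReLU, applied to both pooled descriptors
        return w2 @ np.maximum(w1 @ v, 0.0)

    # sigmoid over the summed descriptors yields per-channel weights in (0, 1)
    scale = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))
    return feat * scale[:, None, None]
```

The per-channel weights rescale the feature map so that later layers can emphasise informative channels; the full CBAM follows this with an analogous spatial-attention step along the height and width dimensions.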
Related papers
- Phase-Specific Augmented Reality Guidance for Microscopic Cataract
Surgery Using Long-Short Spatiotemporal Aggregation Transformer [14.568834378003707]
Phacoemulsification cataract surgery (PCS) is a routine procedure performed using a surgical microscope.
PCS guidance systems extract valuable information from surgical microscopic videos to enhance proficiency.
Existing PCS guidance systems suffer from non-phase-specific guidance, leading to redundant visual information.
We propose a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase.
arXiv Detail & Related papers (2023-09-11T02:56:56Z) - Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip
Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics.
A cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation.
arXiv Detail & Related papers (2023-09-02T14:52:58Z) - GLSFormer: Gated Long, Short Sequence Transformer for Step
Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z) - LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information.
Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
arXiv Detail & Related papers (2023-05-15T20:06:14Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and underexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath
While Tracking Instruments in Robotic Surgery [14.47768738295518]
Learning of the task-oriented attention while tracking instrument holds vast potential in image-guided robotic surgery.
We propose an end-to-end Multi-Task Learning (ST-MTL) model with a shared encoder and Sink-temporal decoders for the real-time surgical instrument segmentation and task-oriented saliency detection.
We tackle the problem with a novel asynchronous-temporal optimization technique by calculating independent gradients for each decoder.
Compared to state-of-the-art segmentation and saliency methods, our model outperforms them on most evaluation metrics and produces outstanding performance on the challenge dataset.
arXiv Detail & Related papers (2021-12-10T15:20:27Z) - Efficient Global-Local Memory for Real-time Instrument Segmentation of
Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z) - Assessing YOLACT++ for real time and robust instance segmentation of
medical instruments in endoscopic procedures [0.5735035463793008]
Image-based tracking of laparoscopic instruments plays a fundamental role in computer and robotic-assisted surgeries.
To date, most existing models for instance segmentation of medical instruments have been based on two-stage detectors.
We propose the addition of attention mechanisms to the YOLACT architecture that allows real-time instance segmentation of instruments.
arXiv Detail & Related papers (2021-03-30T00:09:55Z) - One to Many: Adaptive Instrument Segmentation via Meta Learning and
Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z) - Searching for Efficient Architecture for Instrument Segmentation in
Robotic Surgery [58.63306322525082]
Most applications rely on accurate real-time segmentation of high-resolution surgical images.
We design a light-weight and highly-efficient deep residual architecture which is tuned to perform real-time inference of high-resolution images.
arXiv Detail & Related papers (2020-07-08T21:38:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.