AP-MTL: Attention Pruned Multi-task Learning Model for Real-time
Instrument Detection and Segmentation in Robot-assisted Surgery
- URL: http://arxiv.org/abs/2003.04769v2
- Date: Sun, 31 May 2020 12:30:42 GMT
- Authors: Mobarakol Islam, Vibashan VS, Hongliang Ren
- Abstract summary: Training a real-time robotic system for the detection and segmentation of high-resolution images poses a challenging problem given limited computational resources.
We develop a novel end-to-end trainable real-time multi-task learning model with a weight-shared encoder and task-aware detection and segmentation decoders.
Our model significantly outperforms state-of-the-art segmentation and detection models, including the best-performing models in the challenge.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical scene understanding and multi-task learning are crucial for
image-guided robotic surgery. Training a real-time robotic system for the
detection and segmentation of high-resolution images poses a challenging
problem given limited computational resources. The resulting perception can be
applied to effective real-time feedback, surgical skill assessment, and
human-robot collaborative surgeries to enhance surgical outcomes. For this
purpose, we develop a novel end-to-end trainable real-time Multi-Task Learning
(MTL) model with a weight-shared encoder and task-aware detection and
segmentation decoders. Optimizing multiple tasks toward the same convergence
point is vital and presents a complex problem. Thus, we propose an asynchronous
task-aware optimization (ATO) technique that calculates task-oriented gradients
and trains the decoders independently. Moreover, MTL models are often
computationally expensive, which hinders real-time applications. To address
this challenge, we introduce global attention dynamic pruning (GADP), which
removes less significant and sparse parameters. We further design a skip
squeeze-and-excitation (SE) module, which suppresses weak features, excites
significant features, and performs dynamic spatial and channel-wise feature
re-calibration. Validated on the robotic instrument segmentation dataset of the
MICCAI endoscopic vision challenge, our model significantly outperforms
state-of-the-art segmentation and detection models, including the
best-performing models in the challenge.
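The skip SE module described above performs channel-wise feature re-calibration. As a rough illustration of the underlying squeeze-and-excitation idea only (not the authors' exact design — the skip connection, spatial branch, and learned weights are omitted, and the weight shapes here are assumptions), a minimal NumPy sketch:

```python
import numpy as np

def squeeze_excitation(feature_map, w1, w2):
    """Channel-wise recalibration in the spirit of an SE block:
    squeeze via global average pooling, excite through a two-layer
    bottleneck with a sigmoid gate, then rescale each input channel.

    feature_map: (C, H, W); w1: (C // r, C); w2: (C, C // r),
    where r is the bottleneck reduction ratio.
    """
    # Squeeze: global average pool -> one descriptor per channel.
    z = feature_map.mean(axis=(1, 2))            # shape (C,)
    # Excite: bottleneck FC -> ReLU -> FC -> sigmoid gate in (0, 1).
    s = np.maximum(w1 @ z, 0.0)                  # shape (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))       # shape (C,)
    # Recalibrate: suppress weak channels, excite significant ones.
    return feature_map * gate[:, None, None]
```

Because the gate lies strictly in (0, 1), each channel of the output is a damped copy of the input, which is the "suppress weak features, excite significant features" behavior the abstract describes.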
Related papers
- SEDMamba: Enhancing Selective State Space Modelling with Bottleneck Mechanism and Fine-to-Coarse Temporal Fusion for Efficient Error Detection in Robot-Assisted Surgery [7.863539113283565]
We propose a novel hierarchical model named SEDMamba, which incorporates the selective state space model (SSM) into surgical error detection.
SEDMamba enhances selective SSM with a bottleneck mechanism and fine-to-coarse temporal fusion (FCTF) to detect and temporally localize surgical errors in long videos.
Our work also contributes the first-of-its-kind, frame-level, in-vivo surgical error dataset to support error detection in real surgical cases.
arXiv Detail & Related papers (2024-06-22T19:20:35Z) - CViT: Continuous Vision Transformer for Operator Learning [24.1795082775376]
Continuous Vision Transformer (CViT) is a novel neural operator architecture that leverages advances in computer vision to address challenges in learning complex physical systems.
CViT combines a vision transformer encoder, a novel grid-based coordinate embedding, and a query-wise cross-attention mechanism to effectively capture multi-scale dependencies.
We demonstrate CViT's effectiveness across a diverse range of partial differential equation (PDE) systems, including fluid dynamics, climate modeling, and reaction-diffusion processes.
arXiv Detail & Related papers (2024-05-22T21:13:23Z) - Robotic Navigation Autonomy for Subretinal Injection via Intelligent
Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection.
Our method consists of an instrument pose estimation method, an online registration between the robotic and the iOCT system, and trajectory planning tailored for navigation to an injection target.
Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z) - Task-Aware Asynchronous Multi-Task Model with Class Incremental
Contrastive Learning for Surgical Scene Understanding [17.80234074699157]
A multi-task learning model is proposed for surgical report generation and tool-tissue interaction prediction.
The model consists of a shared feature extractor, a mesh-transformer branch for captioning, and a graph attention branch for tool-tissue interaction prediction.
We incorporate a task-aware asynchronous MTL optimization technique to fine-tune the shared weights and converge both tasks optimally.
arXiv Detail & Related papers (2022-11-28T14:08:48Z) - ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath
While Tracking Instruments in Robotic Surgery [14.47768738295518]
Learning of the task-oriented attention while tracking instrument holds vast potential in image-guided robotic surgery.
We propose an end-to-end Multi-Task Learning (ST-MTL) model with a shared encoder and spatio-temporal decoders for real-time surgical instrument segmentation and task-oriented saliency detection.
We tackle the problem with a novel asynchronous-temporal optimization technique by calculating independent gradients for each decoder.
Compared to state-of-the-art segmentation and saliency methods, our model outperforms them on the evaluation metrics and achieves outstanding performance in the challenge.
arXiv Detail & Related papers (2021-12-10T15:20:27Z) - Improved Speech Emotion Recognition using Transfer Learning and
Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations across modalities for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy and 20.16% in F1-score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z) - Interpretable Hyperspectral AI: When Non-Convex Modeling meets
Hyperspectral Remote Sensing [57.52865154829273]
Hyperspectral imaging, also known as image spectrometry, is a landmark technique in geoscience remote sensing (RS).
In the past decade, efforts have been made to process and analyze these hyperspectral (HS) products, mainly by seasoned experts.
For this reason, it is urgent to develop more intelligent and automatic approaches for various HS RS applications.
arXiv Detail & Related papers (2021-03-02T03:32:10Z) - Progressive Self-Guided Loss for Salient Object Detection [102.35488902433896]
We present a progressive self-guided loss function to facilitate deep learning-based salient object detection in images.
Our framework takes advantage of adaptively aggregated multi-scale features to locate and detect salient objects effectively.
arXiv Detail & Related papers (2021-01-07T07:33:38Z) - Real-Time Instrument Segmentation in Robotic Surgery using Auxiliary
Supervised Deep Adversarial Learning [15.490603884631764]
Real-time semantic segmentation of the robotic instruments and tissues is a crucial step in robot-assisted surgery.
We have developed a light-weight cascaded convolutional neural network (CNN) to segment the surgical instruments from high-resolution videos.
We show that our model surpasses existing algorithms for pixel-wise segmentation of surgical instruments in both prediction accuracy and segmentation time of high-resolution videos.
arXiv Detail & Related papers (2020-07-22T10:16:07Z) - A Unified Object Motion and Affinity Model for Online Multi-Object
Tracking [127.5229859255719]
We propose a novel MOT framework that unifies object motion and affinity model into a single network, named UMA.
UMA integrates single object tracking and metric learning into a unified triplet network by means of multi-task learning.
We equip our model with a task-specific attention module, which is used to boost task-aware feature learning.
arXiv Detail & Related papers (2020-03-25T09:36:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.