A Novel Deep ML Architecture by Integrating Visual Simultaneous Localization and Mapping (vSLAM) into Mask R-CNN for Real-time Surgical Video Analysis
- URL: http://arxiv.org/abs/2103.16847v1
- Date: Wed, 31 Mar 2021 06:59:13 GMT
- Authors: Ella Selina Lan
- Abstract summary: In this research, a novel machine learning architecture, RPM-CNN, is created to perform real-time surgical analysis.
RPM-CNN integrates visual simultaneous localization and mapping (vSLAM) into Mask R-CNN.
To apply RPM-CNN's real-time top performance to the real world, a Microsoft HoloLens 2 application is developed.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Seven million people suffer complications after surgery each year. With
sufficient surgical training and feedback, half of these complications could be
prevented. Automatic surgical video analysis, especially for minimally invasive
surgery, plays a key role in training and review, with increasing interest
from recent studies on tool and workflow detection. In this research, a novel
machine learning architecture, RPM-CNN, is created to perform real-time
surgical video analysis. This architecture, for the first time, integrates
visual simultaneous localization and mapping (vSLAM) into Mask R-CNN.
Spatio-temporal information, in addition to the visual features, is utilized to
increase the accuracy to 96.8 mAP for tool detection and 97.5 mean Jaccard for
workflow detection, surpassing all previous works via the same benchmark
dataset. For real-time prediction, the RPM-CNN model reaches a runtime speed of
50 FPS, 10x faster than region-based CNNs, by modeling the spatio-temporal
information directly from surgical videos during vSLAM 3D mapping.
Additionally, the novel Region Proposal Module (RPM) replaces the region
proposal network (RPN) in Mask R-CNN, accurately placing bounding boxes and
reducing the annotation requirement. In principle, this architecture
integrates the best of both worlds: 1) vSLAM for object detection, focusing on
geometric information for region proposals, and 2) CNNs for object recognition,
focusing on semantic information for image classification. Integrating these
two technologies into one joint training process opens a new door in computer
vision. Furthermore, to apply RPM-CNN's top real-time performance in the real
world, a Microsoft HoloLens 2 application is developed to provide an augmented
reality (AR) based solution for both surgical training and assistance.
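The abstract says the RPM draws on vSLAM's geometric information, rather than a learned RPN, to place region proposals, and it reports overlap-based scores (mAP for tool detection, mean Jaccard for workflow detection). The paper's actual RPM is not described here, so what follows is only a minimal sketch under stated assumptions. It assumes a vSLAM front end supplies the camera pose (`R`, `t`), the pinhole intrinsics `K`, and 3D map points already grouped per tool. A proposal box is read off the projected points and scored with the Jaccard index. The helper names (`project_points`, `box_from_points`, `iou`, `mean_phase_jaccard`) and the pixel `margin` are hypothetical. The per-phase averaging in `mean_phase_jaccard` is one common convention and may differ from the benchmark's exact protocol.

```python
import numpy as np


def project_points(points_w, R, t, K):
    """Project Nx3 world points to pixels with a pinhole camera model.

    Assumes X_cam = R @ X_world + t (pose from a vSLAM tracker, an assumption)
    and a 3x3 intrinsic matrix K. Points at or behind the camera are dropped.
    """
    pts_c = points_w @ R.T + t                 # world frame -> camera frame
    pts_c = pts_c[pts_c[:, 2] > 1e-6]          # keep points with positive depth
    uvw = pts_c @ K.T                          # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3]            # perspective division


def box_from_points(uv, width, height, margin=4.0):
    """Axis-aligned proposal (x1, y1, x2, y2) enclosing the projected points.

    Padded by a hypothetical pixel margin and clipped to the image bounds.
    Returns None when no point is visible.
    """
    if len(uv) == 0:
        return None
    x1, y1 = uv.min(axis=0) - margin
    x2, y2 = uv.max(axis=0) + margin
    x1, x2 = np.clip([x1, x2], 0, width - 1)
    y1, y2 = np.clip([y1, y2], 0, height - 1)
    return (float(x1), float(y1), float(x2), float(y2))


def iou(a, b):
    """Jaccard index (intersection over union) of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mean_phase_jaccard(pred, true):
    """Per-frame phase labels: Jaccard index per ground-truth phase, averaged."""
    pred, true = np.asarray(pred), np.asarray(true)
    scores = []
    for phase in np.unique(true):
        p, g = pred == phase, true == phase    # frames predicted / labeled as phase
        scores.append(np.logical_and(p, g).sum() / np.logical_or(p, g).sum())
    return float(np.mean(scores))


if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    tool_points = np.array([[-0.1, -0.05, 1.0], [0.1, 0.05, 1.0], [0.0, 0.0, 1.2]])
    uv = project_points(tool_points, np.eye(3), np.zeros(3), K)
    proposal = box_from_points(uv, width=640, height=480)
    print(proposal)
    print(iou(proposal, (270.0, 215.0, 370.0, 265.0)))
    print(mean_phase_jaccard([0, 0, 1, 1, 2], [0, 0, 1, 2, 2]))
```

In this sketch the box comes from geometry alone, so generating it needs no box annotation. That is one plausible reading of the abstract's claim of a lower annotation requirement, with a CNN head still responsible for classifying and segmenting the region.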
Related papers
- Open-Vocabulary Spatio-Temporal Action Detection
Open-vocabulary spatio-temporal action detection (OV-STAD) is an important fine-grained video understanding task.
OV-STAD requires training a model on a limited set of base classes with box and label supervision.
To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs.
arXiv (2024-05-17)
- DNS SLAM: Dense Neural Semantic-Informed SLAM
DNS SLAM is a novel neural RGB-D semantic SLAM approach featuring a hybrid representation.
Our method integrates multi-view geometry constraints with image-based feature extraction to improve appearance details.
Our experiments achieve state-of-the-art tracking performance on both synthetic and real-world data.
arXiv (2023-11-30)
- SeMLaPS: Real-time Semantic Mapping with Latent Prior Networks and Quasi-Planar Segmentation
We present a new methodology for real-time semantic mapping from RGB-D sequences.
It combines a 2D neural network and a 3D network based on a SLAM system with 3D occupancy mapping.
Our system achieves state-of-the-art semantic mapping quality among systems based on 2D-3D networks.
arXiv (2023-06-28)
- An Acceleration Method Based on Deep Learning and Multilinear Feature Space
This paper presents an alternative approach based on the Multilinear Feature Space (MFS) method, resorting to transfer learning from large CNN architectures.
The proposed method uses CNNs to generate feature maps, although it does not work as a complexity reduction approach.
Our method, named AMFC, uses transfer learning from a pre-trained CNN to reduce the classification time of new sample images, with minimal accuracy loss.
arXiv (2021-10-16)
- Efficient Global-Local Memory for Real-time Instrument Segmentation of Robotic Surgical Video
We identify two important clues for surgical instrument perception: local temporal dependency from adjacent frames and global semantic correlation over long-range durations.
We propose a novel dual-memory network (DMNet) to relate both global and local temporal knowledge.
Our method largely outperforms state-of-the-art works in segmentation accuracy while maintaining real-time speed.
arXiv (2021-09-28)
- Dynamic Gesture Recognition
It is possible to use machine learning to classify images and/or videos instead of traditional computer vision algorithms.
The aim of this project is to build a symbiosis between a convolutional neural network (CNN) and a recurrent neural network (RNN).
arXiv (2021-09-20)
- SurgeonAssist-Net: Towards Context-Aware Head-Mounted Display-Based Augmented Reality for Surgical Guidance
SurgeonAssist-Net is a framework making action-and-workflow-driven virtual assistance accessible to commercially available optical see-through head-mounted displays (OST-HMDs).
Our implementation competes with state-of-the-art approaches in prediction accuracy for automated task recognition.
It is capable of near real-time performance on the Microsoft HoloLens 2 OST-HMD.
arXiv (2021-07-13)
- Spatial-Temporal Correlation and Topology Learning for Person Re-Identification in Videos
We propose a novel framework, CTL (Correlation and Topology Learning), to pursue discriminative and robust representations by modeling cross-scale spatial-temporal correlation.
CTL utilizes a CNN backbone and a key-point estimator to extract semantic local features from the human body.
It explores a context-reinforced topology to construct multi-scale graphs by considering both global contextual information and the physical connections of the human body.
arXiv (2021-04-15)
- One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns general knowledge of instruments and the ability to adapt quickly through a video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv (2021-03-24)
- Detection and Localization of Robotic Tools in Robot-Assisted Surgery Videos Using Deep Neural Networks for Region Proposal and Detection
We propose a solution to the open problem of tool detection and localization in robot-assisted surgery (RAS) video understanding.
We propose an architecture using multimodal convolutional neural networks for fast detection and localization of tools in RAS videos.
Our results, with an Average Precision (AP) of 91% and a mean detection time of 0.1 seconds per test frame, indicate that our study is superior to conventionally used methods for medical imaging.
arXiv (2020-07-29)
- Accurate Tumor Tissue Region Detection with Accelerated Deep Convolutional Neural Networks
Manual annotation of pathology slides for cancer diagnosis is laborious and repetitive.
Our approach (FLASH) is based on a Deep Convolutional Neural Network (DCNN) architecture.
It reduces computational costs and is faster than typical deep learning approaches by two orders of magnitude.
arXiv (2020-04-18)
This list is automatically generated from the titles and abstracts of the papers in this site.