Efficient Global-Local Memory for Real-time Instrument Segmentation of
Robotic Surgical Video
- URL: http://arxiv.org/abs/2109.13593v1
- Date: Tue, 28 Sep 2021 10:10:14 GMT
- Title: Efficient Global-Local Memory for Real-time Instrument Segmentation of
Robotic Surgical Video
- Authors: Jiacheng Wang, Yueming Jin, Liansheng Wang, Shuntian Cai, Pheng-Ann
Heng, Jing Qin
- Abstract summary: We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local spatio-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
- Score: 53.14186293442669
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Performing a real-time and accurate instrument segmentation from videos is of
great significance for improving the performance of robotic-assisted surgery.
We identify two important clues for surgical instrument perception, including
local temporal dependency from adjacent frames and global semantic correlation
in long-range duration. However, most existing works perform segmentation using
only the visual cues of a single frame, while optical flow, when adopted, models
the motion between just two frames and incurs a heavy computational cost. We
propose a novel dual-memory network (DMNet) to relate both global and local
spatio-temporal knowledge to augment the current features, boosting the
segmentation performance while retaining the real-time prediction capability. On
the one hand, we propose an efficient local memory that combines the
complementary advantages of convolutional LSTM and non-local mechanisms with
respect to the receptive field. On the other hand, we develop an active global
memory that relates long-range global semantic correlation to the current frame
by selecting the most informative frames according to model uncertainty and
frame similarity. We have extensively validated our method on two public
benchmark surgical video datasets. Experimental results demonstrate that our
method largely outperforms the state-of-the-art works on segmentation accuracy
while maintaining a real-time speed.
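The abstract does not spell out how the active global memory picks its frames, so the following is only a minimal, hypothetical sketch of that idea: rank candidate frames by model uncertainty (approximated here by mean predictive entropy) and filter out near-duplicates by feature similarity. The entropy criterion, cosine-similarity filter, thresholds, and tensor shapes are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch of informative-frame selection for a global memory.
# Assumptions (not from the paper): uncertainty = mean predictive entropy,
# redundancy filter = cosine similarity between pooled frame features.
import torch
import torch.nn.functional as F

def frame_uncertainty(logits: torch.Tensor) -> torch.Tensor:
    """Mean per-pixel predictive entropy for each candidate frame.

    logits: (T, C, H, W) segmentation logits for T past frames.
    returns: (T,) scalar uncertainty per frame.
    """
    prob = logits.softmax(dim=1)
    entropy = -(prob * prob.clamp_min(1e-8).log()).sum(dim=1)  # (T, H, W)
    return entropy.mean(dim=(1, 2))

def select_global_memory(feats: torch.Tensor, logits: torch.Tensor,
                         k: int = 4, sim_thresh: float = 0.95) -> list:
    """Pick up to k informative, mutually dissimilar frames for the memory.

    feats:  (T, D) pooled feature vector per frame.
    logits: (T, C, H, W) segmentation logits per frame.
    """
    order = frame_uncertainty(logits).argsort(descending=True)  # most uncertain first
    chosen = []
    for idx in order.tolist():
        if len(chosen) == k:
            break
        if chosen:  # skip frames that are near-duplicates of selected ones
            sims = F.cosine_similarity(feats[idx].unsqueeze(0), feats[chosen], dim=1)
            if sims.max() > sim_thresh:
                continue
        chosen.append(idx)
    return chosen

# Toy example: 10 candidate frames, 2 classes, 64x64 maps, 256-d frame features.
print(select_global_memory(torch.randn(10, 256), torch.randn(10, 2, 64, 64)))
```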
Related papers
- Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS
Instance Segmentation [10.789826145990016]
This paper presents a deep learning framework for medical video segmentation.
Our framework explicitly extracts features from neighbouring frames across the temporal dimension.
It incorporates them with a temporal feature blender, which then tokenises the blended high-level temporal feature to form a strong global feature encoded via a Swin Transformer.
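As a rough, hypothetical illustration of this pipeline, the sketch below blends per-frame features along the temporal dimension with a learned weighting and feeds the flattened tokens to a transformer encoder. The generic nn.TransformerEncoder stands in for the Swin Transformer and the simple weighting stands in for the temporal feature blender; both are assumptions for illustration only.

```python
# Hypothetical sketch: temporal blending of per-frame features + token encoding.
import torch
import torch.nn as nn

class TemporalBlender(nn.Module):
    def __init__(self, num_frames: int, dim: int = 96):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_frames))  # learned per-frame weight
        self.encoder = nn.TransformerEncoder(  # stand-in for a Swin Transformer
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        """feats: (B, T, C, H, W) per-frame features -> (B, H*W, C) encoded tokens."""
        w = self.weights.softmax(dim=0).view(1, -1, 1, 1, 1)
        blended = (w * feats).sum(dim=1)             # (B, C, H, W) blended feature
        tokens = blended.flatten(2).transpose(1, 2)  # (B, H*W, C) token sequence
        return self.encoder(tokens)

out = TemporalBlender(num_frames=5, dim=96)(torch.randn(2, 5, 96, 16, 16))
print(out.shape)  # torch.Size([2, 256, 96])
```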
arXiv Detail & Related papers (2023-02-22T12:09:39Z) - Distortion-Aware Network Pruning and Feature Reuse for Real-time Video
Segmentation [49.17930380106643]
We propose a novel framework to speed up any architecture with skip-connections for real-time vision tasks.
Specifically, at the arrival of each frame, we transform the features from the previous frame to reuse them at specific spatial bins.
We then perform partial computation of the backbone network only on the regions of the current frame that capture temporal differences between the current and previous frames.
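A minimal, hypothetical sketch of this feature-reuse idea follows: split the frame into spatial bins, recompute features only where the frame differs noticeably from the previous one, and reuse the cached features elsewhere. The toy backbone, bin size, and threshold are illustrative assumptions rather than the paper's implementation, which additionally avoids computing the unchanged regions at all.

```python
# Hypothetical sketch: bin-wise feature reuse driven by frame differences.
import torch
import torch.nn as nn
import torch.nn.functional as F

backbone = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # stand-in for a real encoder

def reuse_features(cur, prev, prev_feat, bin_size=32, thresh=0.05):
    """cur, prev: (1, 3, H, W) frames; prev_feat: (1, 16, H, W) cached features."""
    diff = (cur - prev).abs().mean(dim=1, keepdim=True)    # (1, 1, H, W) change map
    changed = F.avg_pool2d(diff, bin_size) > thresh        # (1, 1, H/b, W/b) bins
    mask = F.interpolate(changed.float(), size=cur.shape[-2:])  # back to pixel grid
    new_feat = backbone(cur)  # a real system would compute only the changed bins
    return mask * new_feat + (1 - mask) * prev_feat

prev = torch.rand(1, 3, 256, 256)
cur = prev.clone()
cur[..., :64, :64] += 0.5                    # simulate motion in one corner
print(reuse_features(cur, prev, backbone(prev)).shape)  # torch.Size([1, 16, 256, 256])
```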
arXiv Detail & Related papers (2022-06-20T07:20:02Z) - Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene
Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, including EndoVis18 Challenge and CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
arXiv Detail & Related papers (2022-03-29T05:52:23Z) - Temporal Memory Relation Network for Workflow Recognition from Surgical
Video [53.20825496640025]
We propose a novel end-to-end temporal memory relation network (TMNet) for relating long-range and multi-scale temporal patterns.
We have extensively validated our approach on two benchmark surgical video datasets.
arXiv Detail & Related papers (2021-03-30T13:20:26Z) - Domain Adaptive Robotic Gesture Recognition with Unsupervised
Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and the inherent correlations in multi-modal data for gesture recognition.
Results show that our approach recovers the performance with great improvement gains, up to 12.91% in ACC and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z) - Memory Group Sampling Based Online Action Recognition Using Kinetic
Skeleton Features [4.674689979981502]
We propose three core ideas to handle the online action recognition problem.
First, we combine the spatial and temporal skeleton features to depict the actions.
Second, we propose a memory group sampling method to combine the previous action frames and current action frames.
Third, an improved 1D CNN network is employed for training and testing using the features from sampled frames.
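A minimal, hypothetical sketch of this memory-group-sampling idea follows: keep a buffer of past skeleton-feature frames, build a fixed-length group that mixes evenly spaced memory frames with the most recent ones, and classify the group with a small 1D CNN. The group length, sampling rule, feature size, and network are illustrative assumptions, not the paper's exact design.

```python
# Hypothetical sketch: sample a group of memory + current frames, classify with a 1D CNN.
import torch
import torch.nn as nn

def sample_group(memory: torch.Tensor, current: torch.Tensor, length: int = 32) -> torch.Tensor:
    """memory: (Tm, D) past frames; current: (Tc, D) recent frames -> (length, D) group."""
    n_cur = min(current.shape[0], length // 2)
    n_mem = length - n_cur
    idx = torch.linspace(0, memory.shape[0] - 1, n_mem).long()  # even coverage of the past
    return torch.cat([memory[idx], current[-n_cur:]], dim=0)

# Skeleton features: e.g. 25 joints x 3 coordinates = 75 dims per frame; 10 action classes.
classifier = nn.Sequential(
    nn.Conv1d(75, 128, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(128, 10),
)

group = sample_group(torch.randn(200, 75), torch.randn(16, 75))  # (32, 75)
logits = classifier(group.t().unsqueeze(0))                      # (1, 10) action scores
print(logits.shape)
```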
arXiv Detail & Related papers (2020-11-01T16:43:08Z) - Learning Motion Flows for Semi-supervised Instrument Segmentation from
Robotic Surgical Video [64.44583693846751]
We study the semi-supervised instrument segmentation from robotic surgical videos with sparse annotations.
By exploiting generated data pairs, our framework can recover and even enhance temporal consistency of training sequences.
Results show that our method outperforms the state-of-the-art semi-supervised methods by a large margin.
arXiv Detail & Related papers (2020-07-06T02:39:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences.