A spatio-temporal network for video semantic segmentation in surgical
videos
- URL: http://arxiv.org/abs/2306.11052v1
- Date: Mon, 19 Jun 2023 16:36:48 GMT
- Title: A spatio-temporal network for video semantic segmentation in surgical
videos
- Authors: Maria Grammatikopoulou, Ricardo Sanchez-Matilla, Felix Bragman, David
Owen, Lucy Culshaw, Karen Kerr, Danail Stoyanov, Imanol Luengo
- Abstract summary: We propose a novel architecture for modelling temporal relationships in videos.
The proposed model includes a decoder to enable semantic video segmentation.
The proposed decoder can be used on top of any segmentation encoder to improve temporal consistency.
- Score: 11.548181453080087
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Semantic segmentation in surgical videos has applications in intra-operative
guidance, post-operative analytics and surgical education. Segmentation models
need to provide accurate and consistent predictions since temporally
inconsistent identification of anatomical structures can impair usability and
hinder patient safety. Video information can alleviate these challenges leading
to reliable models suitable for clinical use. We propose a novel architecture
for modelling temporal relationships in videos. The proposed model includes a
spatio-temporal decoder to enable video semantic segmentation by improving
temporal consistency across frames. The encoder processes individual frames
whilst the decoder processes a temporal batch of adjacent frames. The proposed
decoder can be used on top of any segmentation encoder to improve temporal
consistency. Model performance was evaluated on the CholecSeg8k dataset and a
private dataset of robotic Partial Nephrectomy procedures. Segmentation
performance was improved when the temporal decoder was applied across both
datasets. The proposed model also displayed improvements in temporal
consistency.
Related papers
- Appearance-Based Refinement for Object-Centric Motion Segmentation [85.2426540999329]
We introduce an appearance-based refinement method that leverages temporal consistency in video streams to correct inaccurate flow-based proposals.
Our approach involves a sequence-level selection mechanism that identifies accurate flow-predicted masks as exemplars.
Its performance is evaluated on multiple video segmentation benchmarks, including DAVIS, YouTube, SegTrackv2, and FBMS-59.
arXiv Detail & Related papers (2023-12-18T18:59:51Z) - Multi-grained Temporal Prototype Learning for Few-shot Video Object
Segmentation [156.4142424784322]
Few-Shot Video Object (FSVOS) aims to segment objects in a query video with the same category defined by a few annotated support images.
We propose to leverage multi-grained temporal guidance information for handling the temporal correlation nature of video data.
Our proposed video IPMT model significantly outperforms previous models on two benchmark datasets.
arXiv Detail & Related papers (2023-09-20T09:16:34Z) - Leaping Into Memories: Space-Time Deep Feature Synthesis [93.10032043225362]
We propose LEAPS, an architecture-independent method for synthesizing videos from internal models.
We quantitatively and qualitatively evaluate the applicability of LEAPS by inverting a range of architectures convolutional attention-based on Kinetics-400.
arXiv Detail & Related papers (2023-03-17T12:55:22Z) - Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS
Instance Segmentation [10.789826145990016]
This paper presents a deep learning framework for medical video segmentation.
Our framework explicitly extracts features from neighbouring frames across the temporal dimension.
It incorporates them with a temporal feature blender, which then tokenises the high-level-temporal feature to form a strong global feature encoded via a Swin Transformer.
arXiv Detail & Related papers (2023-02-22T12:09:39Z) - Temporally Constrained Neural Networks (TCNN): A framework for
semi-supervised video semantic segmentation [5.0754434714665715]
We present Temporally Constrained Neural Networks (TCNN), a semi-supervised framework used for video semantic segmentation of surgical videos.
In this work, we show that autoencoder networks can be used to efficiently provide both spatial and temporal supervisory signals.
We demonstrate that lower-dimensional representations of predicted masks can be leveraged to provide a consistent improvement on both sparsely labeled datasets.
arXiv Detail & Related papers (2021-12-27T18:06:12Z) - Efficient Global-Local Memory for Real-time Instrument Segmentation of
Robotic Surgical Video [53.14186293442669]
We identify two important clues for surgical instrument perception, including local temporal dependency from adjacent frames and global semantic correlation in long-range duration.
We propose a novel dual-memory network (DMNet) to relate both global and local-temporal knowledge.
Our method largely outperforms the state-of-the-art works on segmentation accuracy while maintaining a real-time speed.
arXiv Detail & Related papers (2021-09-28T10:10:14Z) - Atrous Residual Interconnected Encoder to Attention Decoder Framework
for Vertebrae Segmentation via 3D Volumetric CT Images [1.8146155083014204]
This paper proposes a novel algorithm for automated vertebrae segmentation via 3D volumetric spine CT images.
The proposed model is based on the structure of encoder to decoder, using layer normalization to optimize mini-batch training performance.
The experimental results show that our model achieves competitive performance compared with other state-of-the-art medical semantic segmentation methods.
arXiv Detail & Related papers (2021-04-08T12:09:16Z) - Multi-frame Feature Aggregation for Real-time Instrument Segmentation in
Endoscopic Video [11.100734994959419]
We propose a novel Multi-frame Feature Aggregation (MFFA) module to aggregate video frame features temporally and spatially.
We also develop a method that can randomly synthesize a surgical frame sequence from a single labeled frame to assist network training.
arXiv Detail & Related papers (2020-11-17T16:27:27Z) - Coherent Loss: A Generic Framework for Stable Video Segmentation [103.78087255807482]
We investigate how a jittering artifact degrades the visual quality of video segmentation results.
We propose a Coherent Loss with a generic framework to enhance the performance of a neural network against jittering artifacts.
arXiv Detail & Related papers (2020-10-25T10:48:28Z) - Symmetric Dilated Convolution for Surgical Gesture Recognition [10.699258974625073]
We propose a novel temporal convolutional architecture to automatically detect and segment surgical gestures.
We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns.
We validate our approach on a fundamental robotic suturing task from the JIGSAWS dataset.
arXiv Detail & Related papers (2020-07-13T13:34:48Z) - Efficient Semantic Video Segmentation with Per-frame Inference [117.97423110566963]
In this work, we process efficient semantic video segmentation in a per-frame fashion during the inference process.
We employ compact models for real-time execution. To narrow the performance gap between compact models and large models, new knowledge distillation methods are designed.
arXiv Detail & Related papers (2020-02-26T12:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.