Robotic Scene Segmentation with Memory Network for Runtime Surgical
Context Inference
- URL: http://arxiv.org/abs/2308.12789v1
- Date: Thu, 24 Aug 2023 13:44:55 GMT
- Title: Robotic Scene Segmentation with Memory Network for Runtime Surgical
Context Inference
- Authors: Zongyu Li, Ian Reyes, Homa Alemzadeh
- Abstract summary: Space Time Correspondence Network (STCN) is a memory network that performs binary segmentation and minimizes the effects of class imbalance.
We show that STCN achieves superior segmentation performance for objects that are difficult to segment, such as needle and thread.
We also demonstrate that segmentation and context inference can be performed at runtime without compromising performance.
- Score: 8.600278838838163
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Surgical context inference has recently garnered significant attention in
robot-assisted surgery as it can facilitate workflow analysis, skill
assessment, and error detection. However, runtime context inference is
challenging since it requires timely and accurate detection of the interactions
among the tools and objects in the surgical scene based on the segmentation of
video data. At the same time, existing state-of-the-art video segmentation
methods are often biased against infrequent classes and fail to provide
temporal consistency for segmented masks. This can negatively impact the
context inference and accurate detection of critical states. In this study, we
propose a solution to these challenges using a Space Time Correspondence
Network (STCN). STCN is a memory network that performs binary segmentation and
minimizes the effects of class imbalance. The use of a memory bank in STCN
allows for the utilization of past image and segmentation information, thereby
ensuring consistency of the masks. Our experiments using the publicly available
JIGSAWS dataset demonstrate that STCN achieves superior segmentation
performance for objects that are difficult to segment, such as needle and
thread, and improves context inference compared to the state-of-the-art. We
also demonstrate that segmentation and context inference can be performed at
runtime without compromising performance.
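To make the memory-bank mechanism concrete, below is a minimal PyTorch sketch of an STCN-style memory read followed by a per-class binary merge. All names (`memory_read`, `segment_frame`, `decode`), the scaled dot-product affinity, and the 0.5 foreground threshold are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of an STCN-style memory read for per-class binary segmentation.
# Module names and shapes are illustrative placeholders, not the paper's code.
import torch
import torch.nn.functional as F

def memory_read(mem_keys, mem_vals, query_key):
    """Attend from the query frame to all memory frames.

    mem_keys:  (T*H*W, C_k)  key features of past frames in the memory bank
    mem_vals:  (T*H*W, C_v)  mask/value features at the same locations
    query_key: (H*W, C_k)    key features of the current frame
    """
    # Affinity between every query location and every memory location,
    # softmax over the memory axis (attention-style aggregation).
    affinity = query_key @ mem_keys.t()                      # (H*W, T*H*W)
    weights = F.softmax(affinity / mem_keys.shape[1] ** 0.5, dim=1)
    return weights @ mem_vals                                # (H*W, C_v)

def segment_frame(frame_feats, banks, decode):
    """Run one binary segmentation per object class and merge the results.

    banks:  {class_name: (mem_keys, mem_vals)} -- one memory bank per class,
            which is how a binary-segmentation memory network sidesteps
            multi-class imbalance (rare classes get their own bank).
    decode: callable mapping read-out value features to a foreground logit map.
    """
    logits = {}
    for cls, (mk, mv) in banks.items():
        readout = memory_read(mk, mv, frame_feats)
        logits[cls] = decode(readout)                        # (H*W,) logits
    # Merge binary outputs: a pixel is background unless some class is
    # confident; ties resolved by the highest foreground probability.
    stacked = torch.stack([logits[c] for c in banks])        # (N_cls, H*W)
    probs = torch.sigmoid(stacked)
    best = probs.argmax(dim=0)
    fg = probs.max(dim=0).values > 0.5
    return best, fg    # per-pixel class index and foreground mask
```

After each frame is segmented, its key and value features can be appended to the per-class banks, which is what lets past masks inform future ones and keeps the predicted masks temporally consistent.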
Related papers
- Disentangling spatio-temporal knowledge for weakly supervised object detection and segmentation in surgical video [10.287675722826028]
This paper introduces Video Spatio-Temporal Disentanglement Networks (VDST-Net) to disentangle spatio-temporal knowledge using semi-decoupled temporal knowledge distillation to predict high-quality class activation maps (CAMs).
We demonstrate the efficacy of our framework on a public reference dataset and on a more challenging surgical video dataset where objects are, on average, present in less than 60% of annotated frames.
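As a rough illustration of the distillation idea (not VDST-Net's exact semi-decoupled scheme), a teacher's temporally aggregated CAMs can supervise a spatial student through a stop-gradient, so only video-level labels are needed:

```python
# Hedged illustration of teacher-student CAM distillation; the function name
# and loss form are assumptions, not the paper's formulation.
import torch
import torch.nn.functional as F

def distill_cam_loss(student_cam, teacher_cam):
    """student_cam, teacher_cam: (B, n_classes, H, W) class activation maps."""
    # Detach the teacher so supervision flows one way only.
    target = torch.sigmoid(teacher_cam).detach()
    return F.binary_cross_entropy_with_logits(student_cam, target)
```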
arXiv Detail & Related papers (2024-07-22T16:52:32Z)
- OoDIS: Anomaly Instance Segmentation Benchmark [57.89836988990543]
We extend the most commonly used anomaly segmentation benchmarks to include the instance segmentation task.
Development in this area has been lagging, largely due to the lack of dedicated benchmarks.
Our evaluation of anomaly instance segmentation methods shows that this challenge remains an unsolved problem.
arXiv Detail & Related papers (2024-06-17T17:59:56Z)
- Visual Context-Aware Person Fall Detection [52.49277799455569]
We present a segmentation pipeline to semi-automatically separate individuals and objects in images.
Background objects such as beds, chairs, or wheelchairs can challenge fall detection systems, leading to false positive alarms.
We demonstrate that object-specific contextual transformations during training effectively mitigate this challenge.
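For illustration, an object-specific contextual transformation could look like the sketch below; the function and the specific transforms are assumptions, not the paper's pipeline.

```python
# Illustrative sketch: given a mask for a background object such as a bed or
# wheelchair, synthesize training variants so the fall detector cannot rely
# on that object's appearance. Not the paper's exact augmentation.
import numpy as np

def contextual_transform(image, object_mask, rng=None):
    """image: (H, W, 3) uint8; object_mask: (H, W) bool for one background object."""
    if rng is None:
        rng = np.random.default_rng()
    out = image.copy()
    choice = rng.integers(3)
    if choice == 0:
        # Remove the object by filling it with the median scene color.
        out[object_mask] = np.median(image[~object_mask], axis=0)
    elif choice == 1:
        # Jitter its brightness so appearance cues become unreliable.
        out[object_mask] = np.clip(out[object_mask] * rng.uniform(0.5, 1.5), 0, 255)
    # choice == 2: leave the object untouched with some probability.
    return out
```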
arXiv Detail & Related papers (2024-04-11T19:06:36Z)
- RISeg: Robot Interactive Object Segmentation via Body Frame-Invariant Features [6.358423536732677]
We introduce a novel approach to correct inaccurate segmentation by using robot interaction and a designed body frame-invariant feature.
We demonstrate the effectiveness of our proposed interactive perception pipeline in accurately segmenting cluttered scenes, achieving an average object segmentation accuracy of 80.7%.
arXiv Detail & Related papers (2024-03-04T05:03:24Z)
- PWISeg: Point-based Weakly-supervised Instance Segmentation for Surgical Instruments [27.89003436883652]
We propose a weakly-supervised surgical instrument segmentation approach, named Point-based Weakly-supervised Instance Segmentation (PWISeg).
PWISeg adopts an FCN-based architecture with point-to-box and point-to-mask branches to model the relationships between feature points and bounding boxes.
Based on this, we propose a key pixel association loss and a key pixel distribution loss, driving the point-to-mask branch to generate more accurate segmentation predictions.
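As a hedged sketch of the point-supervision idea (the paper's key pixel association and distribution losses have their own formulations), annotated key pixels can tie predicted masks to instances via a cross-entropy over instances:

```python
# Hedged sketch of point-supervised mask training in the spirit of PWISeg's
# key-pixel losses; the exact formulations are in the paper.
import torch
import torch.nn.functional as F

def key_pixel_loss(mask_logits, point_coords, point_labels):
    """mask_logits:  (N_inst, H, W) one predicted mask per instance
    point_coords:    (P, 2) long tensor of (y, x) annotated key pixels
    point_labels:    (P,) instance index each key pixel belongs to
    """
    ys, xs = point_coords[:, 0], point_coords[:, 1]
    # Logits of every instance mask at every annotated pixel: (N_inst, P).
    logits_at_points = mask_logits[:, ys, xs]
    # Each key pixel should be claimed by exactly one instance, so a
    # cross-entropy over instances "associates" pixels with masks.
    return F.cross_entropy(logits_at_points.t(), point_labels)
```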
arXiv Detail & Related papers (2023-11-16T11:48:29Z)
- Towards Surgical Context Inference and Translation to Gestures [1.858151490268935]
Manual labeling of gestures in robot-assisted surgery is labor intensive, prone to errors, and requires expertise or training.
We propose a method for automated and explainable generation of gesture transcripts.
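One way such a transcript generator could ground gestures is rule-based context inference over segmentation masks; the sketch below is an illustrative assumption (the predicate names and the 10-pixel contact threshold are not from the paper):

```python
# Hedged sketch of inferring tool-object interaction predicates from masks.
import numpy as np
from scipy import ndimage

def masks_touching(mask_a, mask_b, dilate_px=10):
    """True if two boolean masks overlap after slightly dilating mask_a,
    a simple proxy for 'contact' between two segmented objects."""
    grown = ndimage.binary_dilation(mask_a, iterations=dilate_px)
    return bool(np.any(grown & mask_b))

def infer_context(masks):
    """masks: {'grasper': bool array, 'needle': ..., 'thread': ...}"""
    return {
        "grasper_contacts_needle": masks_touching(masks["grasper"], masks["needle"]),
        "needle_contacts_thread": masks_touching(masks["needle"], masks["thread"]),
    }
```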
arXiv Detail & Related papers (2023-02-28T01:39:36Z)
- TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z)
- Temporally Constrained Neural Networks (TCNN): A framework for semi-supervised video semantic segmentation [5.0754434714665715]
We present Temporally Constrained Neural Networks (TCNN), a semi-supervised framework used for video semantic segmentation of surgical videos.
In this work, we show that autoencoder networks can be used to efficiently provide both spatial and temporal supervisory signals.
We demonstrate that lower-dimensional representations of predicted masks can be leveraged to provide a consistent improvement on two sparsely labeled datasets.
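A hedged sketch of how an autoencoder latent can supply the temporal signal (the framework's actual losses may differ): predicted masks of consecutive frames should stay close in the autoencoder's latent space.

```python
# Hedged sketch of a temporal supervisory signal from a mask autoencoder,
# in the spirit of TCNN; names and loss form are illustrative assumptions.
import torch
import torch.nn.functional as F

def temporal_consistency_loss(mask_probs_t, mask_probs_t1, encoder):
    """mask_probs_*: (B, n_classes, H, W) soft masks of frames t and t+1.
    encoder: a pretrained mask autoencoder's encoder, mapping masks to latents."""
    z_t = encoder(mask_probs_t)
    z_t1 = encoder(mask_probs_t1)
    # Penalize latent jumps between neighboring frames; the low-dimensional
    # code changes smoothly even when per-pixel masks are noisy.
    return F.mse_loss(z_t, z_t1)
```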
arXiv Detail & Related papers (2021-12-27T18:06:12Z)
- RICE: Refining Instance Masks in Cluttered Environments with Graph Neural Networks [53.15260967235835]
We propose a novel framework that refines the output of such methods by utilizing a graph-based representation of instance masks.
We train deep networks capable of sampling smart perturbations to the segmentations, and a graph neural network, which can encode relations between objects, to evaluate the segmentations.
We demonstrate an application that uses uncertainty estimates generated by our method to guide a manipulator, leading to efficient understanding of cluttered scenes.
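A toy version of the graph-scoring idea, with a single hand-rolled message-passing round (RICE's actual networks are richer; all names here are illustrative): nodes are per-mask features, edges connect overlapping masks, and the output scores the whole segmentation hypothesis.

```python
# Hedged sketch of scoring a set of instance masks with a tiny graph network.
import torch
import torch.nn as nn

class MaskGraphScorer(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        self.msg = nn.Linear(2 * feat_dim, feat_dim)
        self.score = nn.Linear(feat_dim, 1)

    def forward(self, node_feats, edges):
        """node_feats: (N, D) one feature vector per mask
        edges: (E, 2) long tensor of index pairs of overlapping masks"""
        src, dst = edges[:, 0], edges[:, 1]
        # One round of message passing: each mask aggregates its neighbors.
        messages = torch.relu(
            self.msg(torch.cat([node_feats[src], node_feats[dst]], dim=1)))
        agg = torch.zeros_like(node_feats).index_add_(0, dst, messages)
        updated = node_feats + agg
        # Mean over nodes scores the segmentation hypothesis; the spread of
        # scores across perturbed hypotheses can serve as an uncertainty estimate.
        return self.score(updated).mean()
```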
arXiv Detail & Related papers (2021-06-29T20:29:29Z)
- SegmentMeIfYouCan: A Benchmark for Anomaly Segmentation [111.61261419566908]
Deep neural networks (DNNs) are usually trained on a closed set of semantic classes.
They are ill-equipped to handle previously unseen objects.
Detecting and localizing such objects is crucial for safety-critical applications such as perception for automated driving.
arXiv Detail & Related papers (2021-04-30T07:58:19Z)
- Target-Aware Object Discovery and Association for Unsupervised Video Multi-Object Segmentation [79.6596425920849]
This paper addresses the task of unsupervised video multi-object segmentation.
We introduce a novel approach for more accurate and efficient spatio-temporal segmentation.
We evaluate the proposed approach on DAVIS$_{17}$ and YouTube-VIS, and the results demonstrate that it outperforms state-of-the-art methods in both segmentation accuracy and inference speed.
arXiv Detail & Related papers (2021-04-10T14:39:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.