Instrument-tissue Interaction Detection Framework for Surgical Video Understanding
- URL: http://arxiv.org/abs/2404.00322v1
- Date: Sat, 30 Mar 2024 11:21:11 GMT
- Title: Instrument-tissue Interaction Detection Framework for Surgical Video Understanding
- Authors: Wenjun Lin, Yan Hu, Huazhu Fu, Mingming Yang, Chin-Boon Chng, Ryo Kawasaki, Cheekong Chui, Jiang Liu
- Abstract summary: We present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect the quintuple for surgical video understanding.
Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships among proposals in the current frame using global context information from the video snippet.
To reason about relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed, with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model temporal information for the same instance.
- Score: 31.822025965225016
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The instrument-tissue interaction detection task, which helps understand surgical activities, is vital for constructing computer-assisted surgery systems but faces many challenges. First, most models represent instrument-tissue interaction in a coarse-grained way that focuses only on classification and cannot automatically detect instruments and tissues. Second, existing works do not fully consider intra- and inter-frame relations between instruments and tissues. In this paper, we propose to represent instrument-tissue interaction as an <instrument class, instrument bounding box, tissue class, tissue bounding box, action class> quintuple and present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect this quintuple for surgical video understanding. Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships among proposals in the current frame using global context information from the video snippet. We also propose a Spatial Corresponding Attention (SCA) Layer to incorporate features of proposals between adjacent frames through spatial encoding. To reason about relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed, with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model temporal information for the same instance. For evaluation, we build a cataract surgery video dataset (PhacoQ) and a cholecystectomy surgery video dataset (CholecQ). Experimental results demonstrate the promising performance of our model, which outperforms other state-of-the-art models on both datasets.
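To make the quintuple representation and the TG Layer's connection scheme more concrete, here is a minimal Python sketch; the class, field, and variable names (and the cataract-surgery example values) are hypothetical illustrations under our reading of the abstract, not the paper's actual code or data format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

@dataclass
class InteractionQuintuple:
    """The <instrument class, instrument box, tissue class, tissue box,
    action class> representation described in the abstract."""
    instrument_class: str
    instrument_box: Box
    tissue_class: str
    tissue_box: Box
    action_class: str

# Hypothetical cataract-surgery example
q = InteractionQuintuple("phaco handpiece", (120, 80, 260, 210),
                         "lens", (150, 110, 300, 240), "aspirate")

def build_edges(frames: List[Dict[str, List[str]]]) -> List[Tuple[int, int]]:
    """Sketch of the TG Layer's graph: intra-frame edges connect proposals
    within a frame; inter-frame edges link the same instance across
    consecutive frames."""
    node_id = {}  # (frame index, instance id) -> global node index
    for t, frame in enumerate(frames):
        for inst in frame["instances"]:
            node_id[(t, inst)] = len(node_id)
    edges = []
    for t, frame in enumerate(frames):
        insts = frame["instances"]
        # intra-frame: connect every pair of proposals in the frame
        for i in range(len(insts)):
            for j in range(i + 1, len(insts)):
                edges.append((node_id[(t, insts[i])], node_id[(t, insts[j])]))
        # inter-frame: connect the same instance in adjacent frames
        if t > 0:
            for inst in insts:
                if (t - 1, inst) in node_id:
                    edges.append((node_id[(t - 1, inst)], node_id[(t, inst)]))
    return edges

frames = [{"instances": ["phaco_tip", "lens"]},
          {"instances": ["phaco_tip", "lens"]}]
print(build_edges(frames))  # [(0, 1), (2, 3), (0, 2), (1, 3)]
```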
Related papers
- Data Augmentation for Surgical Scene Segmentation with Anatomy-Aware Diffusion Models [1.9085155846692308]
We introduce a multi-stage approach to generate multi-class surgical datasets with annotations.
Our framework improves anatomy awareness by training organ-specific models with an inpainting objective guided by binary segmentation masks.
This versatile approach allows the generation of multi-class datasets from real binary datasets and simulated surgical masks.
arXiv Detail & Related papers (2024-10-10T09:29:23Z)
- Exploring Optical Flow Inclusion into nnU-Net Framework for Surgical Instrument Segmentation [1.3444601218847545]
The nnU-Net framework has excelled in semantic segmentation but analyzes single frames without temporal information.
Optical flow (OF) is a tool commonly used in video tasks to estimate motion and represent it in a single frame, thereby encoding temporal information.
This work employs OF maps as an additional input to the nnU-Net architecture to improve its performance on the surgical instrument segmentation task (a minimal preprocessing sketch follows this entry).
arXiv Detail & Related papers (2024-03-15T11:36:26Z)
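As a rough illustration of the optical-flow idea in the entry above, the sketch below computes dense Farneback flow with OpenCV and stacks it with the grayscale frame as extra input channels; the exact preprocessing used by the paper is not specified here, so treat this purely as an assumption-laden example.

```python
import cv2
import numpy as np

def frame_with_flow(prev_frame: np.ndarray, frame: np.ndarray) -> np.ndarray:
    """Stack a grayscale frame with its dense optical flow so a single-frame
    segmentation network receives motion (temporal) information."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Farneback dense flow: one (dx, dy) vector per pixel, shape (H, W, 2)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    # Intensity + flow -> (H, W, 3) input array
    return np.dstack([gray.astype(np.float32) / 255.0, flow])

# Synthetic example: a bright square shifted 4 px to the right
prev_frame = np.zeros((64, 64, 3), np.uint8)
prev_frame[20:40, 20:40] = 255
frame = np.zeros((64, 64, 3), np.uint8)
frame[20:40, 24:44] = 255
print(frame_with_flow(prev_frame, frame).shape)  # (64, 64, 3)
```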
- Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery [29.72271827272853]
This work explores a new task, Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment the corresponding surgical instruments based on a given language expression.
We devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both video-level and instrument-level knowledge to boost performance.
arXiv Detail & Related papers (2023-08-18T11:24:06Z)
- Dynamic Interactive Relation Capturing via Scene Graph Learning for Robotic Surgical Report Generation [14.711668177329244]
For robot-assisted surgery, an accurate surgical report reflects clinical operations during surgery and helps with documentation, post-operative analysis, and follow-up treatment.
It is a challenging task due to many complex and diverse interactions between instruments and tissues in the surgical scene.
This paper presents a neural network to boost surgical report generation by explicitly exploring the interactive relation between tissues and surgical instruments.
arXiv Detail & Related papers (2023-06-05T07:34:41Z)
- Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge [69.91670788430162]
We present the results of the SurgToolLoc 2022 challenge.
The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools.
We conclude by discussing these results in the broader context of machine learning and surgical data science.
arXiv Detail & Related papers (2023-05-11T21:44:39Z)
- MURPHY: Relations Matter in Surgical Workflow Analysis [12.460554004034472]
This paper systematically investigates the importance of relational cues in surgery.
We contribute the RLLS12M dataset, a large-scale collection of robotic left lateral sectionectomy (RLLS) videos.
We propose a multi-relation purification hybrid network (MURPHY), which aptly incorporates novel relation modules to augment the feature representation.
arXiv Detail & Related papers (2022-12-24T12:09:38Z)
- TraSeTR: Track-to-Segment Transformer with Contrastive Query for Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims to provide a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- Learning Asynchronous and Sparse Human-Object Interaction in Videos [56.73059840294019]
Asynchronous-Sparse Interaction Graph Networks (ASSIGN) automatically detect the structure of interaction events associated with entities in a video scene.
ASSIGN is tested on human-object interaction recognition and shows superior performance in segmenting and labeling human sub-activities and object affordances from raw videos.
arXiv Detail & Related papers (2021-03-03T23:43:55Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal graph network that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset; a minimal fusion sketch follows this entry.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
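The MRG-Net entry above motivates a simple picture of visual-kinematics fusion; the PyTorch sketch below is a hedged stand-in (dimensions, layer names, and the single fusion step are all assumptions, with JIGSAWS-style 76-dim kinematics a guess), not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class VisKinFusion(nn.Module):
    """Minimal two-branch fusion: project visual and kinematics embeddings
    into a shared space, exchange information once, then classify gestures."""
    def __init__(self, vis_dim=512, kin_dim=76, hidden=128, n_classes=10):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hidden)     # visual branch
        self.kin_proj = nn.Linear(kin_dim, hidden)     # kinematics branch
        self.relation = nn.Linear(2 * hidden, hidden)  # fusion/message step
        self.head = nn.Linear(hidden, n_classes)       # gesture classifier

    def forward(self, vis, kin):
        v = torch.relu(self.vis_proj(vis))
        k = torch.relu(self.kin_proj(kin))
        # exchange information between the two modality nodes
        fused = torch.relu(self.relation(torch.cat([v, k], dim=-1)))
        return self.head(fused)

model = VisKinFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 76))
print(logits.shape)  # torch.Size([4, 10])
```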
- Automatic Gesture Recognition in Robot-assisted Surgery with Reinforcement Learning and Tree Search [63.07088785532908]
We propose a framework based on reinforcement learning and tree search for joint surgical gesture segmentation and classification.
Our framework consistently outperforms existing methods on the suturing task of the JIGSAWS dataset in terms of accuracy, edit score, and F1 score.
arXiv Detail & Related papers (2020-02-20T13:12:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.