Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical
Procedures
- URL: http://arxiv.org/abs/2106.15309v1
- Date: Wed, 9 Jun 2021 14:35:44 GMT
- Title: Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical
Procedures
- Authors: Ege Özsoy, Evin Pınar Örnek, Ulrich Eck, Federico Tombari,
Nassir Navab
- Abstract summary: We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
- Score: 70.69948035469467
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: From a computer science viewpoint, a surgical domain model needs to be a
conceptual one incorporating both behavior and data. It should therefore model
actors, devices, tools, their complex interactions and data flow. To capture
and model these, we take advantage of the latest computer vision methodologies
for generating 3D scene graphs from camera views. We then introduce the
Multimodal Semantic Scene Graph (MSSG) which aims at providing a unified
symbolic, spatiotemporal and semantic representation of surgical procedures.
This methodology aims at modeling the relationships between different components
of the surgical domain, including medical staff, imaging systems, and surgical
devices, opening the path towards holistic understanding and modeling of
surgical procedures. We then use MSSG to introduce a dynamically generated
graphical user interface tool for surgical procedure analysis which could be
used for many applications including process optimization, OR design and
automatic report generation. We finally demonstrate that the proposed MSSGs
could also be used for synchronizing different complex surgical procedures.
While the system still needs to be integrated into real operating rooms before
being validated, this conference paper aims mainly at providing the community
with the basic principles of this novel concept through a first prototypical
partial realization based on the MVOR dataset.
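The core idea of the abstract, representing actors, devices, and their interactions as a timestamped graph, can be illustrated with a minimal sketch. All node names, categories, and relation labels below are hypothetical and not taken from the paper:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Node:
    """An actor, device, or tool in the operating room."""
    name: str
    category: str  # e.g. "staff", "equipment", "patient"

@dataclass
class SceneGraph:
    """A timestamped set of directed, labeled relations between nodes."""
    timestamp: float
    edges: list = field(default_factory=list)  # (subject, relation, object)

    def add_relation(self, subj: Node, relation: str, obj: Node) -> None:
        self.edges.append((subj, relation, obj))

    def relations_of(self, node: Node) -> list:
        """All edges in which the given node participates."""
        return [e for e in self.edges if e[0] == node or e[2] == node]

# Hypothetical snapshot of one moment in a procedure
surgeon = Node("surgeon", "staff")
c_arm = Node("c_arm", "equipment")
patient = Node("patient", "patient")

g = SceneGraph(timestamp=12.5)
g.add_relation(surgeon, "operates", c_arm)
g.add_relation(c_arm, "images", patient)

print(len(g.relations_of(c_arm)))  # → 2, both relations involve the C-arm
```

A full MSSG would attach multimodal attributes (3D poses, device signals, imaging data) to nodes and track the graph over time; this sketch only shows the symbolic skeleton.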
Related papers
- VISAGE: Video Synthesis using Action Graphs for Surgery [34.21344214645662]
We introduce the novel task of future video generation in laparoscopic surgery.
Our proposed method, VISAGE, leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures.
Results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures.
arXiv Detail & Related papers (2024-10-23T10:28:17Z) - Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework.
Our approach sequences various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z) - Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery.
We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery.
We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z) - Pixel-Wise Recognition for Holistic Surgical Scene Understanding [31.338288460529046]
This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset.
GraSP is a curated benchmark that models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity.
We introduce the Transformers for Actions, Phases, Steps, and Instrument Segmentation (TAPIS) model, a general architecture that combines a global video feature extractor with localized region proposals.
arXiv Detail & Related papers (2024-01-20T09:09:52Z) - SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical
Image Segmentation [0.0]
We propose a simple yet effective UNet-Transformer (seUNet-Trans) model for medical image segmentation.
In our approach, the UNet model is designed as a feature extractor to generate multiple feature maps from the input images.
By leveraging the UNet architecture and the self-attention mechanism, our model not only preserves both local and global context information but is also capable of capturing long-range dependencies between input elements.
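The combination described here, CNN feature maps flattened into tokens and mixed by self-attention, can be sketched in a few lines of NumPy. Shapes, weights, and names are illustrative placeholders, not the paper's architecture:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens: np.ndarray) -> np.ndarray:
    """Single-head self-attention over tokens of shape (n, d)."""
    n, d = tokens.shape
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(d))  # (n, n): every token attends to all
    return attn @ v

# Pretend a UNet encoder produced an 8x8 feature map with 16 channels
feature_map = np.random.default_rng(1).standard_normal((16, 8, 8))
tokens = feature_map.reshape(16, -1).T  # 64 spatial tokens, 16-dim each
out = self_attention(tokens)
print(out.shape)  # → (64, 16)
```

The (n, n) attention matrix is what gives the model its long-range reach: a token at one corner of the image can directly influence a token at the opposite corner, which a purely convolutional UNet can only achieve through many stacked layers.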
arXiv Detail & Related papers (2023-10-16T01:13:38Z) - Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip
Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics.
A cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation.
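A cross-modal contrastive objective of this kind is often formulated in the InfoNCE style: matched image/kinematics pairs are pulled together while mismatched pairs are pushed apart. The sketch below is a generic formulation under that assumption, not the paper's exact loss:

```python
import numpy as np

def info_nce(image_emb: np.ndarray, kin_emb: np.ndarray,
             tau: float = 0.1) -> float:
    """Contrastive loss: row i of each modality is a matched pair."""
    # L2-normalize each embedding so the dot product is a cosine similarity
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    kin = kin_emb / np.linalg.norm(kin_emb, axis=1, keepdims=True)
    logits = img @ kin.T / tau                   # (n, n) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))    # matched pairs on the diagonal

rng = np.random.default_rng(0)
emb = rng.standard_normal((8, 32))
# Perfectly aligned modalities give a lower loss than mismatched ones
aligned = info_nce(emb, emb)
mismatched = info_nce(emb, rng.standard_normal((8, 32)))
print(aligned < mismatched)
```

Minimizing such a loss shapes the image embedding space so that it inherits geometric structure from the kinematics, which is the "robust geometric prior" the summary refers to.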
arXiv Detail & Related papers (2023-09-02T14:52:58Z) - MedSegDiff-V2: Diffusion based Medical Image Segmentation with
Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z) - 4D-OR: Semantic Scene Graphs for OR Domain Modeling [72.1320671045942]
We propose using semantic scene graphs (SSG) to describe and summarize the surgical scene.
The nodes of the scene graphs represent different actors and objects in the room, such as medical staff, patients, and medical equipment.
We create the first publicly available 4D surgical SSG dataset, 4D-OR, containing ten simulated total knee replacement surgeries.
arXiv Detail & Related papers (2022-03-22T17:59:45Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Spatiotemporal-Aware Augmented Reality: Redefining HCI in Image-Guided
Therapy [39.370739217840594]
Augmented reality (AR) has been introduced in the operating rooms in the last decade.
This paper shows how exemplary visualizations are redefined by taking full advantage of head-mounted displays.
The system's awareness of the geometric and physical characteristics of X-ray imaging allows the redefinition of different human-machine interfaces.
arXiv Detail & Related papers (2020-03-04T18:59:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.