4D-OR: Semantic Scene Graphs for OR Domain Modeling
- URL: http://arxiv.org/abs/2203.11937v1
- Date: Tue, 22 Mar 2022 17:59:45 GMT
- Title: 4D-OR: Semantic Scene Graphs for OR Domain Modeling
- Authors: Ege Özsoy, Evin Pınar Örnek, Ulrich Eck, Tobias Czempiel,
Federico Tombari, Nassir Navab
- Abstract summary: We propose using semantic scene graphs (SSG) to describe and summarize the surgical scene.
The nodes of the scene graphs represent different actors and objects in the room, such as medical staff, patients, and medical equipment.
We create the first publicly available 4D surgical SSG dataset, 4D-OR, containing ten simulated total knee replacement surgeries.
- Score: 72.1320671045942
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical procedures are conducted in highly complex operating rooms (OR),
comprising different actors, devices, and interactions. To date, only medically
trained human experts are capable of understanding all the links and
interactions in such a demanding environment. This paper aims to bring the
community one step closer to automated, holistic and semantic understanding and
modeling of the OR domain. Towards this goal, for the first time, we propose using
semantic scene graphs (SSG) to describe and summarize the surgical scene. The
nodes of the scene graphs represent different actors and objects in the room,
such as medical staff, patients, and medical equipment, whereas edges are the
relationships between them. To validate the possibilities of the proposed
representation, we create the first publicly available 4D surgical SSG dataset,
4D-OR, containing ten simulated total knee replacement surgeries recorded with
six RGB-D sensors in a realistic OR simulation center. 4D-OR includes 6734
frames and is richly annotated with SSGs, human and object poses, and clinical
roles. We propose an end-to-end neural network-based SSG generation pipeline
that achieves 0.75 macro F1 and is indeed able to perform semantic
reasoning in the OR. We further demonstrate the representation power of our
scene graphs by using them for the problem of clinical role prediction, where we
achieve 0.85 macro F1. The code and dataset will be made available upon
acceptance.
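As a rough illustration of the representation described in the abstract, the sketch below stores a scene graph as a list of (subject, predicate, object) relations and scores predicted predicates with macro F1. The `Relation` class, the `macro_f1` helper, and all node and predicate labels are hypothetical and chosen for illustration; they are not the 4D-OR label set or the authors' evaluation code. The sketch also assumes that predicted and ground-truth relations are already aligned pair by pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One directed edge of a scene graph: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical single frame; labels are illustrative, not taken from 4D-OR.
ground_truth = [
    Relation("head_surgeon", "drilling", "patient"),
    Relation("patient", "lying_on", "operating_table"),
    Relation("nurse", "close_to", "instrument_table"),
    Relation("assistant", "close_to", "patient"),
]
predicted = [
    Relation("head_surgeon", "drilling", "patient"),
    Relation("patient", "lying_on", "operating_table"),
    Relation("nurse", "close_to", "instrument_table"),
    Relation("assistant", "drilling", "patient"),  # wrong predicate
]

# Nodes are the actors and objects that appear in any relation.
nodes = sorted({r.subject for r in ground_truth} | {r.obj for r in ground_truth})

# Assumption: relations are aligned by index, so predicates compare pairwise.
y_true = [r.predicate for r in ground_truth]
y_pred = [r.predicate for r in predicted]
score = macro_f1(y_true, y_pred)

print(nodes)
print(round(score, 3))
```

Macro averaging weights every predicate class equally, so a rare relation counts as much as a frequent one; this matters when a few relation types dominate the frames.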
Related papers
- Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images [67.66644395272075]
We present the first analysis of state-of-the-art semantic segmentation models when faced with geometric out-of-distribution data.
We propose an augmentation technique called "Organ Transplantation" to enhance generalizability.
Our augmentation technique improves SOA model performance by up to 67 % for RGB data and 90 % for HSI data, achieving performance at the level of in-distribution performance on real OOD test data.
arXiv Detail & Related papers (2024-08-27T19:13:15Z)
- G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis [57.07638884476174]
G-HOP is a denoising diffusion based generative prior for hand-object interactions.
We represent the human hand via a skeletal distance field to obtain a representation aligned with the signed distance field for the object.
We show that this hand-object prior can then serve as generic guidance to facilitate other tasks like reconstruction from interaction clip and human grasp synthesis.
arXiv Detail & Related papers (2024-04-18T17:59:28Z)
- LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms [39.11134330259464]
Holistic modeling of the operating room (OR) is a challenging but essential task.
We introduce memory scene graphs, where the scene graphs of previous time steps act as the temporal representation guiding the current prediction.
We design an end-to-end architecture that intelligently fuses the temporal information of our lightweight memory scene graphs with the visual information from point clouds and images.
arXiv Detail & Related papers (2023-03-23T14:26:16Z)
- Semantic segmentation of surgical hyperspectral images under geometric domain shifts [69.91792194237212]
We present the first analysis of state-of-the-art semantic segmentation networks in the presence of geometric out-of-distribution (OOD) data.
We also address generalizability with a dedicated augmentation technique termed "Organ Transplantation".
Our scheme improves on the SOA DSC by up to 67 % (RGB) and 90 % (HSI) and renders performance on par with in-distribution performance on real OOD test data.
arXiv Detail & Related papers (2023-03-20T09:50:07Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims to provide a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z)
- Learning and Reasoning with the Graph Structure Representation in Robotic Surgery [15.490603884631764]
Learning to infer graph representations can play a vital role in surgical scene understanding in robotic surgery.
We develop an approach to generate the scene graph and predict surgical interactions between instruments and surgical region of interest.
arXiv Detail & Related papers (2020-07-07T11:49:34Z)
- 3D Dynamic Scene Graphs: Actionable Spatial Perception with Places, Objects, and Humans [27.747241700017728]
We present a unified representation for actionable spatial perception: 3D Dynamic Scene Graphs.
3D Dynamic Scene Graphs can have a profound impact on planning and decision-making, human-robot interaction, long-term autonomy, and scene prediction.
arXiv Detail & Related papers (2020-02-15T00:46:32Z)