SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion
- URL: http://arxiv.org/abs/2502.07945v1
- Date: Tue, 11 Feb 2025 20:49:13 GMT
- Title: SurGrID: Controllable Surgical Simulation via Scene Graph to Image Diffusion
- Authors: Yannik Frisch, Ssharvien Kumar Sivakumar, Çağhan Köksal, Elsa Böhm, Felix Wagner, Adrian Gericke, Ghazal Ghazaei, Anirban Mukhopadhyay
- Abstract summary: We introduce SurGrID, a Scene Graph to Image Diffusion Model, allowing for controllable surgical scene synthesis.
Scene Graphs encode the spatial and semantic information of a surgical scene's components, which is then translated into an intermediate representation.
Our proposed method improves the fidelity of generated images and their coherence with the graph input over the state-of-the-art.
- Score: 0.8680185045005854
- Abstract: Surgical simulation offers a promising addition to conventional surgical training. However, available simulation tools lack photorealism and rely on hardcoded behaviour. Denoising Diffusion Models are a promising alternative for high-fidelity image synthesis, but existing state-of-the-art conditioning methods fall short in providing precise control or interactivity over the generated scenes. We introduce SurGrID, a Scene Graph to Image Diffusion Model, allowing for controllable surgical scene synthesis by leveraging Scene Graphs. These graphs encode the spatial and semantic information of a surgical scene's components, which is then translated into an intermediate representation using our novel pre-training step that explicitly captures local and global information. Our proposed method improves the fidelity of generated images and their coherence with the graph input over the state-of-the-art. Further, we demonstrate the simulation's realism and controllability in a user assessment study involving clinical experts. Scene Graphs can be effectively used for precise and interactive conditioning of Denoising Diffusion Models for simulating surgical scenes, enabling high fidelity and interactive control over the generated content.
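The abstract describes conditioning a diffusion model on a scene graph that carries both local (per-node spatial and semantic) and global (relational) information. The minimal PyTorch sketch below illustrates that general idea only, not SurGrID's actual architecture, intermediate representation, or pre-training: a toy graph encoder turns node classes, boxes, and relations into conditioning tokens, and a toy denoiser cross-attends to them. All class names, dimensions, and the message-passing scheme are assumptions made for illustration.

```python
# Illustrative sketch (not the authors' code): conditioning a denoising
# network on a scene graph.  Node classes, boxes, and relations are toy
# stand-ins for a surgical scene's components.
import torch
import torch.nn as nn

class SceneGraphEncoder(nn.Module):
    """Embeds nodes (semantic class + box = local info) and mixes in
    relation context (global structure) with one round of message passing."""
    def __init__(self, n_classes, n_relations, dim=128):
        super().__init__()
        self.cls_emb = nn.Embedding(n_classes, dim)
        self.rel_emb = nn.Embedding(n_relations, dim)
        self.box_proj = nn.Linear(4, dim)            # (x, y, w, h) -> dim
        self.msg = nn.Linear(2 * dim, dim)

    def forward(self, classes, boxes, edges, relations):
        h = self.cls_emb(classes) + self.box_proj(boxes)        # (N, dim)
        out = h.clone()
        for (src, dst), rel in zip(edges, relations):
            m = self.msg(torch.cat([h[src], self.rel_emb(rel)], dim=-1))
            out = out.index_add(0, torch.tensor([dst]), m.unsqueeze(0))
        return out                                               # graph tokens (N, dim)

class GraphConditionedDenoiser(nn.Module):
    """Toy denoiser: image features cross-attend to the graph tokens."""
    def __init__(self, dim=128):
        super().__init__()
        self.to_feat = nn.Conv2d(3, dim, 3, padding=1)
        self.t_emb = nn.Embedding(1000, dim)                     # timestep embedding
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.to_img = nn.Conv2d(dim, 3, 3, padding=1)

    def forward(self, noisy_img, t, graph_tokens):
        b, _, hgt, wdt = noisy_img.shape
        f = self.to_feat(noisy_img).flatten(2).transpose(1, 2)   # (B, H*W, dim)
        f = f + self.t_emb(t).unsqueeze(1)                       # add timestep info
        ctx = graph_tokens.unsqueeze(0).expand(b, -1, -1)
        f, _ = self.attn(f, ctx, ctx)                            # condition on the graph
        f = f.transpose(1, 2).reshape(b, -1, hgt, wdt)
        return self.to_img(f)                                    # predicted noise

# Toy usage: three nodes (e.g. an instrument and two anatomy classes), one relation.
enc = SceneGraphEncoder(n_classes=10, n_relations=5)
denoiser = GraphConditionedDenoiser()
tokens = enc(torch.tensor([0, 1, 2]), torch.rand(3, 4),
             edges=[(0, 1)], relations=[torch.tensor(3)])
noise_pred = denoiser(torch.randn(1, 3, 64, 64), t=torch.tensor([10]), graph_tokens=tokens)
print(noise_pred.shape)  # torch.Size([1, 3, 64, 64])
```

In a full diffusion model these conditioning tokens would be injected into every denoising step; the sketch only shows a single forward pass to make the data flow from graph to image explicit.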
Related papers
- SimuScope: Realistic Endoscopic Synthetic Dataset Generation through Surgical Simulation and Diffusion Models [1.28795255913358]
We introduce a fully-fledged surgical simulator that automatically produces all necessary annotations for modern CAS systems.
It offers a more complex and realistic simulation of surgical interactions, including the dynamics between surgical instruments and deformable anatomical environments.
We propose a lightweight and flexible image-to-image translation method based on Stable Diffusion and Low-Rank Adaptation (see the illustrative sketch at the end of this list).
arXiv Detail & Related papers (2024-12-03T09:49:43Z) - EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion [77.0556470600979]
We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs.
Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations.
arXiv Detail & Related papers (2024-05-02T00:04:02Z) - Interactive Generation of Laparoscopic Videos with Diffusion Models [1.5488613349551188]
We show how to generate realistic laparoscopic images and videos by specifying a surgical action through text.
We demonstrate the performance of our approach using the publicly available Cholec dataset family.
We achieve an FID of 38.097 and an F1-score of 0.71.
arXiv Detail & Related papers (2024-04-23T12:36:07Z) - MeshBrush: Painting the Anatomical Mesh with Neural Stylization for Endoscopy [0.8437187555622164]
Style transfer is a promising approach to close the sim-to-real gap in medical endoscopy.
Rendering synthetic endoscopic videos by traversing pre-operative scans can generate structurally accurate simulations.
CycleGAN can imitate realistic endoscopic images from these simulations, but it is unsuitable for video-to-video synthesis.
We propose MeshBrush, a neural mesh stylization method to synthesize temporally consistent videos.
arXiv Detail & Related papers (2024-04-03T18:40:48Z) - Controllable Mind Visual Diffusion Model [58.83896307930354]
Brain signal visualization has emerged as an active research area, serving as a critical interface between the human visual system and computer vision models.
We propose a novel approach, referred to as the Controllable Mind Visual Diffusion Model (CMVDM).
CMVDM extracts semantic and silhouette information from fMRI data using attribute alignment and assistant networks.
We then leverage a control model to fully exploit the extracted information for image synthesis, resulting in generated images that closely resemble the visual stimuli in terms of semantics and silhouette.
arXiv Detail & Related papers (2023-05-17T11:36:40Z) - Person Image Synthesis via Denoising Diffusion Model [116.34633988927429]
We show how denoising diffusion models can be applied for high-fidelity person image synthesis.
Our results on two large-scale benchmarks and a user study demonstrate the photorealism of our proposed approach under challenging scenarios.
arXiv Detail & Related papers (2022-11-22T18:59:50Z) - Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical
Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z) - Long-Term Temporally Consistent Unpaired Video Translation from
Simulated Surgical 3D Data [0.059110875077162096]
We propose a novel approach which combines unpaired image translation with neural rendering to transfer simulated to photorealistic surgical abdominal scenes.
By introducing global learnable textures and a lighting-invariant view-consistency loss, our method produces consistent translations of arbitrary views.
By extending existing image-based methods to view-consistent videos, we aim to broaden the applicability of simulated training and evaluation environments for surgical applications.
arXiv Detail & Related papers (2021-03-31T16:31:26Z) - Learning Ultrasound Rendering from Cross-Sectional Model Slices for
Simulated Training [13.640630434743837]
Computational simulations can facilitate the training of ultrasound imaging skills in virtual reality.
We propose herein to bypass any rendering and simulation process at interactive time.
We use a generative adversarial framework with a dedicated generator architecture and input feeding scheme.
arXiv Detail & Related papers (2021-01-20T21:58:19Z) - Towards Unsupervised Learning for Instrument Segmentation in Robotic
Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation approach where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation.
Our approach makes it possible to train image segmentation models without acquiring expensive annotations.
We test our proposed method on the EndoVis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv Detail & Related papers (2020-07-09T01:39:39Z) - Intrinsic Autoencoders for Joint Neural Rendering and Intrinsic Image
Decomposition [67.9464567157846]
We propose an autoencoder for joint generation of realistic images from synthetic 3D models while simultaneously decomposing real images into their intrinsic shape and appearance properties.
Our experiments confirm that a joint treatment of rendering and decomposition is indeed beneficial and that our approach outperforms state-of-the-art image-to-image translation baselines both qualitatively and quantitatively.
arXiv Detail & Related papers (2020-06-29T12:53:58Z)
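As referenced in the SimuScope entry above, several of these related works build on Stable Diffusion for sim-to-real translation of surgical images. The sketch below shows generic diffusers image-to-image usage with LoRA weights; the base model ID, LoRA checkpoint path, prompt, and strength value are illustrative assumptions, not any of these papers' actual configurations.

```python
# Hedged sketch: translating a simulated surgical frame toward a realistic
# one with a Stable Diffusion img2img pipeline plus LoRA weights.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",                # base model (assumption)
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("path/to/surgical_lora")      # hypothetical LoRA fine-tuned on surgical images

sim_frame = Image.open("simulated_frame.png").convert("RGB").resize((512, 512))
result = pipe(
    prompt="laparoscopic surgery, photorealistic",   # illustrative prompt
    image=sim_frame,
    strength=0.5,        # how strongly the simulated frame is re-noised and re-drawn
    guidance_scale=7.5,
).images[0]
result.save("translated_frame.png")
```

Lower strength values preserve more of the simulated geometry, while higher values give the diffusion model more freedom to restyle the frame.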
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.