Interactive Generation of Laparoscopic Videos with Diffusion Models
- URL: http://arxiv.org/abs/2406.06537v1
- Date: Tue, 23 Apr 2024 12:36:07 GMT
- Title: Interactive Generation of Laparoscopic Videos with Diffusion Models
- Authors: Ivan Iliash, Simeon Allmendinger, Felix Meissen, Niklas Kühl, Daniel Rückert
- Abstract summary: We show how to generate realistic laparoscopic images and videos by specifying a surgical action through text.
We demonstrate the performance of our approach using the publicly available Cholec dataset family.
We achieve an FID of 38.097 and an F1-score of 0.71.
- Score: 1.5488613349551188
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generative AI in general, and synthetic visual data generation in particular, hold much promise for benefiting surgical training by providing photorealism to simulation environments. Current training methods primarily rely on reading materials and observing live surgeries, which can be time-consuming and impractical. In this work, we take a significant step towards improving the training process. Specifically, we use diffusion models in combination with a zero-shot video diffusion method to interactively generate realistic laparoscopic images and videos by specifying a surgical action through text and guiding the generation with tool positions through segmentation masks. We demonstrate the performance of our approach using the publicly available Cholec dataset family and evaluate the fidelity and factual correctness of our generated images using a surgical action recognition model as well as the pixel-wise F1-score for the spatial control of tool generation. We achieve an FID of 38.097 and an F1-score of 0.71.
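The pixel-wise F1-score mentioned in the abstract can be illustrated with a minimal sketch. It assumes that both the conditioning tool mask and the tool mask recovered from the generated image are available as binary arrays of the same shape. How the recovered mask is obtained is not described here, and the function name `pixelwise_f1` and the toy masks are hypothetical, not taken from the paper.

```python
import numpy as np


def pixelwise_f1(pred_mask, true_mask):
    """Pixel-wise F1 between two binary masks of equal shape (illustrative helper)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    if pred.shape != true.shape:
        raise ValueError("masks must have the same shape")
    tp = np.logical_and(pred, true).sum()    # tool pixels present in both masks
    fp = np.logical_and(pred, ~true).sum()   # tool pixels only in the prediction
    fn = np.logical_and(~pred, true).sum()   # tool pixels only in the target
    denom = 2 * tp + fp + fn
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2 * tp / denom)


# Toy example: a 2x2 tool region, generated one pixel to the right of its target.
target = np.zeros((4, 4), dtype=bool)
target[1:3, 1:3] = True
generated = np.zeros((4, 4), dtype=bool)
generated[1:3, 2:4] = True
print(pixelwise_f1(generated, target))  # 2 overlapping pixels out of 4 + 4
```

In this toy case half of each region overlaps the other, so precision and recall are both 0.5. A score near 1 would indicate that the generated tool sits where the mask requested it.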
Related papers
- Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians [19.590481146949685]
We introduce 3D Gaussians as a learnable representation of surgical scenes, learned from stereo endoscopic video.
We apply the Material Point Method, which is integrated with physical properties, to the 3D Gaussians to achieve realistic scene deformations.
Results show that it can reconstruct and simulate surgical scenes from endoscopic videos efficiently, taking only a few minutes to reconstruct the surgical scene.
arXiv Detail & Related papers (2024-05-02T02:34:19Z)
- Efficient Surgical Tool Recognition via HMM-Stabilized Deep Learning [25.146476653453227]
We propose an HMM-stabilized deep learning method for tool presence detection.
A range of experiments confirm that the proposed approaches achieve better performance with lower training and running costs.
These results suggest that popular deep learning approaches with over-complicated model structures may suffer from inefficient utilization of data.
arXiv Detail & Related papers (2024-04-07T15:27:35Z)
- MeshBrush: Painting the Anatomical Mesh with Neural Stylization for Endoscopy [0.8437187555622164]
Style transfer is a promising approach to close the sim-to-real gap in medical endoscopy.
Rendering realistic endoscopic videos by traversing pre-operative scans can generate realistic simulations as well as ground truth camera poses and depth maps.
We propose MeshBrush, a neural mesh stylization method to synthesize temporally consistent videos with differentiable rendering.
arXiv Detail & Related papers (2024-04-03T18:40:48Z)
- Endora: Video Generation Models as Endoscopy Simulators [53.72175969751398]
This paper introduces Endora, an innovative approach to generate medical videos that simulate clinical endoscopy scenes.
We also pioneer the first public benchmark for endoscopy simulation with video generation models.
Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research.
arXiv Detail & Related papers (2024-03-17T00:51:59Z)
- Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation [3.2039076408339353]
We present an intuitive approach for generating synthetic laparoscopic images from short text prompts using diffusion-based generative models.
Results on fidelity and diversity demonstrate that diffusion-based models can acquire knowledge about the style and semantics in the field of image-guided surgery.
arXiv Detail & Related papers (2023-12-05T16:20:22Z)
- Domain Adaptive Sim-to-Real Segmentation of Oropharyngeal Organs Towards Robot-assisted Intubation [15.795665057836636]
This work introduces a virtual dataset generated by the Open Framework Architecture framework to overcome the limited availability of actual endoscopic images.
We also propose a domain adaptive Sim-to-Real method for oropharyngeal organ image segmentation, which employs an image blending strategy.
Experimental results demonstrate the superior performance of the proposed approach with domain adaptive models.
arXiv Detail & Related papers (2023-05-19T14:08:15Z)
- Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation.
We extensively evaluate our method on the public robotic surgery dataset EndoVis18 and the public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims to provide a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation.
Our approach allows image segmentation models to be trained without the need to acquire expensive annotations.
We test our proposed method on the EndoVis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv Detail & Related papers (2020-07-09T01:39:39Z)