Related papers: Interactive Generation of Laparoscopic Videos with Diffusion Models

Interactive Generation of Laparoscopic Videos with Diffusion Models

URL: http://arxiv.org/abs/2406.06537v1
Date: Tue, 23 Apr 2024 12:36:07 GMT
Title: Interactive Generation of Laparoscopic Videos with Diffusion Models
Authors: Ivan Iliash, Simeon Allmendinger, Felix Meissen, Niklas Kühl, Daniel Rückert,
Abstract summary: We show how to generate realistic laparoscopic images and videos by specifying a surgical action through text. We demonstrate the performance of our approach using the publicly available Cholec dataset family. We achieve an FID of 38.097 and an F1-score of 0.71.
Score: 1.5488613349551188
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Generative AI, in general, and synthetic visual data generation, in specific, hold much promise for benefiting surgical training by providing photorealism to simulation environments. Current training methods primarily rely on reading materials and observing live surgeries, which can be time-consuming and impractical. In this work, we take a significant step towards improving the training process. Specifically, we use diffusion models in combination with a zero-shot video diffusion method to interactively generate realistic laparoscopic images and videos by specifying a surgical action through text and guiding the generation with tool positions through segmentation masks. We demonstrate the performance of our approach using the publicly available Cholec dataset family and evaluate the fidelity and factual correctness of our generated images using a surgical action recognition model as well as the pixel-wise F1-score for the spatial control of tool generation. We achieve an FID of 38.097 and an F1-score of 0.71.

Related papers

EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance [79.66329903007869]
We present EchoWorld, a motion-aware world modeling framework for probe guidance. It encodes anatomical knowledge and motion-induced visual dynamics. It is trained on more than one million ultrasound images from over 200 routine scans.
arXiv Detail & Related papers (2025-04-17T16:19:05Z)
VISAGE: Video Synthesis using Action Graphs for Surgery [34.21344214645662]
We introduce the novel task of future video generation in laparoscopic surgery. Our proposed method, VISAGE, leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures. Results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures.
arXiv Detail & Related papers (2024-10-23T10:28:17Z)
SurGen: Text-Guided Diffusion Model for Surgical Video Generation [0.6551407780976953]
SurGen is a text-guided diffusion model tailored for surgical video synthesis. We validate the visual and temporal quality of the outputs using standard image and video generation metrics. Our results demonstrate the potential of diffusion models to serve as valuable educational tools for surgical trainees.
arXiv Detail & Related papers (2024-08-26T05:38:27Z)
SurgicaL-CD: Generating Surgical Images via Unpaired Image Translation with Latent Consistency Diffusion Models [1.6189876649941652]
We introduce emphSurgicaL-CD, a consistency-distilled diffusion method to generate realistic surgical images. Our results demonstrate that our method outperforms GANs and diffusion-based approaches.
arXiv Detail & Related papers (2024-08-19T09:19:25Z)
Efficient Surgical Tool Recognition via HMM-Stabilized Deep Learning [25.146476653453227]
We propose an HMM-stabilized deep learning method for tool presence detection. A range of experiments confirm that the proposed approaches achieve better performance with lower training and running costs. These results suggest that popular deep learning approaches with over-complicated model structures may suffer from inefficient utilization of data.
arXiv Detail & Related papers (2024-04-07T15:27:35Z)
Endora: Video Generation Models as Endoscopy Simulators [53.72175969751398]
This paper introduces model, an innovative approach to generate medical videos that simulate clinical endoscopy scenes. We also pioneer the first public benchmark for endoscopy simulation with video generation models. Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research.
arXiv Detail & Related papers (2024-03-17T00:51:59Z)
Navigating the Synthetic Realm: Harnessing Diffusion-based Models for Laparoscopic Text-to-Image Generation [3.2039076408339353]
We present an intuitive approach for generating synthetic laparoscopic images from short text prompts using diffusion-based generative models. Results on fidelity and diversity demonstrate that diffusion-based models can acquire knowledge about the style and semantics in the field of image-guided surgery.
arXiv Detail & Related papers (2023-12-05T16:20:22Z)
Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation. We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z)
Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views. We then introduce the Multimodal Semantic Graph Scene (MSSG) which aims at providing unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery. It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm. It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z)
Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information. The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation where the goal is to learn the mapping between an input endoscopic image and a corresponding annotation. Our approach allows to train image segmentation models without the need to acquire expensive annotations. We test our proposed method on Endovis 2017 challenge dataset and show that it is competitive with supervised segmentation methods.
arXiv Detail & Related papers (2020-07-09T01:39:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.