Endora: Video Generation Models as Endoscopy Simulators
- URL: http://arxiv.org/abs/2403.11050v1
- Date: Sun, 17 Mar 2024 00:51:59 GMT
- Title: Endora: Video Generation Models as Endoscopy Simulators
- Authors: Chenxin Li, Hengyu Liu, Yifan Liu, Brandon Y. Feng, Wuyang Li, Xinyu Liu, Zhen Chen, Jing Shao, Yixuan Yuan
- Abstract summary: This paper introduces Endora, an innovative approach to generate medical videos that simulate clinical endoscopy scenes.
We also pioneer the first public benchmark for endoscopy simulation with video generation models.
Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research.
- Score: 53.72175969751398
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generative models hold promise for revolutionizing medical education, robot-assisted surgery, and data augmentation for machine learning. Despite progress in generating 2D medical images, the complex domain of clinical video generation has largely remained untapped. This paper introduces Endora, an innovative approach to generate medical videos that simulate clinical endoscopy scenes. We present a novel generative model design that integrates a meticulously crafted spatial-temporal video transformer with advanced 2D vision foundation model priors, explicitly modeling spatial-temporal dynamics during video generation. We also pioneer the first public benchmark for endoscopy simulation with video generation models, adapting existing state-of-the-art methods for this endeavor. Endora demonstrates exceptional visual quality in generating endoscopy videos, surpassing state-of-the-art methods in extensive testing. Moreover, we explore how this endoscopy simulator can empower downstream video analysis tasks and even generate 3D medical scenes with multi-view consistency. In a nutshell, Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research, setting a substantial stage for further advances in medical content generation. For more details, please visit our project page: https://endora-medvidgen.github.io/.
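The abstract describes a spatial-temporal video transformer guided by 2D vision foundation model priors. As a hedged illustration of the factorized spatial-temporal attention pattern that such video transformers typically build on (this is not Endora's released implementation; every module name and shape below is an assumption), a minimal PyTorch sketch:

```python
# Minimal sketch of factorized spatial-temporal attention, the general pattern
# behind video transformers like the one described above. NOT Endora's code;
# all names and shapes are assumptions for illustration only.
import torch
import torch.nn as nn


class SpatioTemporalBlock(nn.Module):
    def __init__(self, dim: int = 384, heads: int = 6):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, patches, dim) patch tokens of a video clip
        b, t, p, d = x.shape
        # Spatial attention: patches attend to each other within every frame.
        xs = self.norm1(x).reshape(b * t, p, d)
        xs, _ = self.spatial_attn(xs, xs, xs)
        x = x + xs.reshape(b, t, p, d)
        # Temporal attention: each patch position attends across the frames.
        xt = self.norm2(x).permute(0, 2, 1, 3).reshape(b * p, t, d)
        xt, _ = self.temporal_attn(xt, xt, xt)
        x = x + xt.reshape(b, p, t, d).permute(0, 2, 1, 3)
        # Position-wise feed-forward network.
        return x + self.mlp(self.norm3(x))


# Example: 2 clips, 16 frames, 196 patch tokens of width 384.
out = SpatioTemporalBlock()(torch.randn(2, 16, 196, 384))  # same shape as input
```

Factorizing attention into a per-frame spatial pass and a per-patch temporal pass models both kinds of dynamics while avoiding joint attention over all frame-patch tokens at once, which is the usual motivation for this layout in video transformers.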
Related papers
- SurGen: Text-Guided Diffusion Model for Surgical Video Generation [0.6551407780976953]
SurGen is a text-guided diffusion model tailored for surgical video synthesis.
We validate the visual and temporal quality of the outputs using standard image and video generation metrics.
Our results demonstrate the potential of diffusion models to serve as valuable educational tools for surgical trainees.
arXiv Detail & Related papers (2024-08-26T05:38:27Z)
- Bora: Biomedical Generalist Video Generation Model [20.572771714879856]
This paper introduces Bora, the first model designed for text-guided biomedical video generation.
It is fine-tuned through model alignment and instruction tuning using a newly established medical video corpus.
Bora is capable of generating high-quality video data across four distinct biomedical domains.
arXiv Detail & Related papers (2024-07-12T03:00:25Z)
- Interactive Generation of Laparoscopic Videos with Diffusion Models [1.5488613349551188]
We show how to generate realistic laparoscopic images and videos by specifying a surgical action through text.
We demonstrate the performance of our approach using the publicly available Cholec dataset family.
We achieve an FID of 38.097 and an F1-score of 0.71 (a sketch of the FID formula appears after this list).
arXiv Detail & Related papers (2024-04-23T12:36:07Z)
- MeshBrush: Painting the Anatomical Mesh with Neural Stylization for Endoscopy [0.8437187555622164]
Style transfer is a promising approach to close the sim-to-real gap in medical endoscopy.
Rendering synthetic endoscopic videos by traversing pre-operative scans can generate structurally accurate simulations.
CycleGAN can imitate realistic endoscopic images from these simulations, but it is unsuitable for video-to-video synthesis.
We propose MeshBrush, a neural mesh stylization method to synthesize temporally consistent videos.
arXiv Detail & Related papers (2024-04-03T18:40:48Z)
- Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery.
We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery.
We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z)
- MeVGAN: GAN-based Plugin Model for Video Generation with Applications in Colonoscopy [12.515404169717451]
We propose Memory Efficient Video GAN (MeVGAN), a Generative Adversarial Network (GAN) plugin model for video generation.
We use a pre-trained 2D-image GAN and construct trajectories in its noise space, so that a trajectory forwarded through the GAN model yields a realistic video (a sketch of this latent-trajectory idea appears after this list).
We show that MeVGAN can produce good quality synthetic colonoscopy videos, which can be potentially used in virtual simulators.
arXiv Detail & Related papers (2023-11-07T10:58:16Z)
- BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys [99.7082441544384]
We present BiomedJourney, a novel method for counterfactual biomedical image generation by instruction-learning.
We use GPT-4 to process the corresponding imaging reports and generate a natural language description of disease progression.
The resulting triples are then used to train a latent diffusion model for counterfactual biomedical image generation.
arXiv Detail & Related papers (2023-10-16T18:59:31Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- MedSegDiff-V2: Diffusion based Medical Image Segmentation with Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
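Two of the related-paper summaries above mention mechanisms that a short, hedged sketch can make more concrete. First, the laparoscopic video generation entry reports an FID of 38.097; FID compares Gaussian fits of Inception features extracted from real and generated frames. A minimal sketch of that standard formula, assuming the activation matrices have already been extracted (this is not the evaluation code of any paper above):

```python
# Minimal sketch of the Frechet Inception Distance (FID) formula, assuming
# Inception activations for real and generated frames were already extracted.
import numpy as np
from scipy import linalg


def fid(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """real_feats, fake_feats: (num_samples, feature_dim) activation matrices."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)  # matrix square root of the covariance product
    if np.iscomplexobj(covmean):           # numerical noise can leave tiny imaginary parts
        covmean = covmean.real
    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))
```

Second, the MeVGAN entry builds videos by forwarding a trajectory through the noise space of a pre-trained 2D image GAN. In the sketch below, a crude linear interpolation stands in for the paper's trajectory construction, and `generator` is a hypothetical placeholder for any frozen pre-trained image generator:

```python
# Sketch of decoding a video from a latent-space trajectory of a frozen 2D GAN.
# A linear path stands in for MeVGAN's trajectory module; `generator` is a
# placeholder assumption, not the paper's API.
import torch


@torch.no_grad()
def latent_trajectory_video(generator, z_start, z_end, num_frames: int = 16):
    frames = []
    for alpha in torch.linspace(0.0, 1.0, num_frames):
        z = (1.0 - alpha) * z_start + alpha * z_end  # point on the latent path
        frames.append(generator(z))                  # decode one frame per latent code
    return torch.stack(frames, dim=1)                # (batch, num_frames, C, H, W)
```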
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.