Artificial Intelligence for Biomedical Video Generation
- URL: http://arxiv.org/abs/2411.07619v1
- Date: Tue, 12 Nov 2024 08:05:58 GMT
- Title: Artificial Intelligence for Biomedical Video Generation
- Authors: Linyuan Li, Jianing Qiu, Anujit Saha, Lin Li, Poyuan Li, Mengxian He, Ziyu Guo, Wu Yuan
- Abstract summary: The introduction of Sora-like models represents a pivotal breakthrough in video generation technologies.
Video generation technology has shown immense potential in applications such as medical concept explanation, disease simulation, and biomedical data augmentation.
- Score: 8.21248952391087
- Abstract: As a prominent subfield of Artificial Intelligence Generated Content (AIGC), video generation has achieved notable advancements in recent years. The introduction of Sora-like models represents a pivotal breakthrough in video generation technologies, significantly enhancing the quality of synthesized videos. In the realm of biomedicine in particular, video generation technology has shown immense potential in applications such as medical concept explanation, disease simulation, and biomedical data augmentation. In this article, we thoroughly examine the latest developments in video generation models and explore their applications, challenges, and future opportunities in the biomedical sector. We have conducted an extensive review and compiled a comprehensive list of datasets from various sources to facilitate the development and evaluation of video generative models in biomedicine. Given the rapid progress in this field, we have also created a GitHub repository to regularly track advances in biomedical video generation at: https://github.com/Lee728243228/Biomedical-Video-Generation
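One application the abstract highlights is biomedical data augmentation. As a minimal illustrative sketch (not a method from the surveyed papers; the function name and jitter parameters below are assumptions), a video clip can be augmented with temporally consistent transforms so that frames within one clip stay coherent:

```python
import numpy as np

def augment_video(frames: np.ndarray, seed: int = 0) -> np.ndarray:
    """Apply simple spatial and photometric augmentations to a video clip.

    `frames` is a (T, H, W, C) uint8 array. All random choices are made
    once per clip, so every frame receives the same flip and brightness
    change, preserving temporal consistency.
    """
    rng = np.random.default_rng(seed)
    out = frames.astype(np.float32)
    # Horizontal flip, applied to all frames or to none.
    if rng.random() < 0.5:
        out = out[:, :, ::-1, :]
    # Brightness jitter, identical across frames.
    out = out * rng.uniform(0.9, 1.1)
    # Additive Gaussian noise, drawn independently per pixel.
    out = out + rng.normal(0.0, 2.0, size=out.shape)
    return np.clip(out, 0, 255).astype(np.uint8)

clip = np.zeros((8, 64, 64, 3), dtype=np.uint8)  # dummy 8-frame clip
aug = augment_video(clip)
```

Per-clip (rather than per-frame) randomness is the key design choice here: flipping or brightening only some frames of a clip would introduce temporal artifacts that a downstream video model could learn as spurious signal.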
Related papers
- SurGen: Text-Guided Diffusion Model for Surgical Video Generation [0.6551407780976953]
SurGen is a text-guided diffusion model tailored for surgical video synthesis.
We validate the visual and temporal quality of the outputs using standard image and video generation metrics.
Our results demonstrate the potential of diffusion models to serve as valuable educational tools for surgical trainees.
arXiv Detail & Related papers (2024-08-26T05:38:27Z)
- Bora: Biomedical Generalist Video Generation Model [20.572771714879856]
This paper introduces Bora, the first model designed for text-guided biomedical video generation.
It is fine-tuned through model alignment and instruction tuning using a newly established medical video corpus.
Bora is capable of generating high-quality video data across four distinct biomedical domains.
arXiv Detail & Related papers (2024-07-12T03:00:25Z)
- Synthetic data: How could it be used for infectious disease research? [0.16752458252726457]
Concerns have been raised about potential negative factors associated with the possibilities of artificial dataset generation.
These include the potential misuse of generative artificial intelligence in fields such as cybercrime.
Synthetic data offers significant benefits, particularly in data privacy and research, in balancing datasets, and in reducing bias in machine learning models.
arXiv Detail & Related papers (2024-07-03T17:13:04Z) - VideoPhy: Evaluating Physical Commonsense for Video Generation [93.28748850301949]
We present VideoPhy, a benchmark designed to assess whether the generated videos follow physical commonsense for real-world activities.
We then generate videos conditioned on captions from diverse state-of-the-art text-to-video generative models.
Our human evaluation reveals that the existing models severely lack the ability to generate videos adhering to the given text prompts.
arXiv Detail & Related papers (2024-06-05T17:53:55Z) - Endora: Video Generation Models as Endoscopy Simulators [53.72175969751398]
This paper introduces Endora, an innovative approach to generating medical videos that simulate clinical endoscopy scenes.
We also pioneer the first public benchmark for endoscopy simulation with video generation models.
Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research.
arXiv Detail & Related papers (2024-03-17T00:51:59Z) - Sora as an AGI World Model? A Complete Survey on Text-to-Video Generation [30.245348014602577]
We discuss the evolution of video generation from text, starting with animating MNIST digits and culminating in simulating the physical world with Sora.
Our review of the shortcomings of Sora-generated videos points to the need for more in-depth studies of the various enabling aspects of video generation.
We conclude that text-to-video generation may still be in its infancy, requiring contributions from the cross-disciplinary research community.
arXiv Detail & Related papers (2024-03-08T07:58:13Z) - Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models [59.54172719450617]
Sora is a text-to-video generative AI model, released by OpenAI in February 2024.
This paper presents a review of the model's background, related technologies, applications, remaining challenges, and future directions.
arXiv Detail & Related papers (2024-02-27T03:30:58Z) - Video as the New Language for Real-World Decision Making [100.68643056416394]
Video data captures important information about the physical world that is difficult to express in language.
Video can serve as a unified interface that can absorb internet knowledge and represent diverse tasks.
We identify major impact opportunities in domains such as robotics, self-driving, and science.
arXiv Detail & Related papers (2024-02-27T02:05:29Z) - State of the Art on Diffusion Models for Visual Computing [191.6168813012954]
This report introduces the basic mathematical concepts of diffusion models, as well as the implementation details and design choices of the popular Stable Diffusion model.
We also give a comprehensive overview of the rapidly growing literature on diffusion-based generation and editing.
We discuss available datasets, metrics, open challenges, and social implications.
arXiv Detail & Related papers (2023-10-11T05:32:29Z) - Video Generation from Text Employing Latent Path Construction for
Temporal Modeling [70.06508219998778]
Video generation is one of the most challenging tasks in machine learning and computer vision.
In this paper, we tackle text-to-video generation, a conditional form of video generation.
We believe that video generation from natural language sentences will have an important impact on Artificial Intelligence.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented (including all listings) and is not responsible for any consequences of its use.