MedPix 2.0: A Comprehensive Multimodal Biomedical Dataset for Advanced AI Applications
- URL: http://arxiv.org/abs/2407.02994v1
- Date: Wed, 3 Jul 2024 10:49:21 GMT
- Title: MedPix 2.0: A Comprehensive Multimodal Biomedical Dataset for Advanced AI Applications
- Authors: Irene Siragusa, Salvatore Contino, Massimo La Ciura, Rosario Alicata, Roberto Pirrone
- Abstract summary: This paper illustrates the entire workflow for building the MedPix 2.0 dataset.
Along with the dataset, we developed a GUI for efficiently navigating the MongoDB instance.
We also propose a CLIP-based model trained on MedPix 2.0 for scan classification tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing interest in developing Artificial Intelligence applications in the medical domain suffers from the lack of high-quality datasets, mainly due to privacy-related issues. Moreover, the recent rise of Multimodal Large Language Models (MLLMs) creates a need for multimodal medical datasets, where clinical reports and findings are attached to the corresponding CT or MR scans. This paper illustrates the entire workflow for building the MedPix 2.0 dataset. Starting from the well-known multimodal dataset MedPix®, mainly used by physicians, nurses, and healthcare students for Continuing Medical Education purposes, a semi-automatic pipeline was developed to extract visual and textual data, followed by a manual curation procedure in which noisy samples were removed, thus creating a MongoDB database. Along with the dataset, we developed a GUI for efficiently navigating the MongoDB instance and obtaining the raw data, which can easily be used for training and/or fine-tuning MLLMs. To demonstrate this point, we also propose a CLIP-based model trained on MedPix 2.0 for scan classification tasks.
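As a concrete illustration of how the MongoDB database and a CLIP-based scan classifier fit together, here is a minimal sketch. The database, collection, and field names ("medpix", "cases", "modality", "image_path") are illustrative assumptions, not the dataset's documented schema, and the generic CLIP checkpoint stands in for the model actually trained on MedPix 2.0.

```python
# Minimal sketch: read a sample from a MedPix 2.0-style MongoDB instance and
# classify the scan with a CLIP model. Schema names below are hypothetical.
from PIL import Image
from pymongo import MongoClient
import torch
from transformers import CLIPModel, CLIPProcessor

client = MongoClient("mongodb://localhost:27017")
cases = client["medpix"]["cases"]            # hypothetical database/collection

doc = cases.find_one({"modality": "MR"})     # hypothetical field name
image = Image.open(doc["image_path"])        # hypothetical field name

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
labels = ["an MR scan", "a CT scan"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image   # shape: (1, num_labels)
print(labels[logits.argmax(dim=-1).item()])
```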
Related papers
- LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show LoGra-Med matches LLaVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
arXiv Detail & Related papers (2024-10-03T15:52:03Z)
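The triplet correlations mentioned in the LoGra-Med entry above suggest a contrastive objective over three embedding spaces. Below is a minimal sketch under that assumption; the paper's actual multi-graph alignment algorithm is more involved.

```python
# Minimal sketch of a triplet-style alignment objective over three embedding
# spaces (image, conversation-based description, extended caption). It only
# illustrates enforcing cross-modal triplet correlations, not LoGra-Med itself.
import torch
import torch.nn.functional as F

def triplet_alignment_loss(img, conv, cap, margin=0.2):
    """img, conv, cap: (batch, dim) embeddings of matched samples."""
    img, conv, cap = (F.normalize(x, dim=-1) for x in (img, conv, cap))
    eye = torch.eye(img.size(0), dtype=torch.bool, device=img.device)
    loss = 0.0
    for a, b in ((img, conv), (img, cap), (conv, cap)):
        sim = a @ b.T                     # pairwise cosine similarities
        pos = sim.diagonal()              # matched pairs
        neg = sim.masked_fill(eye, float("-inf")).max(-1).values  # hardest mismatch
        loss = loss + F.relu(neg - pos + margin).mean()
    return loss
```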
- MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [53.01393667775077]
This paper introduces MedTrinity-25M, a comprehensive, large-scale multimodal dataset for medicine.
It covers over 25 million images across 10 modalities, with multigranular annotations for more than 65 diseases.
Unlike existing approaches, which are limited by the availability of image-text pairs, we have developed the first automated pipeline.
arXiv Detail & Related papers (2024-08-06T02:09:35Z)
- HoneyBee: A Scalable Modular Framework for Creating Multimodal Oncology Datasets with Foundational Embedding Models [16.567468717846676]
HoneyBee is a scalable modular framework for building multimodal oncology datasets.
It generates embeddings that capture the essential features and relationships within the raw medical data.
HoneyBee is an ongoing open-source effort, and the code, datasets, and models are available at the project repository.
arXiv Detail & Related papers (2024-05-13T04:35:14Z)
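The embedding-generation step described in the HoneyBee entry above can be sketched as passing raw images through a frozen foundation model; the checkpoint ("facebook/dinov2-base") and pooling choice are arbitrary stand-ins, not HoneyBee's actual implementation.

```python
# Minimal sketch: embed raw medical images with a frozen foundation model and
# keep the resulting vectors for downstream use.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("facebook/dinov2-base")
model = AutoModel.from_pretrained("facebook/dinov2-base").eval()

def embed(path: str) -> torch.Tensor:
    """Return a (1, hidden_dim) embedding for the image at `path`."""
    inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
    with torch.no_grad():
        # Use the CLS token of the last hidden state as a compact embedding.
        return model(**inputs).last_hidden_state[:, 0]
```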
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pre-training from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data [66.9359934608229]
This study aims to initiate the development of a Radiology Foundation Model, termed RadFM.
To the best of our knowledge, this is the first large-scale, high-quality, medical visual-language dataset, with both 2D and 3D scans.
We propose a new evaluation benchmark, RadBench, that comprises five tasks, including modality recognition, disease diagnosis, visual question answering, report generation and rationale diagnosis.
arXiv Detail & Related papers (2023-08-04T17:00:38Z)
- Med-Flamingo: a Multimodal Medical Few-shot Learner [58.85676013818811]
We propose Med-Flamingo, a multimodal few-shot learner adapted to the medical domain.
Based on OpenFlamingo-9B, we continue pre-training on paired and interleaved medical image-text data from publications and textbooks.
We conduct the first human evaluation for generative medical VQA where physicians review the problems and blinded generations in an interactive app.
arXiv Detail & Related papers (2023-07-27T20:36:02Z)
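The paired and interleaved image-text inputs mentioned in the Med-Flamingo entry above can be sketched as a prompt that alternates image placeholders with text. The file names and "<image>" placeholder convention are illustrative; the model's real OpenFlamingo-based pipeline differs in detail.

```python
# Minimal sketch of an interleaved image-text few-shot prompt of the kind
# Flamingo-style models consume. All names below are hypothetical.
from PIL import Image

few_shot = [
    ("example_scan_1.png", "Question: Which modality is shown? Answer: MRI."),
    ("example_scan_2.png", "Question: Which modality is shown? Answer: CT."),
]
query_path = "query_scan.png"

# Images are supplied in order; each "<image>" token marks where the
# corresponding image's features are injected into the text stream.
images = [Image.open(p) for p, _ in few_shot] + [Image.open(query_path)]
prompt = "".join(f"<image>{text} " for _, text in few_shot)
prompt += "<image>Question: Which modality is shown? Answer:"
```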
- DeepMediX: A Deep Learning-Driven Resource-Efficient Medical Diagnosis Across the Spectrum [15.382184404673389]
This work presents DeepMediX, a groundbreaking, resource-efficient model that addresses this challenge.
Built on top of the MobileNetV2 architecture, DeepMediX excels in classifying brain MRI scans and skin cancer images.
DeepMediX's design also includes the concept of Federated Learning, enabling a collaborative learning approach without compromising data privacy.
arXiv Detail & Related papers (2023-07-01T12:30:58Z)
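A classifier built on MobileNetV2, as in the DeepMediX entry above, can be sketched by swapping the backbone's final linear layer for a task-specific head. The class count is a placeholder, and the federated-learning component is omitted.

```python
# Minimal sketch of a MobileNetV2-based classifier with a task-specific head.
import torch.nn as nn
from torchvision import models

def build_classifier(num_classes: int = 4) -> nn.Module:
    weights = models.MobileNet_V2_Weights.IMAGENET1K_V1
    backbone = models.mobilenet_v2(weights=weights)
    # Swap the final linear layer (1280 -> 1000) for the diagnostic classes.
    backbone.classifier[1] = nn.Linear(backbone.last_channel, num_classes)
    return backbone
```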
- medigan: A Python Library of Pretrained Generative Models for Enriched Data Access in Medical Imaging [3.8568465270960264]
medigan is a one-stop shop for pretrained generative models, implemented as an open-source, framework-agnostic Python library.
It allows researchers and developers to create, increase, and domain-adapt their training data in just a few lines of code.
The library's scalability and design are demonstrated by its growing number of integrated and readily usable pretrained generative models.
arXiv Detail & Related papers (2022-09-21T12:30:05Z)
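The "few lines of code" usage advertised in the medigan entry above follows the Generators interface from the library's documentation; the model_id shown is believed to be one of the published identifiers but should be verified against the current medigan model zoo.

```python
# Minimal sketch of sampling synthetic training images with medigan.
from medigan import Generators

generators = Generators()
# Downloads the pretrained model on first use, then samples synthetic images.
generators.generate(model_id="00001_DCGAN_MMG_CALC_ROI", num_samples=8)
```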
- Understanding the Tricks of Deep Learning in Medical Image Segmentation: Challenges and Future Directions [66.40971096248946]
In this paper, we collect a series of MedISeg tricks for different model implementation phases.
We experimentally explore the effectiveness of these tricks on consistent baselines.
We also open-source a strong MedISeg repository, where each component is plug-and-play.
arXiv Detail & Related papers (2022-09-21T12:30:05Z)
- MedMNIST Classification Decathlon: A Lightweight AutoML Benchmark for Medical Image Analysis [46.02653153307692]
We present MedMNIST, a collection of 10 pre-processed medical open datasets.
MedMNIST is standardized to perform classification tasks on lightweight 28x28 images.
MedMNIST could be used for educational purposes, rapid prototyping, multi-modal machine learning, or AutoML in medical image analysis.
arXiv Detail & Related papers (2020-10-28T12:41:18Z)
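A minimal loading sketch for the MedMNIST collection described above, using the medmnist package; PathMNIST is an arbitrary example among the ten datasets.

```python
# Minimal sketch: load a MedMNIST dataset for lightweight 28x28 classification.
import medmnist
from torch.utils.data import DataLoader
from torchvision import transforms

train_set = medmnist.PathMNIST(split="train", download=True,
                               transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)
images, labels = next(iter(loader))   # images: (128, 3, 28, 28)
```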