UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
- URL: http://arxiv.org/abs/2504.21336v1
- Date: Wed, 30 Apr 2025 05:51:48 GMT
- Title: UniBiomed: A Universal Foundation Model for Grounded Biomedical Image Interpretation
- Authors: Linshan Wu, Yuxiang Nie, Sunan He, Jiaxin Zhuang, Hao Chen
- Abstract summary: We introduce UniBiomed, the first universal foundation model for grounded biomedical image interpretation.
UniBiomed is based on a novel integration of a Multi-modal Large Language Model (MLLM) and the Segment Anything Model (SAM).
To develop UniBiomed, we curate a large-scale dataset comprising over 27 million triplets of images, annotations, and text descriptions across ten imaging modalities.
- Score: 8.781512619275208
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-modal interpretation of biomedical images opens up novel opportunities in biomedical image analysis. Conventional AI approaches typically rely on disjointed training, i.e., Large Language Models (LLMs) for clinical text generation and segmentation models for target extraction, which results in inflexible real-world deployment and a failure to leverage holistic biomedical information. To this end, we introduce UniBiomed, the first universal foundation model for grounded biomedical image interpretation. UniBiomed is based on a novel integration of a Multi-modal Large Language Model (MLLM) and the Segment Anything Model (SAM), which effectively unifies the generation of clinical texts and the segmentation of corresponding biomedical objects for grounded interpretation. In this way, UniBiomed is capable of tackling a wide range of biomedical tasks across ten diverse biomedical imaging modalities. To develop UniBiomed, we curate a large-scale dataset comprising over 27 million triplets of images, annotations, and text descriptions across ten imaging modalities. Extensive validation on 84 internal and external datasets demonstrates that UniBiomed achieves state-of-the-art performance in segmentation, disease recognition, region-aware diagnosis, visual question answering, and report generation. Moreover, unlike previous models that rely on clinical experts to pre-diagnose images and manually craft precise textual or visual prompts, UniBiomed provides automated, end-to-end grounded interpretation for biomedical image analysis. This represents a paradigm shift in clinical workflows that will significantly improve diagnostic efficiency. In summary, UniBiomed is a breakthrough in biomedical AI, unlocking powerful grounded interpretation capabilities for more accurate and efficient biomedical image analysis.
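The abstract does not spell out the coupling mechanism, but MLLM + SAM integrations of this kind are commonly realized by having the language model emit a special segmentation token whose hidden state prompts a SAM-style mask decoder (the LISA pattern). The sketch below illustrates that pattern only; all class, parameter, and token names are hypothetical, and this is not UniBiomed's released code.

```python
import torch
import torch.nn as nn

class GroundedInterpreter(nn.Module):
    """Sketch of an MLLM + SAM-style coupling (hypothetical, not UniBiomed's code).

    The language model generates clinical text that may contain a special
    [SEG] token; the hidden state at each [SEG] position is projected into
    a prompt embedding that conditions a SAM-style mask decoder.
    """

    def __init__(self, mllm, mask_decoder, hidden_dim=4096,
                 prompt_dim=256, seg_token_id=32000):
        super().__init__()
        self.mllm = mllm                  # vision-language LM exposing hidden states
        self.mask_decoder = mask_decoder  # SAM-style decoder: (image, prompts) -> masks
        self.seg_token_id = seg_token_id
        self.to_prompt = nn.Linear(hidden_dim, prompt_dim)

    def forward(self, image, input_ids):
        out = self.mllm(image=image, input_ids=input_ids,
                        output_hidden_states=True)
        hidden = out.hidden_states[-1]                   # (B, T, hidden_dim)
        seg_positions = input_ids == self.seg_token_id   # locate [SEG] tokens
        prompts = self.to_prompt(hidden[seg_positions])  # one prompt per [SEG]
        masks = self.mask_decoder(image, prompts)        # grounded segmentation masks
        return out.logits, masks                         # clinical text + masks
```

Because segmentation is driven by the model's own generated text rather than an expert-supplied prompt, the same forward pass yields both the report and the grounding masks, which is what enables the end-to-end workflow the abstract describes.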
Related papers
- An Explainable Biomedical Foundation Model via Large-Scale Concept-Enhanced Vision-Language Pre-training [40.16314726875265]
ConceptCLIP is the first explainable biomedical foundation model that achieves state-of-the-art diagnostic accuracy.
We develop ConceptCLIP through a novel dual-alignment approach that simultaneously learns global image-text representations and fine-grained region-concept associations.
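ConceptCLIP's exact losses are not given here, but a dual-alignment objective of the described kind can be pictured as a weighted sum of a global image-text contrastive term and a fine-grained region-concept contrastive term; a minimal sketch under that assumption (the weighting and all names are illustrative):

```python
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric contrastive loss over matched embedding pairs (sketch)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(logits.shape[0], device=logits.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def dual_alignment_loss(img_emb, txt_emb, region_emb, concept_emb, lam=0.5):
    # Global image-text alignment plus region-concept alignment,
    # learned simultaneously as described in the abstract.
    return info_nce(img_emb, txt_emb) + lam * info_nce(region_emb, concept_emb)
```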
arXiv Detail & Related papers (2025-01-26T16:07:11Z) - BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature [73.39593644054865]
BIOMEDICA is a scalable, open-source framework to extract, annotate, and serialize the entirety of the PubMed Central Open Access subset into an easy-to-use, publicly accessible dataset.
Our framework produces a comprehensive archive with over 24 million unique image-text pairs from over 6 million articles.
BMCA-CLIP is a suite of CLIP-style models continuously pretrained on the BIOMEDICA dataset via streaming, eliminating the need to download 27 TB of data locally.
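Streaming pretraining of this kind can be approximated with the Hugging Face `datasets` library; a minimal sketch, using an illustrative Hub dataset ID (check the BIOMEDICA release for the actual identifier):

```python
from datasets import load_dataset

# Stream image-caption records lazily instead of downloading the ~27 TB corpus.
# The dataset ID below is a placeholder, not necessarily the official one.
ds = load_dataset("BIOMEDICA/biomedica", split="train", streaming=True)

for example in ds.take(4):   # IterableDataset.take yields the first n records
    print(example.keys())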
arXiv Detail & Related papers (2025-01-13T09:58:03Z) - BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models [2.2585213273821716]
We propose BiomedCoOp, a novel prompt learning framework for biomedical image analysis.
Our approach achieves effective prompt context learning by leveraging semantic consistency with average prompt ensembles from Large Language Models (LLMs) and knowledge distillation with a statistics-based prompt selection strategy.
We conducted comprehensive validation of our proposed framework on 11 medical datasets across 9 modalities and 10 organs against existing state-of-the-art methods, demonstrating significant improvements in both accuracy and generalizability.
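BiomedCoOp builds on CoOp-style prompt context learning, in which a small set of continuous context vectors is optimized while the vision-language backbone stays frozen. A minimal sketch of that base mechanism follows; the LLM prompt ensembles, consistency loss, and statistics-based selection from the paper are omitted, and all names are illustrative:

```python
import torch
import torch.nn as nn

class PromptLearner(nn.Module):
    """CoOp-style learnable prompt context (sketch; omits BiomedCoOp's
    LLM ensembles and statistics-based prompt selection)."""

    def __init__(self, class_embeds, n_ctx=16, ctx_dim=512):
        super().__init__()
        # Learnable context vectors, shared across all classes.
        self.ctx = nn.Parameter(torch.randn(n_ctx, ctx_dim) * 0.02)
        # Frozen token embeddings of each class name: (n_cls, n_name_tok, ctx_dim).
        self.register_buffer("class_embeds", class_embeds)

    def forward(self):
        n_cls = self.class_embeds.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        # Prompt per class = [learned context tokens] + [class-name tokens];
        # a frozen CLIP text encoder consumes these embeddings directly.
        return torch.cat([ctx, self.class_embeds], dim=1)

# Example: 10 classes whose names embed to 4 tokens each.
prompts = PromptLearner(torch.randn(10, 4, 512))()
print(prompts.shape)  # torch.Size([10, 20, 512])
```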
arXiv Detail & Related papers (2024-11-21T19:13:04Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations, and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - μ-Bench: A Vision-Language Benchmark for Microscopy Understanding [43.27182445778988]
Vision-language models (VLMs) offer a promising solution for large-scale biological image analysis.
However, there is a lack of standardized, diverse, and large-scale vision-language benchmarks to evaluate VLMs.
μ-Bench is an expert-curated benchmark encompassing 22 biomedical tasks.
arXiv Detail & Related papers (2024-07-01T20:30:26Z) - BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once [58.41069132627823]
Holistic image analysis comprises subtasks such as segmentation, detection, and recognition of relevant objects.
Here, we propose BiomedParse, a biomedical foundation model for image parsing that can jointly conduct segmentation, detection, and recognition for 82 object types across 9 imaging modalities.
Through joint learning, we can improve accuracy for individual tasks and enable novel applications such as segmenting all relevant objects in a noisy image through a text prompt.
arXiv Detail & Related papers (2024-05-21T17:54:06Z) - BioLORD-2023: Semantic Textual Representations Fusing LLM and Clinical Knowledge Graph Insights [15.952942443163474]
We propose a new state-of-the-art approach for obtaining high-fidelity representations of biomedical concepts and sentences.
We demonstrate consistent and substantial performance improvements over the previous state of the art.
Besides our new state-of-the-art biomedical model for English, we also distill and release a multilingual model compatible with 50+ languages.
arXiv Detail & Related papers (2023-11-27T18:46:17Z) - LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
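The two-stage curriculum (concept alignment on figure-caption pairs, then open-ended instruction tuning) can be written as a simple training schedule; the sketch below uses hypothetical stage names, dataset labels, and module lists:

```python
# Hypothetical schedule mirroring the described two-stage curriculum;
# dataset names and trainable-module lists are illustrative only.
STAGES = [
    {"name": "concept_alignment", "data": "pmc_figure_captions",
     "trainable": ["vision_projector"], "epochs": 1},
    {"name": "instruction_tuning", "data": "biomedical_instructions",
     "trainable": ["vision_projector", "language_model"], "epochs": 3},
]

for stage in STAGES:
    # In a real trainer: freeze all modules, unfreeze stage["trainable"],
    # then fine-tune on stage["data"] for stage["epochs"] epoch(s).
    print(f"Stage {stage['name']}: tune {stage['trainable']} on {stage['data']}")
```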
arXiv Detail & Related papers (2023-06-01T16:50:07Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address the limitations of task-specific models, owing to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model designed as a generalist for diverse biomedical tasks.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - BiomedCLIP: a multimodal biomedical foundation model pretrained from fifteen million scientific image-text pairs [46.87322157229728]
We present PMC-15M, a novel dataset that is two orders of magnitude larger than existing biomedical multimodal datasets.
PMC-15M contains 15 million biomedical image-text pairs collected from 4.4 million scientific articles.
Based on PMC-15M, we have pretrained BiomedCLIP, a multimodal foundation model, with domain-specific adaptations tailored to biomedical vision-language processing.
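BiomedCLIP is distributed via the Hugging Face Hub and is loadable with `open_clip`; a minimal zero-shot sketch (the hub ID reflects the published release and should be verified against the current model card):

```python
import torch
from open_clip import create_model_from_pretrained, get_tokenizer

# Hub ID as published; verify against the current BiomedCLIP model card.
HUB_ID = "hf-hub:microsoft/BiomedCLIP-PubMedBERT_256-vit_base_patch16_224"
model, preprocess = create_model_from_pretrained(HUB_ID)
tokenizer = get_tokenizer(HUB_ID)

labels = ["chest X-ray", "brain MRI", "histopathology slide"]
texts = tokenizer([f"this is a photo of a {label}" for label in labels])

# Example scoring (requires an input image, e.g. via PIL):
# image = preprocess(Image.open("scan.png")).unsqueeze(0)
# with torch.no_grad():
#     img_f, txt_f, scale = model(image, texts)
#     probs = (scale * img_f @ txt_f.t()).softmax(dim=-1)
```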
arXiv Detail & Related papers (2023-03-02T02:20:04Z)