Related papers: Advancing Multimodal Medical Capabilities of Gemini

URL: http://arxiv.org/abs/2405.03162v1
Date: Mon, 6 May 2024 04:44:22 GMT
Title: Advancing Multimodal Medical Capabilities of Gemini
Authors: Lin Yang, Shawn Xu, Andrew Sellergren, Timo Kohlberger, Yuchen Zhou, Ira Ktena, Atilla Kiraly, Faruk Ahmed, Farhad Hormozdiari, Tiam Jaroensri, Eric Wang, Ellery Wulczyn, Fayaz Jamil, Theo Guidroz, Chuck Lau, Siyuan Qiao, Yun Liu, Akshay Goel, Kendall Park, Arnav Agharwal, Nick George, Yang Wang, Ryutaro Tanno, David G. T. Barrett, Wei-Hung Weng, S. Sara Mahdavi, Khaled Saab, Tao Tu, Sreenivasa Raju Kalidindi, Mozziyar Etemadi, Jorge Cuadros, Gregory Sorensen, Yossi Matias, Katherine Chou, Greg Corrado, Joelle Barral, Shravya Shetty, David Fleet, S. M. Ali Eslami, Daniel Tse, Shruthi Prabhakara, Cory McLean, Dave Steiner, Rory Pilgrim, Christopher Kelly, Shekoofeh Azizi, Daniel Golden
Abstract summary: We develop several models within the new Med-Gemini family that inherit core capabilities of Gemini. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation. Med-Gemini-3D is the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes.
Score: 32.28727204275662
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many clinical tasks require an understanding of specialized data, such as medical images and genomics, which is not typically found in general-purpose large multimodal models. Building upon Gemini's multimodal models, we develop several models within the new Med-Gemini family that inherit core capabilities of Gemini and are optimized for medical use via fine-tuning with 2D and 3D radiology, histopathology, ophthalmology, dermatology and genomic data. Med-Gemini-2D sets a new standard for AI-based chest X-ray (CXR) report generation based on expert evaluation, exceeding previous best results across two separate datasets by an absolute margin of 1% and 12%, where 57% and 96% of AI reports on normal cases, and 43% and 65% on abnormal cases, are evaluated as "equivalent or better" than the original radiologists' reports. We demonstrate the first ever large multimodal model-based report generation for 3D computed tomography (CT) volumes using Med-Gemini-3D, with 53% of AI reports considered clinically acceptable, although additional research is needed to meet expert radiologist reporting quality. Beyond report generation, Med-Gemini-2D surpasses the previous best performance in CXR visual question answering (VQA) and performs well in CXR classification and radiology VQA, exceeding SoTA or baselines on 17 of 20 tasks. In histopathology, ophthalmology, and dermatology image classification, Med-Gemini-2D surpasses baselines across 18 out of 20 tasks and approaches task-specific model performance. Beyond imaging, Med-Gemini-Polygenic outperforms the standard linear polygenic risk score-based approach for disease risk prediction and generalizes to genetically correlated diseases for which it has never been trained. Although further development and evaluation are necessary in the safety-critical medical domain, our results highlight the potential of Med-Gemini across a wide range of medical tasks.

Related papers

A Unified Multimodal Framework for Dataset Construction and Model-Based Diagnosis of Ameloblastoma [0.1499944454332829]
We present a newly curated multimodal dataset specifically focused on ameloblastoma. Natural language processing techniques were employed to extract clinically relevant features from textual reports. A multimodal deep learning model was developed to classify ameloblastoma variants, assess behavioral patterns such as recurrence risk, and support surgical planning.
arXiv Detail & Related papers (2026-02-05T10:15:34Z)
Closing the Performance Gap Between AI and Radiologists in Chest X-Ray Reporting [40.40577855417923]
We introduce MAIRA-X, a clinically evaluated multimodal AI model for longitudinal chest X-ray report generation. A novel L&T-specific metrics framework was developed to assess accuracy in reporting attributes such as type, longitudinal change and placement. Our results suggest MAIRA-X can effectively assist radiologists, particularly in high-volume clinical settings.
arXiv Detail & Related papers (2025-11-21T10:53:26Z)
A High Magnifications Histopathology Image Dataset for Oral Squamous Cell Carcinoma Diagnosis and Prognosis [18.549808005574985]
Multi-OSCC is a new histopathology image dataset comprising 1,325 Oral Squamous Cell Carcinoma patients. Each patient is represented by six high-resolution histopathology images captured at x200, x400, and x1000 magnifications (two per magnification), covering both the core and edge tumor regions. The dataset is richly annotated for six critical clinical tasks: recurrence prediction (REC), lymph node metastasis (LNM), tumor differentiation (TD), tumor invasion (TI) and perineural invasion (PI).
arXiv Detail & Related papers (2025-07-22T08:48:45Z)
Deep Learning-Based Noninvasive Screening of Type 2 Diabetes with Chest X-ray Images and Electronic Health Records [2.2940141855172036]
This study evaluates the integration of chest X-ray (CXR) images with other noninvasive data sources, including electronic health records (EHRs) and electrocardiography signals, for T2DM detection. The end-to-end trained ResNet-LSTM model achieved an AUROC of 0.86, surpassing the CXR-only baseline by 2.3% with just 9863 training samples.
arXiv Detail & Related papers (2024-12-14T20:18:16Z)
3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans. Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
MGH Radiology Llama: A Llama 3 70B Model for Radiology [27.575944159578786]
This paper presents an advanced radiology-focused large language model: MGH Radiology Llama. It is developed using the Llama 3 70B model, building upon previous domain-specific models like Radiology-GPT and Radiology-Llama2. Our evaluation, incorporating both traditional metrics and a GPT-4-based assessment, highlights the enhanced performance of this work over general-purpose LLMs.
arXiv Detail & Related papers (2024-08-13T01:30:03Z)
Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision. This study evaluated the performance of the Gemini, GPT-4, and 4 popular large models for an exhaustive evaluation across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
Capabilities of Gemini Models in Medicine [100.60391771032887]
We introduce Med-Gemini, a family of highly capable multimodal models specialized in medicine. We evaluate Med-Gemini on 14 medical benchmarks, establishing new state-of-the-art (SoTA) performance on 10 of them. Our results offer compelling evidence for Med-Gemini's potential, although further rigorous evaluation will be crucial before real-world deployment.
arXiv Detail & Related papers (2024-04-29T04:11:28Z)
A Concept-based Interpretable Model for the Diagnosis of Choroid Neoplasias using Multimodal Data [28.632437578685842]
We focus on choroid neoplasias, the most prevalent form of eye cancer in adults, albeit rare with 5.1 per million. Our work introduces a concept-based interpretable model that distinguishes between three types of choroidal tumors, integrating insights from domain experts via radiological reports. Remarkably, this model achieves an F1 score of 0.91, rivaling that of black-box models, but also boosts the diagnostic accuracy of junior doctors by 42%.
arXiv Detail & Related papers (2024-03-08T07:15:53Z)
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations. The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations. ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data [66.9359934608229]
This study aims to initiate the development of Radiology Foundation Model, termed as RadFM. To the best of our knowledge, this is the first large-scale, high-quality, medical visual-language dataset, with both 2D and 3D scans. We propose a new evaluation benchmark, RadBench, that comprises five tasks, including modality recognition, disease diagnosis, visual question answering, report generation and rationale diagnosis.
arXiv Detail & Related papers (2023-08-04T17:00:38Z)
Towards Generalist Biomedical AI [28.68106423175678]
We introduce Med-PaLM Multimodal (Med-PaLM M), our proof of concept for a generalist biomedical AI system. Med-PaLM M is a large multimodal generative model that flexibly encodes and interprets biomedical data. We conduct a radiologist evaluation of model-generated (and human) chest X-ray reports and observe encouraging performance across model scales.
arXiv Detail & Related papers (2023-07-26T17:52:22Z)
A Novel Automated Classification and Segmentation for COVID-19 using 3D CT Scans [5.5957919486531935]
In COVID-19 computed tomography (CT) images of the lungs, ground glass turbidity is the most common finding that requires specialist diagnosis. Some researchers propose the relevant DL models which can replace professional diagnostic specialists in clinics when lacking expertise. Our model achieves 94.52% accuracy in the classification of lung lesions by 3 types: COVID, Pneumonia and Normal.
arXiv Detail & Related papers (2022-08-04T22:14:18Z)
Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to-date, involving data from 71 healthcare institutions across 6 continents. We generate an automatic tumor boundary detector for the rare disease of glioblastoma. We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z)

This list is automatically generated from the titles and abstracts of the papers in this site.