A Foundational Multimodal Vision Language AI Assistant for Human
Pathology
- URL: http://arxiv.org/abs/2312.07814v1
- Date: Wed, 13 Dec 2023 00:24:37 GMT
- Title: A Foundational Multimodal Vision Language AI Assistant for Human
Pathology
- Authors: Ming Y. Lu, Bowen Chen, Drew F. K. Williamson, Richard J. Chen, Kenji
Ikamura, Georg Gerber, Ivy Liang, Long Phi Le, Tong Ding, Anil V Parwani,
Faisal Mahmood
- Abstract summary: We present PathChat, a vision-language generalist AI assistant for human pathology using an in-house developed vision encoder pretrained on 100 million histology images.
PathChat achieved a diagnostic accuracy of 87% on multiple-choice questions based on publicly available cases of diverse tissue origins and disease models.
- Score: 6.759775793033743
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The field of computational pathology has witnessed remarkable progress in the
development of both task-specific predictive models and task-agnostic
self-supervised vision encoders. However, despite the explosive growth of
generative artificial intelligence (AI), there has been limited study on
building general purpose, multimodal AI assistants tailored to pathology. Here
we present PathChat, a vision-language generalist AI assistant for human
pathology using an in-house developed foundational vision encoder pretrained on
100 million histology images from over 100,000 patient cases and 1.18 million
pathology image-caption pairs. The vision encoder is then combined with a
pretrained large language model and the whole system is finetuned on over
250,000 diverse disease agnostic visual language instructions. We compare
PathChat against several multimodal vision language AI assistants as well as
GPT4V, which powers the commercially available multimodal general purpose AI
assistant ChatGPT-4. When relevant clinical context is provided with the
histology image, PathChat achieved a diagnostic accuracy of 87% on
multiple-choice questions based on publicly available cases of diverse tissue
origins and disease models. Additionally, using open-ended questions and human
expert evaluation, we found that overall PathChat produced more accurate and
pathologist-preferable responses to diverse queries related to pathology. As an
interactive and general vision language AI assistant that can flexibly handle
both visual and natural language inputs, PathChat can potentially find
impactful applications in pathology education, research, and human-in-the-loop
clinical decision making.
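The abstract outlines the overall recipe, a histology-pretrained vision encoder coupled to a pretrained large language model and then instruction-tuned, but no implementation accompanies this listing. Below is a minimal, hypothetical PyTorch sketch of that general class of architecture; the class name, module wiring, and dimensions (VisionLanguageAssistant, vision_dim, llm_dim) are illustrative assumptions and not PathChat's actual components.

```python
import torch
import torch.nn as nn

class VisionLanguageAssistant(nn.Module):
    """Illustrative sketch only (not the paper's code): patch features from a
    domain-pretrained vision encoder are projected into the embedding space of
    a pretrained language model, which then decodes the response."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder   # e.g. a ViT pretrained on histology images
        self.projector = nn.Sequential(        # maps visual features into the LLM embedding space
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )
        self.language_model = language_model   # pretrained decoder-only language model

    def forward(self, image: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # The vision encoder is commonly kept frozen during instruction finetuning.
        with torch.no_grad():
            visual_tokens = self.vision_encoder(image)    # (B, N_patches, vision_dim)
        visual_embeds = self.projector(visual_tokens)     # (B, N_patches, llm_dim)
        # Projected visual tokens are prepended to the text embeddings and the
        # concatenated sequence is passed to the language model for decoding.
        sequence = torch.cat([visual_embeds, text_embeds], dim=1)
        return self.language_model(sequence)
```

Instruction tuning in such a setup would optimize the projector (and optionally the language model) on image-instruction-response triples, mirroring the 250,000-instruction finetuning described in the abstract.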
Related papers
- Evidence-based diagnostic reasoning with multi-agent copilot for human pathology [7.976907866539546]
Current multimodal large language models (MLLMs) in computational pathology face limitations.
We introduce PathChat+, a new MLLM specifically designed for human pathology, trained on over 1 million diverse, pathology-specific instruction samples.
We also present SlideSeek, a reasoning-enabled multi-agent AI system leveraging PathChat+ to autonomously evaluate gigapixel whole-slide images.
arXiv Detail & Related papers (2025-06-26T03:02:16Z) - Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant [11.187690318227514]
RCMed is a full-stack AI assistant that improves multimodal alignment in both input and output.
It achieves state-of-the-art precision in contextualizing irregular lesions and subtle anatomical boundaries.
arXiv Detail & Related papers (2025-05-06T10:00:08Z) - Beyond the Monitor: Mixed Reality Visualization and AI for Enhanced Digital Pathology Workflow [0.0]
Pathologists rely on gigapixel whole-slide images (WSIs) to diagnose diseases like cancer.
Current digital pathology tools hinder diagnosis.
PathVis is a mixed-reality visualization platform for Apple Vision Pro.
arXiv Detail & Related papers (2025-05-05T16:46:53Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - Exploring the Feasibility of Multimodal Chatbot AI as Copilot in Pathology Diagnostics: Generalist Model's Pitfall [17.9731336178034]
ChatGPT and other multimodal models have shown promise in transforming medical image analysis through capabilities such as medical vision-language question answering.
This study benchmarks the performance of GPT on pathology images, assessing its diagnostic accuracy and efficiency in real-world clinical records.
We observe significant deficits of GPT in bone diseases and fair-level performance in diseases from the other three systems.
arXiv Detail & Related papers (2024-09-04T01:30:05Z) - Knowledge-enhanced Visual-Language Pretraining for Computational Pathology [68.6831438330526]
We consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources.
We curate a pathology knowledge tree that consists of 50,470 informative attributes for 4,718 diseases requiring pathology diagnosis from 32 human tissues.
arXiv Detail & Related papers (2024-04-15T17:11:25Z) - Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning [64.1316997189396]
We present a novel language-tied self-supervised learning framework, Hierarchical Language-tied Self-Supervision (HLSS), for histopathology images.
Our resulting model achieves state-of-the-art performance on two medical imaging benchmarks, OpenSRH and TCGA datasets.
arXiv Detail & Related papers (2024-03-21T17:58:56Z) - A General-purpose AI Avatar in Healthcare [1.5081825869395544]
This paper focuses on the role of chatbots in healthcare and explores the use of avatars to make AI interactions more appealing to patients.
A framework for a general-purpose AI avatar application is demonstrated using a three-category prompt dictionary and a prompt improvement mechanism.
A two-phase approach is suggested to fine-tune a general-purpose AI language model and create different AI avatars to discuss medical issues with users.
arXiv Detail & Related papers (2024-01-10T03:44:15Z) - Towards a Visual-Language Foundation Model for Computational Pathology [5.72536252929528]
We introduce CONtrastive learning from Captions for Histopathology (CONCH).
CONCH is a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and task-agnostic pretraining.
It is evaluated on a suite of 13 diverse benchmarks, achieving state-of-the-art performance on histology image classification, segmentation, captioning, text-to-image and image-to-text retrieval; a generic sketch of this style of contrastive image-caption alignment appears at the end of this listing.
arXiv Detail & Related papers (2023-07-24T16:13:43Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language
Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - LLaVA-Med: Training a Large Language-and-Vision Assistant for
Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z) - BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks [68.39821375903591]
Generalist AI holds the potential to address limitations due to its versatility in interpreting different data types.
Here, we propose BiomedGPT, the first open-source and lightweight vision-language foundation model.
arXiv Detail & Related papers (2023-05-26T17:14:43Z) - PathAsst: A Generative Foundation AI Assistant Towards Artificial
General Intelligence of Pathology [15.419350834457136]
We present PathAsst, a multimodal generative foundation AI assistant to revolutionize diagnostic and predictive analytics in pathology.
The development of PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and the training of PathAsst's multimodal generative capabilities.
The experimental results of PathAsst show the potential of harnessing AI-powered generative foundation models to improve pathology diagnosis and treatment processes.
arXiv Detail & Related papers (2023-05-24T11:55:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
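Several of the related works above (e.g. CONCH, ViKL, and the knowledge-enhanced pretraining paper) build on contrastive image-caption alignment for pretraining. The snippet below is a minimal, generic CLIP-style symmetric InfoNCE sketch over already-encoded image and caption embeddings; the function name and temperature value are assumptions for illustration, and it is not the specific objective used by any paper listed here.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(image_feats: torch.Tensor,
                               text_feats: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Generic CLIP-style symmetric InfoNCE loss over a batch of paired
    image and caption embeddings of shape (B, D). Illustrative only."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = image_feats @ text_feats.t() / temperature  # (B, B) cosine-similarity logits
    targets = torch.arange(logits.size(0), device=logits.device)
    # Matched image-caption pairs lie on the diagonal; both retrieval
    # directions (image-to-text and text-to-image) are penalized.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return 0.5 * (loss_i2t + loss_t2i)
```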