PathAsst: A Generative Foundation AI Assistant Towards Artificial
General Intelligence of Pathology
- URL: http://arxiv.org/abs/2305.15072v2
- Date: Mon, 19 Feb 2024 07:02:15 GMT
- Authors: Yuxuan Sun, Chenglu Zhu, Sunyi Zheng, Kai Zhang, Lin Sun, Zhongyi
Shui, Yunlong Zhang, Honglin Li, Lin Yang
- Abstract summary: We present PathAsst, a multimodal generative foundation AI assistant to revolutionize diagnostic and predictive analytics in pathology.
The development of PathAsst involves three pivotal steps: data acquisition, CLIP model adaptation, and the training of PathAsst's multimodal generative capabilities.
The experimental results of PathAsst show the potential of harnessing AI-powered generative foundation models to improve pathology diagnosis and treatment processes.
- Score: 15.419350834457136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As advances in large language models (LLMs) and multimodal techniques
continue to mature, the development of general-purpose multimodal large
language models (MLLMs) has surged, offering significant applications in
interpreting natural images. However, the field of pathology has largely
remained untapped, particularly in gathering high-quality data and designing
comprehensive model frameworks. To bridge the gap in pathology MLLMs, we
present PathAsst, a multimodal generative foundation AI assistant to
revolutionize diagnostic and predictive analytics in pathology. The development
of PathAsst involves three pivotal steps: data acquisition, CLIP model
adaptation, and the training of PathAsst's multimodal generative capabilities.
Firstly, we collect over 207K high-quality pathology image-text pairs from
authoritative sources. Leveraging the advanced power of ChatGPT, we generate
over 180K instruction-following samples. Furthermore, we devise additional
instruction-following data specifically tailored for invoking the eight
pathology-specific sub-models we prepared, allowing PathAsst to collaborate
effectively with these models and enhancing its diagnostic ability. Secondly, by
leveraging the collected data, we construct PathCLIP, a pathology-dedicated
CLIP, to enhance PathAsst's capabilities in interpreting pathology images.
Finally, we integrate PathCLIP with Vicuna-13B and utilize
pathology-specific instruction-tuning data to enhance the multimodal generation
capacity of PathAsst and bolster its synergistic interactions with sub-models.
The experimental results of PathAsst show the potential of harnessing
AI-powered generative foundation models to improve pathology diagnosis and
treatment processes.
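The third step, wiring PathCLIP's visual features into Vicuna-13B, follows the common pattern of projecting vision-encoder patch features into the LLM's token-embedding space. Below is a minimal sketch of such a bridge, assuming a LLaVA-style linear projector; the dimensions and module names are illustrative assumptions, not values from the PathAsst release.

```python
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Sketch of a LLaVA-style bridge: project vision-encoder patch
    features into the LLM embedding space and prepend them to the text
    embeddings. Sizes are placeholders, not real PathCLIP/Vicuna-13B dims."""

    def __init__(self, vision_dim=1024, llm_dim=5120):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)  # visual -> LLM token space

    def forward(self, patch_feats, text_embeds):
        # patch_feats: (B, num_patches, vision_dim) from a frozen CLIP image encoder
        # text_embeds: (B, seq_len, llm_dim) from the LLM's embedding table
        visual_tokens = self.proj(patch_feats)                  # (B, num_patches, llm_dim)
        return torch.cat([visual_tokens, text_embeds], dim=1)  # image tokens first

connector = VisionLanguageConnector()
patch_feats = torch.randn(1, 256, 1024)  # stand-in for PathCLIP patch features
text_embeds = torch.randn(1, 32, 5120)   # stand-in for Vicuna input embeddings
print(connector(patch_feats, text_embeds).shape)  # torch.Size([1, 288, 5120])
```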
Related papers
- MLLM4PUE: Toward Universal Embeddings in Computational Pathology through Multimodal LLMs [34.454047458272505]
We highlight the need for universal multimodal embeddings that can support multiple downstream tasks.
Previous approaches often involve fine-tuning CLIP-based models, which handle images and text separately.
We introduce the Pathology Multimodal Embedding Benchmark (PMEB), a benchmark designed to assess the quality of pathology multimodal embeddings.
arXiv Detail & Related papers (2025-02-11T03:28:55Z)
- Continually Evolved Multimodal Foundation Models for Cancer Prognosis [50.43145292874533]
Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates.
Previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information.
Existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals.
Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities.
arXiv Detail & Related papers (2025-01-30T06:49:57Z)
- A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis [15.10417643788382]
In this paper, a deep-learning-based model named UMPSNet is proposed.
UMPSNet integrates four types of important metadata (demographic information, cancer type information, treatment protocols, and diagnosis results) into text templates, and then introduces a text encoder to extract textual features.
By incorporating the multi-modality of patient data and joint training, UMPSNet outperforms all SOTA approaches.
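As a rough illustration of the template idea (the exact templates and encoder used by UMPSNet are not given in this summary), the four metadata types could be serialized into one sentence and embedded with an off-the-shelf text encoder; `bert-base-uncased` below is a placeholder choice.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Hypothetical template; UMPSNet's actual templates are not specified here.
TEMPLATE = ("Patient demographics: {demo}. Cancer type: {cancer}. "
            "Treatment protocol: {treatment}. Diagnosis: {diagnosis}.")

# Placeholder encoder; any BERT-style text encoder works for this sketch.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def encode_metadata(demo, cancer, treatment, diagnosis):
    """Serialize the four metadata types into text, then embed the text."""
    text = TEMPLATE.format(demo=demo, cancer=cancer,
                           treatment=treatment, diagnosis=diagnosis)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        out = encoder(**inputs)
    return out.last_hidden_state[:, 0]  # [CLS] vector as the textual feature

feat = encode_metadata("female, age 62", "breast invasive carcinoma",
                       "adjuvant chemotherapy", "stage II, ER-positive")
print(feat.shape)  # torch.Size([1, 768])
```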
arXiv Detail & Related papers (2025-01-13T02:29:42Z)
- PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology [7.87900104748629]
We have meticulously compiled a dataset of approximately 45,000 cases, covering over 6 different tasks.
We have fine-tuned multimodal large models, specifically LLaVA, Qwen-VL, and InternLM, on this dataset to enhance instruction-based performance.
arXiv Detail & Related papers (2024-08-13T17:05:06Z)
- PathoWAve: A Deep Learning-based Weight Averaging Method for Improving Domain Generalization in Histopathology Images [13.362177469092963]
We introduce Pathology Weight Averaging (PathoWAve) to tackle the domain shift phenomenon in histopathology image analysis.
Our results on the Camelyon17-WILDS dataset demonstrate PathoWAve's superiority over previously proposed methods.
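Weight averaging itself is a simple operation; the sketch below uniformly averages the state dicts of several checkpoints, while PathoWAve's specific strategy for choosing which weights to average during training is not reproduced here.

```python
import torch

def average_state_dicts(state_dicts):
    """Uniformly average parameter tensors across checkpoints,
    the generic operation behind weight-averaging methods."""
    return {key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
            for key in state_dicts[0]}

# Toy usage: average three snapshots of the same tiny model.
model = torch.nn.Linear(4, 2)
snapshots = []
for _ in range(3):
    torch.nn.init.normal_(model.weight)  # pretend: a different training point
    snapshots.append({k: v.clone() for k, v in model.state_dict().items()})
model.load_state_dict(average_state_dicts(snapshots))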
arXiv Detail & Related papers (2024-06-21T23:25:44Z)
- Knowledge-enhanced Visual-Language Pretraining for Computational Pathology [68.6831438330526]
We consider the problem of visual representation learning for computational pathology, by exploiting large-scale image-text pairs gathered from public resources.
We curate a pathology knowledge tree that consists of 50,470 informative attributes for 4,718 diseases requiring pathology diagnosis from 32 human tissues.
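A toy rendering of such a tissue-to-disease-to-attribute hierarchy might look like the following; the entries are invented placeholders, not items from the paper's actual knowledge tree.

```python
# Toy tissue -> disease -> attributes hierarchy (invented placeholders).
knowledge_tree = {
    "breast": {
        "invasive ductal carcinoma": [
            "irregular nests of atypical cells",
            "desmoplastic stroma",
        ],
    },
    "lung": {
        "adenocarcinoma": [
            "lepidic growth pattern",
            "glandular differentiation",
        ],
    },
}

def attributes_for(tissue, disease):
    """Look up the informative attributes attached to one disease node."""
    return knowledge_tree.get(tissue, {}).get(disease, [])

print(attributes_for("breast", "invasive ductal carcinoma"))
```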
arXiv Detail & Related papers (2024-04-15T17:11:25Z)
- HEALNet: Multimodal Fusion for Heterogeneous Biomedical Data [10.774128925670183]
This paper presents the Hybrid Early-fusion Attention Learning Network (HEALNet), a flexible multimodal fusion architecture.
We conduct multimodal survival analysis on whole-slide images and multi-omic data on four cancer datasets from The Cancer Genome Atlas (TCGA).
HEALNet achieves state-of-the-art performance compared to other end-to-end trained fusion models.
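The abstract does not spell out HEALNet's internals, but a generic early-fusion sketch conveys the idea: a shared set of learnable tokens cross-attends to each modality so the modalities interact before any task head sees them. Everything below is an illustrative stand-in, not HEALNet's actual architecture.

```python
import torch
import torch.nn as nn

class EarlyFusionBlock(nn.Module):
    """Generic early-fusion sketch (not HEALNet's exact design): shared
    latent tokens cross-attend to each modality in turn, fusing the
    modalities before any downstream task head."""

    def __init__(self, dim=256, num_latents=16, heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, modalities):
        # modalities: list of (B, tokens_i, dim) tensors, e.g. WSI patch
        # embeddings and omics embeddings projected to a shared dim.
        B = modalities[0].size(0)
        fused = self.latents.unsqueeze(0).expand(B, -1, -1)
        for m in modalities:
            fused, _ = self.attn(query=fused, key=m, value=m)
        return fused  # (B, num_latents, dim) shared representation

block = EarlyFusionBlock()
wsi = torch.randn(2, 100, 256)    # stand-in for slide patch embeddings
omics = torch.randn(2, 30, 256)   # stand-in for multi-omic embeddings
print(block([wsi, omics]).shape)  # torch.Size([2, 16, 256])
```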
arXiv Detail & Related papers (2023-11-15T17:06:26Z)
- PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
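Text conditioning in latent diffusion models typically enters at sampling time through classifier-free guidance. Below is a minimal sketch of that step with a dummy denoiser; PathLDM's exact sampler and interface are not given here, so every name is an illustrative assumption.

```python
import torch

def guided_noise(denoiser, z_t, t, text_emb, null_emb, scale=7.5):
    """One classifier-free-guidance step, the standard mechanism for text
    conditioning in latent diffusion (PathLDM's sampler may differ).
    `denoiser` predicts noise from latents, timestep, and a text embedding."""
    eps_cond = denoiser(z_t, t, text_emb)
    eps_uncond = denoiser(z_t, t, null_emb)
    return eps_uncond + scale * (eps_cond - eps_uncond)

# Dummy denoiser standing in for a text-conditioned U-Net.
dummy = lambda z, t, c: torch.randn_like(z)
z = torch.randn(1, 4, 32, 32)  # latent at timestep t
eps = guided_noise(dummy, z, t=torch.tensor([500]),
                   text_emb=torch.randn(1, 77, 768),
                   null_emb=torch.zeros(1, 77, 768))
print(eps.shape)  # torch.Size([1, 4, 32, 32])
```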
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
- Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence model to automatically predict the Chicago Classification (CC) diagnosis of a high-resolution manometry (HRM) study from raw multi-swallow data.
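The two-stage design can be sketched generically: a swallow-level model's per-swallow probabilities are pooled into fixed-size study features that feed a classical classifier. The pooling and models below are illustrative stand-ins, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def study_features(swallow_probs):
    """Pool swallow-level class probabilities into fixed-size study features."""
    return np.concatenate([swallow_probs.mean(axis=0),
                           swallow_probs.max(axis=0)])

# Stage 1 stand-in: pretend a deep model already produced per-swallow
# probabilities for each study (10 swallows x 3 classes here).
X = np.stack([study_features(rng.dirichlet(np.ones(3), size=10))
              for _ in range(40)])
y = rng.integers(0, 2, size=40)  # toy study-level diagnosis labels

# Stage 2 stand-in: feature-based classifier at the study level.
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```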
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta-learning techniques to develop a new model that can extract common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, a simple yet effective meta-learning method for few-shot image classification.
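The Prototypical Network core is compact enough to sketch faithfully: class prototypes are mean support embeddings, and queries are scored by negative squared distance to each prototype. Select-ProtoNet's additional selection machinery is not shown.

```python
import torch

def prototypical_logits(support, support_labels, queries, num_classes):
    """Prototypical Network core: prototypes are mean support embeddings;
    query logits are negative squared Euclidean distances to them."""
    prototypes = torch.stack([
        support[support_labels == c].mean(dim=0) for c in range(num_classes)
    ])                                             # (C, dim)
    return -torch.cdist(queries, prototypes) ** 2  # (Q, C)

# Toy 3-way, 5-shot episode with random stand-in embeddings.
support = torch.randn(15, 64)
labels = torch.arange(3).repeat_interleave(5)
queries = torch.randn(6, 64)
logits = prototypical_logits(support, labels, queries, num_classes=3)
print(logits.argmax(dim=1))  # predicted class per query
```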
arXiv Detail & Related papers (2020-09-02T02:50:30Z)