PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks
- URL: http://arxiv.org/abs/2503.24345v1
- Date: Mon, 31 Mar 2025 17:28:02 GMT
- Title: PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks
- Authors: Fang Yan, Jianfeng Wu, Jiawen Li, Wei Wang, Jiaxuan Lu, Wen Chen, Zizhao Gao, Jianan Li, Hong Yan, Jiabo Ma, Minda Chen, Yang Lu, Qing Chen, Yizhi Wang, Xitong Ling, Xuenian Wang, Zihan Wang, Qiang Huang, Shengyi Hua, Mianxin Liu, Lei Ma, Tian Shen, Xiaofan Zhang, Yonghong He, Hao Chen, Shaoting Zhang, Zhe Wang,
- Abstract summary: We present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on a dataset comprising 300K pathological slides.<n>The model was rigorously evaluated on 112 clinical tasks using a combination of 61 private and 51 public datasets.<n>PathOrchestra demonstrated exceptional performance across 27,755 WSIs and 9,415,729 ROIs, achieving over 0.950 accuracy in 47 tasks.
- Score: 39.97710183184273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The complexity and variability inherent in high-resolution pathological images present significant challenges in computational pathology. While pathology foundation models leveraging AI have catalyzed transformative advancements, their development demands large-scale datasets, considerable storage capacity, and substantial computational resources. Furthermore, ensuring their clinical applicability and generalizability requires rigorous validation across a broad spectrum of clinical tasks. Here, we present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on a dataset comprising 300K pathological slides from 20 tissue and organ types across multiple centers. The model was rigorously evaluated on 112 clinical tasks using a combination of 61 private and 51 public datasets. These tasks encompass digital slide preprocessing, pan-cancer classification, lesion identification, multi-cancer subtype classification, biomarker assessment, gene expression prediction, and the generation of structured reports. PathOrchestra demonstrated exceptional performance across 27,755 WSIs and 9,415,729 ROIs, achieving over 0.950 accuracy in 47 tasks, including pan-cancer classification across various organs, lymphoma subtype diagnosis, and bladder cancer screening. Notably, it is the first model to generate structured reports for high-incidence colorectal cancer and diagnostically complex lymphoma-areas that are infrequently addressed by foundational models but hold immense clinical potential. Overall, PathOrchestra exemplifies the feasibility and efficacy of a large-scale, self-supervised pathology foundation model, validated across a broad range of clinical-grade tasks. Its high accuracy and reduced reliance on extensive data annotation underline its potential for clinical integration, offering a pathway toward more efficient and high-quality medical services.
Related papers
- A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis [58.85247337449624]
We propose a knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups.<n>KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks.
arXiv Detail & Related papers (2024-12-17T17:45:21Z) - Multimodal Whole Slide Foundation Model for Pathology [9.46103337205135]
We propose TITAN, a whole slide foundation model pretrained using visual self-supervised learning and vision-language alignment with pathology reports.<n>T TITAN can extract general-purpose slide representations and generate pathology reports that generalize to resource-limited clinical scenarios.
arXiv Detail & Related papers (2024-11-29T12:39:57Z) - Anatomy-Guided Radiology Report Generation with Pathology-Aware Regional Prompts [3.1019279528120363]
Radiology reporting generative AI holds significant potential to alleviate clinical workloads and streamline medical care.
Existing systems often fall short due to their reliance on fixed size, patch-level image features and insufficient incorporation of pathological information.
We propose an innovative approach that leverages pathology-aware regional prompts to explicitly integrate anatomical and pathological information of various scales.
arXiv Detail & Related papers (2024-11-16T12:36:20Z) - PathInsight: Instruction Tuning of Multimodal Datasets and Models for Intelligence Assisted Diagnosis in Histopathology [7.87900104748629]
We have meticulously compiled a dataset of approximately 45,000 cases, covering over 6 different tasks.
We have fine-tuned multimodal large models, specifically LLaVA, Qwen-VL, InternLM, with this dataset to enhance instruction-based performance.
arXiv Detail & Related papers (2024-08-13T17:05:06Z) - A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation [12.617587827105496]
This research aims to bridge the gap by providing publicly accessible datasets and reliable tools for medical diagnosis.
We curated a diverse dataset of lung Computed Tomography (CT) images, comprising 330 annotated nodules (nodules are labeled as bounding boxes) from 95 distinct patients.
These promising results demonstrate that the dataset has a feasible application and further facilitate intelligent auxiliary diagnosis.
arXiv Detail & Related papers (2024-06-26T06:39:11Z) - Large-scale Long-tailed Disease Diagnosis on Radiology Images [51.453990034460304]
RadDiag is a foundational model supporting 2D and 3D inputs across various modalities and anatomies.
Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders.
arXiv Detail & Related papers (2023-12-26T18:20:48Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - A General-Purpose Self-Supervised Model for Computational Pathology [9.505290216109609]
We introduce UNI, a general-purpose self-supervised model for pathology, pretrained using over 100 million tissue patches from over 100,000 diagnostic haematoxylin and eosin WSIs.
We demonstrate new modeling capabilities in CPath such as resolution-agnostic tissue classification, slide classification using few-shot class prototypes, and disease subtyping generalization in classifying up to 108 cancer types.
arXiv Detail & Related papers (2023-08-29T17:52:10Z) - A Transformer-based representation-learning model with unified
processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z) - WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic
Segmentation for Lung Adenocarcinoma [51.50991881342181]
This challenge includes 10,091 patch-level annotations and over 130 million labeled pixels.
First place team achieved mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919)
arXiv Detail & Related papers (2022-04-13T15:27:05Z) - Weakly supervised multiple instance learning histopathological tumor
segmentation [51.085268272912415]
We propose a weakly supervised framework for whole slide imaging segmentation.
We exploit a multiple instance learning scheme for training models.
The proposed framework has been evaluated on multi-locations and multi-centric public data from The Cancer Genome Atlas and the PatchCamelyon dataset.
arXiv Detail & Related papers (2020-04-10T13:12:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.