Related papers: PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation

PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation

URL: http://arxiv.org/abs/2502.10536v1
Date: Fri, 14 Feb 2025 20:09:13 GMT
Title: PolyPath: Adapting a Large Multimodal Model for Multi-slide Pathology Report Generation
Authors: Faruk Ahmed, Lin Yang, Tiam Jaroensri, Andrew Sellergren, Yossi Matias, Avinatan Hassidim, Greg S. Corrado, Dale R. Webster, Shravya Shetty, Shruthi Prabhakara, Yun Liu, Daniel Golden, Ellery Wulczyn, David F. Steiner,
Abstract summary: We demonstrate the ability to generate diagnoses from up to 40,000 768x pixel image patches from multiple whole-slide images at 10X magnification.<n>Expert pathologist evaluations demonstrate that the generated report text is clinically accurate and equivalent to or preferred over the original reporting.
Score: 18.734721574528702
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The interpretation of histopathology cases underlies many important diagnostic and treatment decisions in medicine. Notably, this process typically requires pathologists to integrate and summarize findings across multiple slides per case. Existing vision-language capabilities in computational pathology have so far been largely limited to small regions of interest, larger regions at low magnification, or single whole-slide images (WSIs). This limits interpretation of findings that span multiple high-magnification regions across multiple WSIs. By making use of Gemini 1.5 Flash, a large multimodal model (LMM) with a 1-million token context window, we demonstrate the ability to generate bottom-line diagnoses from up to 40,000 768x768 pixel image patches from multiple WSIs at 10X magnification. This is the equivalent of up to 11 hours of video at 1 fps. Expert pathologist evaluations demonstrate that the generated report text is clinically accurate and equivalent to or preferred over the original reporting for 68% (95% CI: [60%, 76%]) of multi-slide examples with up to 5 slides. While performance decreased for examples with 6 or more slides, this study demonstrates the promise of leveraging the long-context capabilities of modern LMMs for the uniquely challenging task of medical report generation where each case can contain thousands of image patches.

Related papers

Navigating Gigapixel Pathology Images with Large Multimodal Models [0.649324006529432]
General-purpose large multimodal models (LMMs) have generally shown poor or inconclusive performance in medical image interpretation.<n>We introduce Gigapixel Image Agent for Navigating Tissue (GIANT), the first framework that allows LMMs to iteratively navigate whole-slide images like a pathologist.<n>Using MultiPathQA, we show that our simple agentic system substantially outperforms conventional patch- and thumbnail-based baselines.
arXiv Detail & Related papers (2025-11-24T19:33:56Z)
A Graph-Based Framework for Interpretable Whole Slide Image Analysis [86.37618055724441]
We develop a framework that transforms whole-slide images into biologically-informed graph representations.<n>Our approach builds graph nodes from tissue regions that respect natural structures, not arbitrary grids.<n>We demonstrate strong performance on challenging cancer staging and survival prediction tasks.
arXiv Detail & Related papers (2025-03-14T20:15:04Z)
PATHS: A Hierarchical Transformer for Efficient Whole Slide Image Analysis [9.862551438475666]
We propose a novel top-down method for hierarchical weakly supervised representation learning on slide-level tasks in computational pathology.<n>PATHS is inspired by the cross-magnification manner in which a human pathologist examines a slide, filtering patches at each magnification level to a small subset relevant to the diagnosis.<n>We apply PATHS to five datasets of The Cancer Genome Atlas (TCGA), and achieve superior performance on slide-level prediction tasks.
arXiv Detail & Related papers (2024-11-27T11:03:38Z)
WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images [5.960501267687475]
We investigate how to generate pathology reports given whole slide images. We curated the largest WSI-text dataset (PathText) On the model end, we propose the multiple instance generative model (MI-Gen)
arXiv Detail & Related papers (2023-11-27T05:05:41Z)
XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model. It can analyze and answer open-ended questions about chest radiographs. We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images. AMIGO uses the celluar graph within the tissue to provide a single representation for a patient. We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities. We explicitly account for prior images and reports when available during both training and fine-tuning. Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representation of giga-pixel level whole slide pathology images (WSI) for downstream tasks is critical. This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes. Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
Multi-Modal Multi-Instance Learning for Retinal Disease Recognition [10.294738095942812]
We aim to build a deep neural network that recognizes multiple vision-threatening diseases for the given case. As both data acquisition and manual labeling are extremely expensive in the medical domain, the network has to be relatively lightweight.
arXiv Detail & Related papers (2021-09-25T08:16:47Z)
Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation [75.58395328700821]
We propose a novel model, Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan. MGP-VAE can leverage the Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to utilize the subjects/patients and sub-modalities correlations. We show the applicability of MGP-VAE on brain tumor segmentation where either, two, or three of four sub-modalities may be missing.
arXiv Detail & Related papers (2021-07-07T19:06:34Z)
Generative Adversarial U-Net for Domain-free Medical Image Augmentation [49.72048151146307]
The shortage of annotated medical images is one of the biggest challenges in the field of medical image computing. In this paper, we develop a novel generative method named generative adversarial U-Net. Our newly designed model is domain-free and generalizable to various medical images.
arXiv Detail & Related papers (2021-01-12T23:02:26Z)
Segmentation of Cellular Patterns in Confocal Images of Melanocytic Lesions in vivo via a Multiscale Encoder-Decoder Network (MED-Net) [2.0487455621441377]
"Multiscale-Decoder Network (MED-Net)" provides pixel-wise labeling into classes of patterns in a quantitative manner. We trained and tested our model on non-overlapping partitions of 117 reflectance confocal microscopy (RCM) mosaics of melanocytic lesions.
arXiv Detail & Related papers (2020-01-03T22:34:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.