Related papers: WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images

URL: http://arxiv.org/abs/2311.16480v4
Date: Thu, 27 Jun 2024 12:38:12 GMT
Title: WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images
Authors: Pingyi Chen, Honglin Li, Chenglu Zhu, Sunyi Zheng, Zhongyi Shui, Lin Yang
Abstract summary: We investigate how to generate pathology reports given whole slide images. We curated the largest WSI-text dataset (PathText). On the model end, we propose the multiple instance generative model (MI-Gen).
Score: 5.960501267687475
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Whole slide images are the foundation of digital pathology for the diagnosis and treatment of carcinomas. Writing pathology reports is laborious and error-prone for inexperienced pathologists. To reduce the workload and improve clinical automation, we investigate how to generate pathology reports given whole slide images. On the data end, we curated the largest WSI-text dataset (PathText). Specifically, we collected nearly 10000 high-quality WSI-text pairs for visual-language models by recognizing and cleaning pathology reports which narrate diagnostic slides in TCGA. On the model end, we propose the multiple instance generative model (MI-Gen), which can produce pathology reports for gigapixel WSIs. We benchmark our model on the largest subset of TCGA-PathoText. Experimental results show our model can generate pathology reports which contain multiple clinical clues and achieve competitive performance on certain slide-level tasks. We observe that simple semantic extraction from the pathology reports can achieve the best performance (0.838 of F1 score) on BRCA subtyping, surpassing previous state-of-the-art approaches. Our collected dataset and related code are available.
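
As an illustration of what "simple semantic extraction" for BRCA subtyping could look like, the following is a minimal sketch and not the authors' method. It assumes that subtyping means separating invasive ductal carcinoma (IDC) from invasive lobular carcinoma (ILC), that a keyword match on the report text is enough to read off the subtype, and that macro-averaged F1 is the metric; the abstract does not specify any of these. The function names, sample reports, and labels are all hypothetical.

```python
# Hypothetical sketch: keyword-based subtype extraction from report text,
# scored with macro-averaged F1. Not taken from the WsiCaption code base.

def extract_subtype(report: str) -> str:
    """Return an assumed BRCA subtype label from free-text report content."""
    text = report.lower()
    # Assumption: "lobular" is checked first, so a report that mentions
    # both terms is labeled ILC.
    if "lobular" in text:
        return "ILC"
    if "ductal" in text:
        return "IDC"
    return "unknown"


def f1_for_class(y_true, y_pred, label):
    """F1 score for one class, computed from true/false positives and negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Made-up report snippets and ground-truth labels, for illustration only.
reports = [
    "Invasive ductal carcinoma, grade 2, margins negative.",
    "Infiltrating lobular carcinoma with lymph node involvement.",
    "Breast, left: invasive ductal carcinoma.",
    "Invasive carcinoma, lobular type.",
]
y_true = ["IDC", "ILC", "ILC", "ILC"]
y_pred = [extract_subtype(r) for r in reports]

print(y_pred)
print(macro_f1(y_true, y_pred, labels=["IDC", "ILC"]))
```

In a report-generation setting, `reports` would hold the generated text for each slide, so the score reflects how often the generated report states the correct subtype.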

Related papers

From Pixels to Histopathology: A Graph-Based Framework for Interpretable Whole Slide Image Analysis [81.19923502845441]
We develop a graph-based framework that constructs WSI graph representations. We build tissue representations (nodes) that follow biological boundaries rather than arbitrary patches. In our method's final step, we solve the diagnostic task through a graph attention network.
arXiv Detail & Related papers (2025-03-14T20:15:04Z)
On the Importance of Text Preprocessing for Multimodal Representation Learning and Pathology Report Generation [0.7966328552094392]
Vision-language models in pathology enable multimodal case retrieval and automated report generation. Many of the models developed so far have been trained on pathology reports that include information which cannot be inferred from paired whole slide images. We investigate how the selection of information from pathology reports for vision-language modeling affects the quality of the multimodal representations and generated reports.
arXiv Detail & Related papers (2025-02-26T16:45:09Z)
Clinical-grade Multi-Organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model [3.356716093747221]
We propose a novel Patient-level Multi-organ Pathology Report Generation (PMPRG) model to generate pathology reports for patients. Our model achieved a METEOR score of 0.68, demonstrating the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-23T22:22:32Z)
PathAlign: A vision-language model for whole slide images in histopathology [13.567674461880905]
We develop a vision-language model based on the BLIP-2 framework using WSIs and curated text from pathology reports. This enables applications utilizing a shared image-text embedding space, such as text or image retrieval for finding cases of interest. We present pathologist evaluation of text generation and text retrieval using WSI embeddings, as well as results for WSI classification and workflow prioritization.
arXiv Detail & Related papers (2024-06-27T23:43:36Z)
PLUTO: Pathology-Universal Transformer [4.920983796208486]
We propose PathoLogy Universal TransfOrmer (PLUTO): a light-weight pathology FM that is pre-trained on a diverse dataset of 195 million image tiles. We design task-specific adaptation heads that utilize PLUTO's output embeddings for tasks which span pathology scales. We find that PLUTO matches or outperforms existing task-specific baselines and pathology-specific foundation models.
arXiv Detail & Related papers (2024-05-13T16:40:17Z)
HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction [16.060286162384536]
HistGen is a learning-empowered framework for histopathology report generation. It aims to boost report generation by aligning whole slide images (WSIs) and diagnostic reports from local and global granularity. Experimental results on WSI report generation show the proposed model outperforms state-of-the-art (SOTA) models by a large margin.
arXiv Detail & Related papers (2024-03-08T15:51:43Z)
A self-supervised framework for learning whole slide representations [52.774822784847565]
We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of whole slide images. We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets.
arXiv Detail & Related papers (2024-02-09T05:05:28Z)
PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images. Our approach fuses image and textual data to enhance the generation process. We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space. We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains. Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
AMIGO: Sparse Multi-Modal Graph Transformer with Shared-Context Processing for Representation Learning of Giga-pixel Images [53.29794593104923]
We present a novel concept of shared-context processing for whole slide histopathology images. AMIGO uses the cellular graph within the tissue to provide a single representation for a patient. We show that our model is strongly robust to missing information to an extent that it can achieve the same performance with as low as 20% of the data.
arXiv Detail & Related papers (2023-03-01T23:37:45Z)
Hierarchical Transformer for Survival Prediction Using Multimodality Whole Slide Images and Genomics [63.76637479503006]
Learning good representation of giga-pixel level whole slide pathology images (WSI) for downstream tasks is critical. This paper proposes a hierarchical-based multimodal transformer framework that learns a hierarchical mapping between pathology images and corresponding genes. Our architecture requires fewer GPU resources compared with benchmark methods while maintaining better WSI representation ability.
arXiv Detail & Related papers (2022-11-29T23:47:56Z)
G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers. We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
Interpretable and synergistic deep learning for visual explanation and statistical estimations of segmentation of disease features from medical images [0.0]
Deep learning (DL) models for disease classification or segmentation from medical images are increasingly trained using transfer learning (TL) from unrelated natural world images. We report detailed comparisons, rigorous statistical analysis and comparisons of widely used DL architecture for binary segmentation after TL. A free GitHub repository of TII and LMI models, code and more than 10,000 medical images and their Grad-CAM output from this study can be used as starting points for advanced computational medicine.
arXiv Detail & Related papers (2020-11-11T14:08:17Z)

This list is automatically generated from the titles and abstracts of the papers on this site.