Related papers: HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction

URL: http://arxiv.org/abs/2403.05396v2
Date: Tue, 18 Jun 2024 05:58:43 GMT
Title: HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction
Authors: Zhengrui Guo, Jiabo Ma, Yingxue Xu, Yihui Wang, Liansheng Wang, Hao Chen
Abstract summary: HistGen is a multiple instance learning-empowered framework for histopathology report generation. It aims to boost report generation by aligning whole slide images (WSIs) and diagnostic reports at both local and global granularity. Experimental results on WSI report generation show the proposed model outperforms state-of-the-art (SOTA) models by a large margin.
Score: 16.060286162384536
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Histopathology serves as the gold standard in cancer diagnosis, with clinical reports being vital in interpreting and understanding this process, guiding cancer treatment and patient care. The automation of histopathology report generation with deep learning stands to significantly enhance clinical efficiency and lessen the labor-intensive, time-consuming burden on pathologists in report writing. In pursuit of this advancement, we introduce HistGen, a multiple instance learning-empowered framework for histopathology report generation together with the first benchmark dataset for evaluation. Inspired by diagnostic and report-writing workflows, HistGen features two delicately designed modules, aiming to boost report generation by aligning whole slide images (WSIs) and diagnostic reports from local and global granularity. To achieve this, a local-global hierarchical encoder is developed for efficient visual feature aggregation from a region-to-slide perspective. Meanwhile, a cross-modal context module is proposed to explicitly facilitate alignment and interaction between distinct modalities, effectively bridging the gap between the extensive visual sequences of WSIs and corresponding highly summarized reports. Experimental results on WSI report generation show the proposed model outperforms state-of-the-art (SOTA) models by a large margin. Moreover, the results of fine-tuning our model on cancer subtyping and survival analysis tasks further demonstrate superior performance compared to SOTA methods, showcasing strong transfer learning capability. Dataset, model weights, and source code are available in https://github.com/dddavid4real/HistGen.

Related papers

Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images. We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
arXiv Detail & Related papers (2025-09-30T08:59:06Z)
AMRG: Extend Vision Language Models for Automatic Mammography Report Generation [4.366802575084445]
Mammography report generation is a critical yet underexplored task in medical AI. We introduce AMRG, the first end-to-end framework for generating narrative mammography reports. We train and evaluate AMRG on DMID, a publicly available dataset of paired high-resolution mammograms and diagnostic reports.
arXiv Detail & Related papers (2025-08-12T06:37:41Z)
Histopathology Image Report Generation by Vision Language Model with Multimodal In-Context Learning [27.49826980862286]
We propose an in-context learning framework called PathGenIC that integrates context derived from the training set with a multimodal in-context learning mechanism. Our method dynamically retrieves semantically similar whole slide representations (WSI)-report pairs and incorporates adaptive feedback to enhance contextual relevance and generation quality.
arXiv Detail & Related papers (2025-06-21T08:56:45Z)
Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images [7.048241543461529]
We propose a novel framework called Multi-Resolution Prompt-guided Hybrid Embedding (MR-PHE) to address these challenges in zero-shot histopathology image classification. We introduce a hybrid embedding strategy that integrates global image embeddings with weighted patch embeddings. A similarity-based patch weighting mechanism assigns attention-like weights to patches based on their relevance to class embeddings.
arXiv Detail & Related papers (2025-03-13T12:18:37Z)
Activating Associative Disease-Aware Vision Token Memory for LLM-Based X-ray Report Generation [54.631356899598956]
We propose a novel associative memory-enhanced X-ray report generation model that effectively mimics the process of professional doctors writing medical reports. We employ a visual Hopfield network to establish memory associations for disease-related tokens, and a report Hopfield network to retrieve report memory information.
arXiv Detail & Related papers (2025-01-07T01:19:48Z)
HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation [89.3260120072177]
We propose a novel Historical-Constrained Large Language Models (HC-LLM) framework for radiology report generation. Our approach extracts both time-shared and time-specific features from longitudinal chest X-rays and diagnostic reports to capture disease progression. Notably, our approach performs well even without historical data during testing and can be easily adapted to other multimodal large models.
arXiv Detail & Related papers (2024-12-15T06:04:16Z)
Clinical-grade Multi-Organ Pathology Report Generation for Multi-scale Whole Slide Images via a Semantically Guided Medical Text Foundation Model [3.356716093747221]
We propose a novel Patient-level Multi-organ Pathology Report Generation (PMPRG) model to generate pathology reports for patients. Our model achieved a METEOR score of 0.68, demonstrating the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-23T22:22:32Z)
Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation [36.343753593390254]
This study proposes the Multi-modality Regional Alignment Network (MRANet), an explainable model for radiology report generation and survival prediction. MRANet visually grounds region-specific descriptions, providing robust anatomical regions with a completion strategy. A cross LLMs alignment is employed to enhance the image-to-text transfer process, resulting in sentences rich with clinical detail and improved explainability for radiologists.
arXiv Detail & Related papers (2024-05-23T02:41:08Z)
Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features. We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
WsiCaption: Multiple Instance Generation of Pathology Reports for Gigapixel Whole-Slide Images [5.960501267687475]
We investigate how to generate pathology reports given whole slide images. We curated the largest WSI-text dataset (PathText). On the model end, we propose the multiple instance generative model (MI-Gen).
arXiv Detail & Related papers (2023-11-27T05:05:41Z)
Radiology Report Generation Using Transformers Conditioned with Non-imaging Data [55.17268696112258]
This paper proposes a novel multi-modal transformer network that integrates chest x-ray (CXR) images and associated patient demographic information. The proposed network uses a convolutional neural network to extract visual features from CXRs and a transformer-based encoder-decoder network that combines the visual features with semantic text embeddings of patient demographic information.
arXiv Detail & Related papers (2023-11-18T14:52:26Z)
PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images. Our approach fuses image and textual data to enhance the generation process. We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [92.73584302508907]
We propose a knowledge graph with Dynamic structure and nodes to facilitate medical report generation with Contrastive Learning. In detail, the fundamental structure of our graph is pre-constructed from general knowledge. Each image feature is integrated with its very own updated graph before being fed into the decoder module for report generation.
arXiv Detail & Related papers (2023-03-18T03:53:43Z)
Cross-Modal Causal Intervention for Medical Report Generation [109.83549148448469]
Medical report generation (MRG) is essential for computer-aided diagnosis and medication guidance. Due to the spurious correlations within image-text data induced by visual and linguistic biases, it is challenging to generate accurate reports reliably describing lesion areas. We propose a novel Visual-Linguistic Causal Intervention (VLCI) framework for MRG, which consists of a visual deconfounding module (VDM) and a linguistic deconfounding module (LDM).
arXiv Detail & Related papers (2023-03-16T07:23:55Z)
Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG). CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure. Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
Cross-modal Memory Networks for Radiology Report Generation [30.13916304931662]
Cross-modal memory networks (CMN) are proposed to enhance the encoder-decoder framework for radiology report generation. Our model is able to better align information from radiology images and texts so as to help generate more accurate reports in terms of clinical indicators.
arXiv Detail & Related papers (2022-04-28T02:32:53Z)
Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
We present an automatic, multi-modal approach for report generation from chest x-rays. Our approach features two distinct modules: (i) a learned knowledge base and (ii) multi-modal alignment. With the aid of both modules, our approach clearly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-12-30T10:43:56Z)

This list is automatically generated from the titles and abstracts of the papers on this site.