Related papers: HARE: an entity and relation centric evaluation framework for histopathology reports

URL: http://arxiv.org/abs/2509.16326v1
Date: Fri, 19 Sep 2025 18:12:19 GMT
Title: HARE: an entity and relation centric evaluation framework for histopathology reports
Authors: Yunsoo Kim, Michal W. S. Ong, Alex Shavick, Honghan Wu, Adam P. Levine
Abstract summary: We propose HARE (Histopathology Automated Report Evaluation), a novel entity and relation centric framework. HARE prioritizes clinically relevant content by aligning critical histopathology entities and relations between reference and generated reports. We fine-tuned GatorTronS, a domain-adapted language model, to develop HARE-NER and HARE-RE, which achieved the highest overall F1-score (0.915) among the tested models.
Score: 12.209068071559829
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Medical domain automated text generation is an active area of research and development; however, evaluating the clinical quality of generated reports remains a challenge, especially in instances where domain-specific metrics are lacking, e.g. histopathology. We propose HARE (Histopathology Automated Report Evaluation), a novel entity and relation centric framework, composed of a benchmark dataset, a named entity recognition (NER) model, a relation extraction (RE) model, and a novel metric, which prioritizes clinically relevant content by aligning critical histopathology entities and relations between reference and generated reports. To develop the HARE benchmark, we annotated 813 de-identified clinical diagnostic histopathology reports and 652 histopathology reports from The Cancer Genome Atlas (TCGA) with domain-specific entities and relations. We fine-tuned GatorTronS, a domain-adapted language model, to develop HARE-NER and HARE-RE, which achieved the highest overall F1-score (0.915) among the tested models. The proposed HARE metric outperformed traditional metrics including ROUGE and METEOR, as well as radiology metrics such as RadGraph-XL, with the highest correlation and the best regression to expert evaluations (higher than the second best method, GREEN, a large language model based radiology report evaluator, by Pearson $r = 0.168$, Spearman $\rho = 0.161$, Kendall $\tau = 0.123$, $R^2 = 0.176$, $RMSE = 0.018$). We release HARE, datasets, and the models at https://github.com/knowlab/HARE to foster advancements in histopathology report generation, providing a robust framework for improving the quality of reports.
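
The abstract describes the metric only at a high level (aligning entities and relations between a reference and a generated report). As a rough illustration of that general idea, and not the HARE implementation, the sketch below scores a generated report by set-overlap F1 over entities and relations. The function names, the entity and relation labels, the example reports, and the equal weighting of the two F1 values are all assumptions made for this example.

```python
from typing import Hashable, Iterable, Tuple

# Hypothetical representations, chosen only for this sketch:
Entity = Tuple[str, str]         # (text span, entity label)
Relation = Tuple[str, str, str]  # (head span, relation label, tail span)


def set_f1(reference: Iterable[Hashable], generated: Iterable[Hashable]) -> float:
    """Exact-match F1 between two collections, treated as sets."""
    ref, gen = set(reference), set(generated)
    if not ref and not gen:
        return 1.0  # nothing to find and nothing produced
    overlap = len(ref & gen)
    if overlap == 0:
        return 0.0
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def overlap_score(ref_entities, gen_entities, ref_relations, gen_relations,
                  entity_weight: float = 0.5) -> float:
    """Weighted mix of entity F1 and relation F1 (the weighting is an assumption)."""
    entity_f1 = set_f1(ref_entities, gen_entities)
    relation_f1 = set_f1(ref_relations, gen_relations)
    return entity_weight * entity_f1 + (1.0 - entity_weight) * relation_f1


# Made-up example: the generated report gets the tumour grade wrong.
ref_entities = {("invasive ductal carcinoma", "diagnosis"),
                ("breast", "site"),
                ("grade 2", "grade")}
gen_entities = {("invasive ductal carcinoma", "diagnosis"),
                ("breast", "site"),
                ("grade 3", "grade")}
ref_relations = {("invasive ductal carcinoma", "located_in", "breast"),
                 ("invasive ductal carcinoma", "has_grade", "grade 2")}
gen_relations = {("invasive ductal carcinoma", "located_in", "breast"),
                 ("invasive ductal carcinoma", "has_grade", "grade 3")}

entity_f1 = set_f1(ref_entities, gen_entities)
relation_f1 = set_f1(ref_relations, gen_relations)
score = overlap_score(ref_entities, gen_entities, ref_relations, gen_relations)
print(f"entity F1={entity_f1:.3f} relation F1={relation_f1:.3f} score={score:.3f}")
```

Exact set matching is the simplest possible alignment. A real evaluator would first need NER and RE models to extract the entities and relations, and would likely need softer matching for synonyms.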

Related papers

CTest-Metric: A Unified Framework to Assess Clinical Validity of Metrics for CT Report Generation [8.08950963137043]
We present CTest-Metric, a first unified metric assessment framework with three modules determining the clinical feasibility of metrics for CT RRG. The modules test: (i) Writing Style Generalizability (WSG) via LLM-based rephrasing; (ii) Synthetic Error Injection (SEI) at graded severities; and (iii) Metrics-vs-Expert correlation (MvE) using clinician ratings on 175 "disagreement" cases. Eight widely used metrics (BLEU, ROUGE, METEOR, BERTScore-F1, F1-RadGraph, Ra
arXiv Detail & Related papers (2026-01-16T18:09:19Z)
AMRG: Extend Vision Language Models for Automatic Mammography Report Generation [4.366802575084445]
Mammography report generation is a critical yet underexplored task in medical AI. We introduce AMRG, the first end-to-end framework for generating narrative mammography reports. We train and evaluate AMRG on DMID, a publicly available dataset of paired high-resolution mammograms and diagnostic reports.
arXiv Detail & Related papers (2025-08-12T06:37:41Z)
Medical-GAT: Cancer Document Classification Leveraging Graph-Based Residual Network for Scenarios with Limited Data [1.0485739694839669]
We present a curated dataset of 1,874 biomedical abstracts, categorized into thyroid cancer, colon cancer, lung cancer, and generic topics. Our research focuses on leveraging this dataset to improve classification performance, particularly in data-scarce scenarios. We introduce a Residual Graph Attention Network (R-GAT) with multiple graph attention layers that capture the semantic information and structural relationships within cancer-related documents.
arXiv Detail & Related papers (2024-10-19T20:07:40Z)
MGH Radiology Llama: A Llama 3 70B Model for Radiology [50.42811030970618]
This paper presents an advanced radiology-focused large language model: MGH Radiology Llama. It is developed using the Llama 3 70B model, building upon previous domain-specific models like Radiology-GPT and Radiology-Llama2. Our evaluation, incorporating both traditional metrics and a GPT-4-based assessment, highlights the enhanced performance of this work over general-purpose LLMs.
arXiv Detail & Related papers (2024-08-13T01:30:03Z)
RaTEScore: A Metric for Radiology Report Generation [59.37561810438641]
This paper introduces a novel, entity-aware metric, Radiological Report (Text) Evaluation (RaTEScore). RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions. Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
arXiv Detail & Related papers (2024-06-24T17:49:28Z)
MRScore: Evaluating Radiology Report Generation with LLM-based Reward System [39.54237580336297]
This paper introduces MRScore, an automatic evaluation metric tailored for radiology report generation by leveraging Large Language Models (LLMs). We collaborated with radiologists to develop a framework that guides LLMs for radiology report evaluation, ensuring alignment with human analysis. Our experiments demonstrate MRScore's higher correlation with human judgments and superior performance in model selection compared to traditional metrics.
arXiv Detail & Related papers (2024-04-27T04:42:45Z)
HistGen: Histopathology Report Generation via Local-Global Feature Encoding and Cross-modal Context Interaction [16.060286162384536]
HistGen is a learning-empowered framework for histopathology report generation. It aims to boost report generation by aligning whole slide images (WSIs) and diagnostic reports from local and global granularity. Experimental results on WSI report generation show the proposed model outperforms state-of-the-art (SOTA) models by a large margin.
arXiv Detail & Related papers (2024-03-08T15:51:43Z)
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations. The clinical dataset utilized in this study encompasses a total of 332,673 observations. ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4, among others.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images. Our approach fuses image and textual data to enhance the generation process. We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
Radiology-Llama2: Best-in-Class Large Language Model for Radiology [71.27700230067168]
This paper introduces Radiology-Llama2, a large language model specialized for radiology through a process known as instruction tuning. Quantitative evaluations using ROUGE metrics on the MIMIC-CXR and OpenI datasets demonstrate that Radiology-Llama2 achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-08-29T17:44:28Z)
Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns. ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)

This list is automatically generated from the titles and abstracts of the papers on this site.