Use of natural language processing to extract and classify papillary thyroid cancer features from surgical pathology reports
- URL: http://arxiv.org/abs/2406.00015v1
- Date: Wed, 22 May 2024 22:27:12 GMT
- Title: Use of natural language processing to extract and classify papillary thyroid cancer features from surgical pathology reports
- Authors: Ricardo Loor-Torres, Yuqi Wu, Esteban Cabezas, Mariana Borras, David Toro-Tobon, Mayra Duran, Misk Al Zahidy, Maria Mateo Chavez, Cristian Soto Jacome, Jungwei W. Fan, Naykky M. Singh Ospina, Yonghui Wu, Juan P. Brito
- Abstract summary: We analyzed 1,410 surgical pathology reports from adult papillary thyroid cancer patients at Mayo Clinic, Rochester, MN.
We developed ThyroPath, a rule-based NLP pipeline, to extract and classify thyroid cancer features into risk categories.
- Score: 9.200141008020484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: We aim to use Natural Language Processing (NLP) to automate the extraction and classification of thyroid cancer risk factors from pathology reports. Methods: We analyzed 1,410 surgical pathology reports from adult papillary thyroid cancer patients at Mayo Clinic, Rochester, MN, from 2010 to 2019. Structured and non-structured reports were used to create a consensus-based ground truth dictionary and were categorized into modified recurrence risk levels. Non-structured reports were narrative, while structured reports followed standardized formats. We then developed ThyroPath, a rule-based NLP pipeline, to extract and classify thyroid cancer features into risk categories. Training involved 225 reports (150 structured, 75 unstructured), with testing on 170 reports (120 structured, 50 unstructured) for evaluation. The pipeline's performance was assessed using both strict and lenient criteria for accuracy, precision, recall, and F1-score. Results: In extraction tasks, ThyroPath achieved overall strict F1-scores of 93% for structured reports and 90% for unstructured reports, covering 18 thyroid cancer pathology features. In classification tasks, ThyroPath-extracted information demonstrated an overall accuracy of 93% in categorizing reports based on their corresponding guideline-based risk of recurrence: 76.9% for high-risk, 86.8% for intermediate-risk, and 100% for both low-risk and very low-risk cases. However, ThyroPath achieved 100% accuracy across all thyroid cancer risk categories with human-extracted pathology information. Conclusions: ThyroPath shows promise in automating the extraction and recurrence risk classification of thyroid pathology reports at large scale. It offers a solution to laborious manual reviews and a way to advance virtual registries. However, it requires further validation before implementation.
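The abstract mentions rule-based extraction scored under strict and lenient criteria but does not define them. The following is a minimal sketch, not ThyroPath's implementation: the two regex rules, the feature names, and the matching definitions (strict = exact string match, lenient = one value contained in the other) are all assumptions made for illustration.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical rules (feature name -> regex); NOT the rules used by ThyroPath.
RULES = {
    "tumor_size_cm": re.compile(r"(\d+(?:\.\d+)?)\s*cm", re.IGNORECASE),
    "extrathyroidal_extension": re.compile(
        r"extrathyroidal extension:?\s*(present|absent|not identified)",
        re.IGNORECASE,
    ),
}


def extract(report: str) -> Dict[str, Optional[str]]:
    """Apply each rule once; a feature with no match is recorded as None."""
    features = {}
    for name, pattern in RULES.items():
        match = pattern.search(report)
        features[name] = match.group(1).lower() if match else None
    return features


def values_match(pred: str, gold: str, lenient: bool) -> bool:
    """Assumed criteria: strict is exact equality, lenient allows containment."""
    if lenient:
        return pred in gold or gold in pred
    return pred == gold


def precision_recall_f1(
    pred: Dict[str, Optional[str]],
    gold: Dict[str, Optional[str]],
    lenient: bool = False,
) -> Tuple[float, float, float]:
    """Score extracted features against gold annotations for one report."""
    tp = fp = fn = 0
    for name, gold_value in gold.items():
        pred_value = pred.get(name)
        if pred_value is None and gold_value is None:
            continue  # nothing to find, nothing reported
        if pred_value is None:
            fn += 1  # missed feature
        elif gold_value is None:
            fp += 1  # spurious feature
        elif values_match(pred_value, gold_value, lenient):
            tp += 1
        else:
            fp += 1  # a wrong value counts as both a false positive
            fn += 1  # and a miss of the correct value
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    report = "Papillary carcinoma, 1.8 cm. Extrathyroidal extension: not identified."
    gold = {"tumor_size_cm": "1.8 cm", "extrathyroidal_extension": "not identified"}
    pred = extract(report)
    print("strict :", precision_recall_f1(pred, gold, lenient=False))
    print("lenient:", precision_recall_f1(pred, gold, lenient=True))
```

In this toy case the extracted size "1.8" differs from the gold value "1.8 cm", so it fails the strict check but passes the lenient one, which is the kind of gap the two criteria are meant to expose.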
Related papers
- Pillar-0: A New Frontier for Radiology Foundation Models [41.640120966890954]
We introduce Pillar-0, a radiology foundation model pretrained on 42,990 abdomen-pelvis CTs, 86,411 chest CTs, 14,348 head CTs, and 11,543 breast MRIs. Pillar-0 achieves mean AUROCs of 86.4, 88.0, 90.1, and 82.9, outperforming MedGemma (Google), MedImageInsight (Microsoft), Lingshu (Alibaba), and Merlin (Stanford) by 7.8-15.8 AUROC points and ranking best in 87.2% (319/366) tasks.
arXiv Detail & Related papers (2025-11-21T21:50:34Z) - Closing the Performance Gap Between AI and Radiologists in Chest X-Ray Reporting [40.40577855417923]
We introduce MAIRA-X, a clinically evaluated multimodal AI model for longitudinal chest X-ray report generation. A novel L&T-specific metrics framework was developed to assess accuracy in reporting attributes such as type, longitudinal change and placement. Our results suggest MAIRA-X can effectively assist radiologists, particularly in high-volume clinical settings.
arXiv Detail & Related papers (2025-11-21T10:53:26Z) - An Explainable Hybrid AI Framework for Enhanced Tuberculosis and Symptom Detection [55.35661671061754]
Tuberculosis remains a critical global health issue, particularly in resource-limited and remote areas. We propose a framework which enhances disease and symptom detection on chest X-rays by integrating two supervised heads and a self-supervised head. Our model achieves an accuracy of 98.85% for distinguishing between COVID-19, tuberculosis, and normal cases, and a macro-F1 score of 90.09% for multilabel symptom detection.
arXiv Detail & Related papers (2025-10-21T17:18:55Z) - PathOrchestra: A Comprehensive Foundation Model for Computational Pathology with Over 100 Diverse Clinical-Grade Tasks [39.97710183184273]
We present PathOrchestra, a versatile pathology foundation model trained via self-supervised learning on a dataset comprising 300K pathological slides. The model was rigorously evaluated on 112 clinical tasks using a combination of 61 private and 51 public datasets. PathOrchestra demonstrated exceptional performance across 27,755 WSIs and 9,415,729 ROIs, achieving over 0.950 accuracy in 47 tasks.
arXiv Detail & Related papers (2025-03-31T17:28:02Z) - A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis [58.85247337449624]
We propose a knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups.
KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks.
arXiv Detail & Related papers (2024-12-17T17:45:21Z) - TopoTxR: A topology-guided deep convolutional network for breast parenchyma learning on DCE-MRIs [49.69047720285225]
We propose a novel topological approach that explicitly extracts multi-scale topological structures to better approximate breast parenchymal structures.
We empirically validate TopoTxR using the VICTRE phantom breast dataset.
Our qualitative and quantitative analyses suggest differential topological behavior of breast tissue in treatment-naïve imaging.
arXiv Detail & Related papers (2024-11-05T19:35:10Z) - Predicting the risk of early-stage breast cancer recurrence using H&E-stained tissue images [5.507561997194002]
We investigated whether deep learning algorithms can predict patients' risk of recurrence by analyzing the pathology images of their cancer histology.
We obtained sensitivity of 0.857, 0.746, and 0.529 for predicting low, intermediate, and high risk, and specificity of 0.816, 0.803, and 0.972.
When we inspected the trained model using class activation maps, we found that it considered tubule formation and mitotic rate when predicting different risk groups.
arXiv Detail & Related papers (2024-06-10T08:51:59Z) - Analysis of the 2024 BraTS Meningioma Radiotherapy Planning Automated Segmentation Challenge [45.3253187215396]
The 2024 Brain Tumor Meningioma Radiotherapy (BraTS-MEN-RT) challenge aimed to advance automated segmentation algorithms. We describe the design and results from the BraTS-MEN-RT challenge.
arXiv Detail & Related papers (2024-05-28T17:25:43Z) - ChatRadio-Valuer: A Chat Large Language Model for Generalizable
Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z) - hist2RNA: An efficient deep learning architecture to predict gene
expression from breast cancer histopathology images [11.822321981275232]
Deep learning algorithms can effectively extract morphological patterns in digital histopathology images to predict molecular phenotypes quickly and cost-effectively.
We propose a new, computationally efficient approach called hist2RNA inspired by bulk RNA-sequencing techniques to predict the expression of 138 genes.
arXiv Detail & Related papers (2023-04-10T10:54:32Z) - Extracting Thyroid Nodules Characteristics from Ultrasound Reports Using
Transformer-based Natural Language Processing Methods [22.35979441935564]
The characteristics of thyroid nodules are often documented in clinical narratives such as ultrasound reports.
To the best of our knowledge, this is the first study to systematically categorize and apply transformer-based NLP models to extract a large number of clinically relevant thyroid nodule characteristics from ultrasound reports.
arXiv Detail & Related papers (2023-03-31T20:23:58Z) - Exploiting segmentation labels and representation learning to forecast
therapy response of PDAC patients [60.78505216352878]
We propose a hybrid deep neural network pipeline to predict tumour response to initial chemotherapy.
We leverage a combination of representation transfer from segmentation to classification, as well as localisation and representation learning.
Our approach yields a remarkably data-efficient method able to predict treatment response with a ROC-AUC of 63.7% using only 477 datasets in total.
arXiv Detail & Related papers (2022-11-08T11:50:31Z) - A Pathologist-Informed Workflow for Classification of Prostate Glands in
Histopathology [62.997667081978825]
Pathologists diagnose and grade prostate cancer by examining tissue from needle biopsies on glass slides.
Cancer's severity and risk of metastasis are determined by the Gleason grade, a score based on the organization and morphology of prostate cancer glands.
This paper proposes an automated workflow that follows pathologists' modus operandi, isolating and classifying multi-scale patches of individual glands.
arXiv Detail & Related papers (2022-09-27T14:08:19Z) - WSSS4LUAD: Grand Challenge on Weakly-supervised Tissue Semantic
Segmentation for Lung Adenocarcinoma [51.50991881342181]
This challenge includes 10,091 patch-level annotations and over 130 million labeled pixels.
The first-place team achieved an mIoU of 0.8413 (tumor: 0.8389, stroma: 0.7931, normal: 0.8919).
arXiv Detail & Related papers (2022-04-13T15:27:05Z) - Event-based clinical findings extraction from radiology reports with
pre-trained language model [0.22940141855172028]
We present a new corpus of radiology reports annotated with clinical findings.
The gold standard corpus contained a total of 500 annotated computed tomography (CT) reports.
We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT.
arXiv Detail & Related papers (2021-12-27T05:03:10Z) - Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z) - A Natural Language Processing Pipeline of Chinese Free-text Radiology
Reports for Liver Cancer Diagnosis [8.549162626766332]
This study designed an NLP pipeline for the direct extraction of clinically relevant features from Chinese radiology reports.
The pipeline comprised named entity recognition, synonym normalization, and relationship extraction.
For liver cancer diagnosis, random forest had the highest predictive performance.
arXiv Detail & Related papers (2020-04-10T09:32:07Z) - Segmentation for Classification of Screening Pancreatic Neuroendocrine
Tumors [72.65802386845002]
This work presents comprehensive results on early-stage detection of pancreatic neuroendocrine tumors (PNETs) in abdominal CT scans.
To the best of our knowledge, this task has not been studied before as a computational task.
Our approach outperforms state-of-the-art segmentation networks and achieves a sensitivity of 89.47% at a specificity of 81.08%.
arXiv Detail & Related papers (2020-04-04T21:21:44Z) - Automated Detection of Cribriform Growth Patterns in Prostate Histology
Images [0.13048920509133805]
Cribriform growth patterns in prostate carcinoma are associated with poor prognosis.
A convolutional neural network was trained to detect cribriform growth patterns on 128 prostate needle biopsies.
arXiv Detail & Related papers (2020-03-23T20:56:24Z)