Classification of cancer pathology reports: a large-scale comparative study
- URL: http://arxiv.org/abs/2006.16370v1
- Date: Mon, 29 Jun 2020 20:47:33 GMT
- Title: Classification of cancer pathology reports: a large-scale comparative study
- Authors: Stefano Martina, Leonardo Ventura, Paolo Frasconi
- Abstract summary: We apply state-of-the-art deep learning techniques to the automatic assignment of ICD-O3 topography and morphology codes to free-text cancer reports.
We present results on a large dataset (more than 80 000 labeled and 1 500 000 unlabeled anonymized reports written in Italian and collected from hospitals in Tuscany over more than a decade) and with a large number of classes.
- Score: 8.211700929845689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We report on the application of state-of-the-art deep learning techniques
to the automatic and interpretable assignment of ICD-O3 topography and
morphology codes to free-text cancer reports. We present results on a large
dataset (more than 80 000 labeled and 1 500 000 unlabeled anonymized reports
written in Italian and collected from hospitals in Tuscany over more than a
decade) and with a large number of classes (134 morphological classes and 61
topographical classes). We compare alternative architectures in terms of
prediction accuracy and interpretability and show that our best model achieves
a multiclass accuracy of 90.3% on topography site assignment and 84.8% on
morphology type assignment. We found that in this context hierarchical models
are not better than flat models and that an element-wise maximum aggregator is
slightly better than attentive models on site classification. Moreover, the
maximum aggregator offers a way to interpret the classification process.
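The element-wise maximum aggregator mentioned in the abstract can be illustrated with a minimal sketch. It assumes a report has already been encoded as a sequence of equal-length token vectors; the function names `max_aggregate` and `token_importance`, the toy tokens, and the toy values are hypothetical and not taken from the paper. Counting how often each token supplies a maximum is one plausible way to read an interpretation out of the aggregator, not necessarily the authors' procedure.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def max_aggregate(token_vectors: Sequence[Sequence[float]]) -> Tuple[List[float], List[int]]:
    """Element-wise maximum over a sequence of equal-length token vectors.

    Returns the pooled vector and, for each feature, the index of the token
    that supplied the maximum (the first such token on ties).
    """
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    if any(len(v) != dim for v in token_vectors):
        raise ValueError("all token vectors must have the same length")

    pooled: List[float] = []
    winners: List[int] = []
    for j in range(dim):
        column = [v[j] for v in token_vectors]
        # max() returns the first maximal index, so ties favour earlier tokens.
        best = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[best])
        winners.append(best)
    return pooled, winners


def token_importance(tokens: Sequence[str], winners: Sequence[int]) -> List[Tuple[str, float]]:
    """Share of pooled features that each token supplied (assumed heuristic)."""
    counts = Counter(winners)
    return [(tok, counts.get(i, 0) / len(winners)) for i, tok in enumerate(tokens)]


if __name__ == "__main__":
    # Hypothetical toy report with three tokens and four features per token.
    tokens = ["carcinoma", "of", "colon"]
    vectors = [
        [0.9, 0.1, 0.4, 0.2],
        [0.0, 0.2, 0.1, 0.1],
        [0.3, 0.8, 0.5, 0.1],
    ]
    pooled, winners = max_aggregate(vectors)
    print(pooled)
    print(token_importance(tokens, winners))
```

Because each pooled feature comes from exactly one token, the winning indices point back to specific words in the report, which is what makes this kind of aggregator easier to inspect than a weighted average.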
Related papers
- A Knowledge-enhanced Pathology Vision-language Foundation Model for Cancer Diagnosis [58.85247337449624]
We propose a knowledge-enhanced vision-language pre-training approach that integrates disease knowledge into the alignment within hierarchical semantic groups.
KEEP achieves state-of-the-art performance in zero-shot cancer diagnostic tasks.
arXiv Detail & Related papers (2024-12-17T17:45:21Z)
- Comparative Analysis and Ensemble Enhancement of Leading CNN Architectures for Breast Cancer Classification [0.0]
This study introduces a novel and accurate approach to breast cancer classification using histopathology images.
It systematically compares leading Convolutional Neural Network (CNN) models across varying image datasets.
Our findings establish the settings required to achieve exceptional classification accuracy for standalone CNN models.
arXiv Detail & Related papers (2024-10-04T11:31:43Z)
- LoGra-Med: Long Context Multi-Graph Alignment for Medical Vision-Language Model [55.80651780294357]
State-of-the-art medical multi-modal large language models (med-MLLM) leverage instruction-following data in pre-training.
LoGra-Med is a new multi-graph alignment algorithm that enforces triplet correlations across image modalities, conversation-based descriptions, and extended captions.
Our results show LoGra-Med matches LLAVA-Med performance on 600K image-text pairs for Medical VQA and significantly outperforms it when trained on 10% of the data.
arXiv Detail & Related papers (2024-10-03T15:52:03Z)
- Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS scenarios.
Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames.
We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
arXiv Detail & Related papers (2024-08-19T15:04:42Z)
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts [62.55349777609194]
We aim to build up a model that can Segment Anything in radiology scans, driven by Text prompts, termed as SAT.
We build up the largest and most comprehensive segmentation dataset for training, by collecting over 22K 3D medical image scans.
We have trained SAT-Nano (110M parameters) and SAT-Pro (447M parameters) demonstrating comparable performance to 72 specialist nnU-Nets trained on each dataset/subsets.
arXiv Detail & Related papers (2023-12-28T18:16:00Z)
- PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- Convolutional Neural Networks in Multi-Class Classification of Medical Data [0.9137554315375922]
We introduce an ensemble model that consists of both deep learning (CNN) and shallow learning models (Gradient Boosting).
The method achieves an accuracy of 64.93, the highest three-class classification accuracy we achieved in this study.
arXiv Detail & Related papers (2020-12-28T02:04:38Z)
- A Multi-resolution Model for Histopathology Image Classification and Localization with Multiple Instance Learning [9.36505887990307]
We propose a multi-resolution multiple instance learning model that leverages saliency maps to detect suspicious regions for fine-grained grade prediction.
The model is developed on a large-scale prostate biopsy dataset containing 20,229 slides from 830 patients.
The model achieved 92.7% accuracy, 81.8% Cohen's Kappa for benign, low grade (i.e. Grade group 1) and high grade (i.e. Grade group >= 2) prediction, an area under the receiver operating characteristic curve (AUROC) of 98.2% and an average precision (AP) of 97.4%.
arXiv Detail & Related papers (2020-11-05T06:42:39Z)
- Hierarchical Deep Learning Classification of Unstructured Pathology Reports to Automate ICD-O Morphology Grading [0.0]
We present a hierarchical deep learning classification method that employs convolutional neural network models to automate the classification of 1813 breast cancer pathology reports.
We demonstrate that the hierarchical deep learning classification method improves performance compared with a flat multiclass CNN model for ICD-O morphology classification of the same reports.
arXiv Detail & Related papers (2020-08-28T12:36:58Z)
- Hierarchical Deep Learning Ensemble to Automate the Classification of Breast Cancer Pathology Reports by ICD-O Topography [0.0]
We present a hierarchical deep learning ensemble method incorporating state of the art convolutional neural network models for the automatic labelling of 2201 pathology reports.
Our results show an improvement in primary site classification over the state of the art CNN model by greater than 14% for F1 micro and 55% for F1 macro scores.
arXiv Detail & Related papers (2020-08-28T10:29:56Z)
- Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes [64.21642241351857]
We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients.
We developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports.
We also developed a model for multi-organ, multi-disease classification of chest CT volumes.
arXiv Detail & Related papers (2020-02-12T00:59:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.