Classification of cancer pathology reports: a large-scale comparative study
- URL: http://arxiv.org/abs/2006.16370v1
- Date: Mon, 29 Jun 2020 20:47:33 GMT
- Title: Classification of cancer pathology reports: a large-scale comparative study
- Authors: Stefano Martina, Leonardo Ventura, Paolo Frasconi
- Abstract summary: We apply state-of-the-art deep learning techniques to the automatic assignment of ICD-O3 topography and morphology codes to free-text cancer reports.
We present results on a large dataset (more than 80 000 labeled and 1 500 000 unlabeled anonymized reports written in Italian and collected from hospitals in Tuscany over more than a decade) and with a large number of classes.
- Score: 8.211700929845689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We report on the application of state-of-the-art deep learning techniques
to the automatic and interpretable assignment of ICD-O3 topography and
morphology codes to free-text cancer reports. We present results on a large
dataset (more than 80 000 labeled and 1 500 000 unlabeled anonymized reports
written in Italian and collected from hospitals in Tuscany over more than a
decade) and with a large number of classes (134 morphological classes and 61
topographical classes). We compare alternative architectures in terms of
prediction accuracy and interpretability and show that our best model achieves
a multiclass accuracy of 90.3% on topography site assignment and 84.8% on
morphology type assignment. We found that in this context hierarchical models
are not better than flat models and that an element-wise maximum aggregator is
slightly better than attentive models on site classification. Moreover, the
maximum aggregator offers a way to interpret the classification process.
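The interpretability claim for the element-wise maximum aggregator can be illustrated with a small sketch. This is not the authors' implementation: the token vectors, the linear classifier weights, and the function names (`max_aggregate`, `class_scores`, `token_contributions`) are all hypothetical, chosen only to show why max pooling is easy to trace. Each pooled feature comes from exactly one token, so a class score can be split into per-token contributions.

```python
from typing import List, Tuple


def max_aggregate(token_vectors: List[List[float]]) -> Tuple[List[float], List[int]]:
    """Element-wise maximum over token vectors.

    Returns the pooled vector and, for each dimension, the index of the
    token that supplied the maximum (the first token wins on ties).
    """
    dims = len(token_vectors[0])
    pooled, winners = [], []
    for d in range(dims):
        column = [vec[d] for vec in token_vectors]
        best = max(range(len(column)), key=column.__getitem__)
        pooled.append(column[best])
        winners.append(best)
    return pooled, winners


def class_scores(pooled: List[float], weights: List[List[float]]) -> List[float]:
    """Linear scores (no bias term): one dot product per class."""
    return [sum(w * x for w, x in zip(row, pooled)) for row in weights]


def token_contributions(
    token_vectors: List[List[float]], weights: List[List[float]], class_index: int
) -> List[float]:
    """Credit each pooled feature's share of the class score to its source token."""
    pooled, winners = max_aggregate(token_vectors)
    contrib = [0.0] * len(token_vectors)
    for d, t in enumerate(winners):
        contrib[t] += weights[class_index][d] * pooled[d]
    return contrib


# Toy, made-up data: four tokens, three features, two classes.
tokens = ["carcinoma", "of", "the", "colon"]
token_vectors = [
    [0.9, 0.1, 0.0],
    [0.1, 0.2, 0.1],
    [0.0, 0.1, 0.2],
    [0.2, 0.8, 0.1],
]
weights = [
    [1.0, 0.0, 0.0],  # hypothetical class 0
    [0.0, 1.0, 0.0],  # hypothetical class 1
]

pooled, winners = max_aggregate(token_vectors)
scores = class_scores(pooled, weights)
predicted = max(range(len(scores)), key=scores.__getitem__)
contrib = token_contributions(token_vectors, weights, predicted)

for token, c in zip(tokens, contrib):
    print(f"{token}: {c:.2f}")
```

Because the toy classifier has no bias term, the per-token contributions sum to the score of the predicted class. An attention-based aggregator mixes all tokens into every feature, so it does not allow such a one-to-one attribution.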
Related papers
- Language Models for Text Classification: Is In-Context Learning Enough? [54.869097980761595]
Recent foundational language models have shown state-of-the-art performance in many NLP tasks in zero- and few-shot settings.
An advantage of these models over more standard approaches is the ability to understand instructions written in natural language (prompts).
This makes them suitable for addressing text classification problems for domains with limited amounts of annotated instances.
arXiv Detail & Related papers (2024-03-26T12:47:39Z)
- One Model to Rule them All: Towards Universal Segmentation for Medical Images with Text Prompts [62.55349777609194]
We aim to build up a model that can Segment Anything in radiology scans, driven by Text prompts, termed as SAT.
We build up the largest and most comprehensive segmentation dataset for training, by collecting over 22K 3D medical image scans.
We have trained SAT-Nano (110M parameters) and SAT-Pro (447M parameters) demonstrating comparable performance to 72 specialist nnU-Nets trained on each dataset/subsets.
arXiv Detail & Related papers (2023-12-28T18:16:00Z)
- PathLDM: Text conditioned Latent Diffusion Model for Histopathology [62.970593674481414]
We introduce PathLDM, the first text-conditioned Latent Diffusion Model tailored for generating high-quality histopathology images.
Our approach fuses image and textual data to enhance the generation process.
We achieved a SoTA FID score of 7.64 for text-to-image generation on the TCGA-BRCA dataset, significantly outperforming the closest text-conditioned competitor with FID 30.1.
arXiv Detail & Related papers (2023-09-01T22:08:32Z)
- Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders [51.14431540035141]
We propose a new fine-tuning strategy based on sentence sampling and positive-pair loss relaxation for improving the downstream zero-shot pathology classification performance.
Our method consistently showed dramatically improved zero-shot pathology classification performance on four different chest X-ray datasets.
arXiv Detail & Related papers (2022-12-14T06:04:18Z)
- Weakly-supervised segmentation using inherently-explainable classification models and their application to brain tumour classification [0.46873264197900916]
This paper introduces three inherently-explainable classifiers to tackle both of these problems as one.
The models were employed for the task of multi-class brain tumour classification using two different datasets.
The obtained accuracy on a subset of tumour-only images outperformed the state-of-the-art glioma tumour grading binary classifiers with the best model achieving 98.7% accuracy.
arXiv Detail & Related papers (2022-06-10T14:44:05Z)
- Cancer Gene Profiling through Unsupervised Discovery [49.28556294619424]
We introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers.
Our method is based on the LP-Stability algorithm, a high dimensional center-based unsupervised clustering algorithm.
Our signature reports promising results on distinguishing immune inflammatory and immune desert tumors.
arXiv Detail & Related papers (2021-02-11T09:04:45Z)
- Convolutional Neural Networks in Multi-Class Classification of Medical Data [0.9137554315375922]
We introduce an ensemble model that consists of both deep learning (CNN) and shallow learning (Gradient Boosting) models.
The method achieves an accuracy of 64.93, the highest three-class classification accuracy we achieved in this study.
arXiv Detail & Related papers (2020-12-28T02:04:38Z)
- A Multi-resolution Model for Histopathology Image Classification and Localization with Multiple Instance Learning [9.36505887990307]
We propose a multi-resolution multiple instance learning model that leverages saliency maps to detect suspicious regions for fine-grained grade prediction.
The model is developed on a large-scale prostate biopsy dataset containing 20,229 slides from 830 patients.
The model achieved 92.7% accuracy, 81.8% Cohen's Kappa for benign, low grade (i.e. Grade group 1) and high grade (i.e. Grade group >= 2) prediction, an area under the receiver operating characteristic curve (AUROC) of 98.2% and an average precision (AP) of 97.4%.
arXiv Detail & Related papers (2020-11-05T06:42:39Z)
- Hierarchical Deep Learning Classification of Unstructured Pathology Reports to Automate ICD-O Morphology Grading [0.0]
We present a hierarchical deep learning classification method that employs convolutional neural network models to automate the classification of 1813 breast cancer pathology reports.
We demonstrate that the hierarchical deep learning classification method improves on performance in comparison to a flat multiclass CNN model for ICD-O morphology classification of the same reports.
arXiv Detail & Related papers (2020-08-28T12:36:58Z)
- Hierarchical Deep Learning Ensemble to Automate the Classification of Breast Cancer Pathology Reports by ICD-O Topography [0.0]
We present a hierarchical deep learning ensemble method incorporating state of the art convolutional neural network models for the automatic labelling of 2201 pathology reports.
Our results show an improvement in primary site classification over the state of the art CNN model by greater than 14% for F1 micro and 55% for F1 macro scores.
arXiv Detail & Related papers (2020-08-28T10:29:56Z)
- Machine-Learning-Based Multiple Abnormality Prediction with Large-Scale Chest Computed Tomography Volumes [64.21642241351857]
We curated and analyzed a chest computed tomography (CT) data set of 36,316 volumes from 19,993 unique patients.
We developed a rule-based method for automatically extracting abnormality labels from free-text radiology reports.
We also developed a model for multi-organ, multi-disease classification of chest CT volumes.
arXiv Detail & Related papers (2020-02-12T00:59:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.