Hierarchical Classification System for Breast Cancer Specimen Report
(HCSBC) -- an end-to-end model for characterizing severity and diagnosis
- URL: http://arxiv.org/abs/2312.12442v1
- Date: Thu, 2 Nov 2023 18:37:45 GMT
- Title: Hierarchical Classification System for Breast Cancer Specimen Report
(HCSBC) -- an end-to-end model for characterizing severity and diagnosis
- Authors: Thiago Santos, Harish Kamath, Christopher R. McAdams, Mary S. Newell,
Marina Mosunjac, Gabriela Oprea-Ilies, Geoffrey Smith, Constance Lehman, Judy
Gichoya, Imon Banerjee, Hari Trivedi
- Abstract summary: We develop a hierarchical hybrid transformer-based pipeline (59 labels), the Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC).
We trained the model on the EUH data and evaluated our model's performance on two external datasets - MGH and Mayo Clinic.
- Score: 3.4454444815042735
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Automated classification of cancer pathology reports can extract information
from unstructured reports and categorize each report into structured diagnosis
and severity categories. Such a system can reduce the burden of populating
tumor registries, help with registration for clinical trials, and support
building large datasets for deep learning model development using true
pathologic ground truth. However, breast pathology reports can be difficult to
categorize because of their high linguistic variability and the wide variety
of potential diagnoses (more than 50). Existing NLP models focus primarily on
classifying primary breast cancer types (e.g. IDC, DCIS, ILC) and
tumor characteristics, and ignore rare diagnoses of cancer subtypes. We
therefore developed a hierarchical hybrid transformer-based pipeline (59 labels),
the Hierarchical Classification System for Breast Cancer Specimen Report (HCSBC),
which uses transformer-based, context-preserving NLP techniques, and compared
our model to several state-of-the-art ML and DL models.
We trained the model on the EUH data and evaluated its performance on
two external datasets, MGH and Mayo Clinic. We publicly release the code and a
live application in a Hugging Face Spaces repository.
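The two-level idea in the abstract (a coarse severity category first, then a diagnosis within that category) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword matchers stand in for the transformer models, and every label name, keyword, and function name below (`keyword_classifier`, `classify_report`, `"high_risk"`, and so on) is a hypothetical placeholder chosen only for illustration.

```python
from typing import Callable, Dict, Tuple

Classifier = Callable[[str], str]


def keyword_classifier(rules: Dict[str, Tuple[str, ...]], default: str) -> Classifier:
    """Build a toy classifier that returns the first label whose keyword occurs in the text.

    A stand-in for a trained model; rule order matters because the first match wins.
    """
    def classify(text: str) -> str:
        lowered = text.lower()
        for label, keywords in rules.items():
            if any(keyword in lowered for keyword in keywords):
                return label
        return default
    return classify


# Level 1: coarse severity category (hypothetical labels and keywords).
severity_model = keyword_classifier(
    {"malignant": ("carcinoma",), "high_risk": ("atypical",)},
    default="benign",
)

# Level 2: one diagnosis classifier per severity branch (hypothetical labels).
diagnosis_models: Dict[str, Classifier] = {
    "malignant": keyword_classifier(
        {"DCIS": ("in situ",), "ILC": ("lobular",), "IDC": ("ductal",)},
        default="other_malignant",
    ),
    "high_risk": keyword_classifier(
        {"ADH": ("ductal hyperplasia",)},
        default="other_high_risk",
    ),
    "benign": keyword_classifier(
        {"fibroadenoma": ("fibroadenoma",)},
        default="other_benign",
    ),
}


def classify_report(text: str) -> Tuple[str, str]:
    """Route a report through the severity model, then the matching diagnosis model."""
    severity = severity_model(text)
    diagnosis = diagnosis_models[severity](text)
    return severity, diagnosis
```

Calling `classify_report("Invasive ductal carcinoma, grade 2.")` routes the report to the malignant branch before a diagnosis is chosen, so each second-level model only has to separate the diagnoses within its own severity category.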
Related papers
- Medical-GAT: Cancer Document Classification Leveraging Graph-Based Residual Network for Scenarios with Limited Data [2.913761513290171]
We present a curated dataset of 1,874 biomedical abstracts, categorized into thyroid cancer, colon cancer, lung cancer, and generic topics.
Our research focuses on leveraging this dataset to improve classification performance, particularly in data-scarce scenarios.
We introduce a Residual Graph Attention Network (R-GAT) with multiple graph attention layers that capture the semantic information and structural relationships within cancer-related documents.
arXiv Detail & Related papers (2024-10-19T20:07:40Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study exhaustively evaluated Gemini, GPT-4, and four other popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset used in this study comprises a total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- PACS: Prediction and analysis of cancer subtypes from multi-omics data based on a multi-head attention mechanism model [2.275409158519155]
We propose a supervised multi-head attention mechanism model (SMA) to classify cancer subtypes successfully.
The attention mechanism and feature sharing module of the SMA model can successfully learn the global and local feature information of multi-omics data.
The SMA model achieves the highest accuracy, macro-F1, and weighted F1 when classifying cancer subtypes on simulated, single-cell, and cancer multi-omics datasets.
arXiv Detail & Related papers (2023-08-21T03:54:21Z)
- A Personalized Diagnostic Generation Framework Based on Multi-source Heterogeneous Data [8.115713756776119]
We propose a framework that combines pathological images and medical reports to generate a personalized diagnosis result for an individual patient.
We use nuclei-level image feature similarity and a content-based deep learning method to search for a personalized population group with similar pathological characteristics.
arXiv Detail & Related papers (2021-10-26T13:12:52Z)
- A multi-stage machine learning model on diagnosis of esophageal manometry [50.591267188664666]
The framework includes deep-learning models at the swallow-level stage and feature-based machine learning models at the study-level stage.
This is the first artificial-intelligence-style model to automatically predict the CC diagnosis of an HRM study from raw multi-swallow data.
arXiv Detail & Related papers (2021-06-25T20:09:23Z)
- A Novel Self-Learning Framework for Bladder Cancer Grading Using Histopathological Images [1.244681179922733]
We present a self-learning framework to grade bladder cancer from histological images stained via chemical techniques.
We propose a novel Deep Convolutional Embedded Attention Clustering (DCEAC) method, which classifies histological patches into different levels of the disease.
arXiv Detail & Related papers (2021-06-25T11:04:04Z)
- G-MIND: An End-to-End Multimodal Imaging-Genetics Framework for Biomarker Identification and Disease Classification [49.53651166356737]
We propose a novel deep neural network architecture to integrate imaging and genetics data, as guided by diagnosis, that provides interpretable biomarkers.
We have evaluated our model on a population study of schizophrenia that includes two functional MRI (fMRI) paradigms and Single Nucleotide Polymorphism (SNP) data.
arXiv Detail & Related papers (2021-01-27T19:28:04Z)
- Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic data using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z)
- Hierarchical Deep Learning Classification of Unstructured Pathology Reports to Automate ICD-O Morphology Grading [0.0]
We present a hierarchical deep learning classification method that employs convolutional neural network models to automate the classification of 1813 breast cancer pathology reports.
We demonstrate that the hierarchical deep learning classification method outperforms a flat multiclass CNN model for ICD-O morphology classification of the same reports.
arXiv Detail & Related papers (2020-08-28T12:36:58Z)
- Weakly supervised multiple instance learning histopathological tumor segmentation [51.085268272912415]
We propose a weakly supervised framework for whole slide imaging segmentation.
We exploit a multiple instance learning scheme for training models.
The proposed framework has been evaluated on multi-location and multi-centric public data from The Cancer Genome Atlas and the PatchCamelyon dataset.
arXiv Detail & Related papers (2020-04-10T13:12:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.