A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation
- URL: http://arxiv.org/abs/2406.18102v1
- Date: Wed, 26 Jun 2024 06:39:11 GMT
- Title: A Lung Nodule Dataset with Histopathology-based Cancer Type Annotation
- Authors: Muwei Jian, Hongyu Chen, Zaiyong Zhang, Nan Yang, Haorang Zhang, Lifu Ma, Wenjing Xu, Huixiang Zhi,
- Abstract summary: This research aims to bridge the gap by providing publicly accessible datasets and reliable tools for medical diagnosis.
We curated a diverse dataset of lung Computed Tomography (CT) images, comprising 330 annotated nodules (nodules are labeled as bounding boxes) from 95 distinct patients.
These promising results demonstrate that the dataset has a feasible application and further facilitate intelligent auxiliary diagnosis.
- Score: 12.617587827105496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, Computer-Aided Diagnosis (CAD) systems have emerged as indispensable tools in clinical diagnostic workflows, significantly alleviating the burden on radiologists. Nevertheless, despite their integration into clinical settings, CAD systems encounter limitations. Specifically, while CAD systems can achieve high performance in the detection of lung nodules, they face challenges in accurately predicting multiple cancer types. This limitation can be attributed to the scarcity of publicly available datasets annotated with expert-level cancer type information. This research aims to bridge this gap by providing publicly accessible datasets and reliable tools for medical diagnosis, facilitating a finer categorization of different types of lung diseases so as to offer precise treatment recommendations. To achieve this objective, we curated a diverse dataset of lung Computed Tomography (CT) images, comprising 330 annotated nodules (nodules are labeled as bounding boxes) from 95 distinct patients. The quality of the dataset was evaluated using a variety of classical classification and detection models, and these promising results demonstrate that the dataset has a feasible application and further facilitate intelligent auxiliary diagnosis.
Related papers
- Large-scale Long-tailed Disease Diagnosis on Radiology Images [51.453990034460304]
RadDiag is a foundational model supporting 2D and 3D inputs across various modalities and anatomies.
Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders.
arXiv Detail & Related papers (2023-12-26T18:20:48Z) - An Empirical Analysis for Zero-Shot Multi-Label Classification on
COVID-19 CT Scans and Uncurated Reports [0.5527944417831603]
pandemic resulted in vast repositories of unstructured data, including radiology reports, due to increased medical examinations.
Previous research on automated diagnosis of COVID-19 primarily focuses on X-ray images, despite their lower precision compared to computed tomography (CT) scans.
In this work, we leverage unstructured data from a hospital and harness the fine-grained details offered by CT scans to perform zero-shot multi-label classification based on contrastive visual language learning.
arXiv Detail & Related papers (2023-09-04T17:58:01Z) - CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark
Model for Rectal Cancer Segmentation [8.728236864462302]
Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up.
These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in performing differential diagnosis of rectal cancer.
To address these issues, this work introduces a novel large scale rectal cancer CT image dataset CARE with pixel-level annotations for both normal and cancerous rectum.
We also propose a novel medical cancer lesion segmentation benchmark model named U-SAM.
The model is specifically designed to tackle the challenges posed by the intricate anatomical structures of abdominal organs by incorporating prompt information.
arXiv Detail & Related papers (2023-08-16T10:51:27Z) - Towards Reliable and Explainable AI Model for Solid Pulmonary Nodule
Diagnosis [20.510918720980467]
Lung cancer has the highest mortality rate of deadly cancers in the world.
Computer-aided diagnosis (CAD) systems have been developed to assist radiologists in nodule detection and diagnosis.
Lack of model reliability and interpretability remains a major obstacle for its large-scale clinical application.
arXiv Detail & Related papers (2022-04-08T08:21:00Z) - A Personalized Diagnostic Generation Framework Based on Multi-source
Heterogeneous Data [8.115713756776119]
We propose a framework that combines pathological images and medical reports to generate a personalized diagnosis result for individual patient.
We use nuclei-level image feature similarity and content-based deep learning method to search for a personalized group of population with similar pathological characteristics.
arXiv Detail & Related papers (2021-10-26T13:12:52Z) - Variational Knowledge Distillation for Disease Classification in Chest
X-Rays [102.04931207504173]
We propose itvariational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays.
We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z) - Inheritance-guided Hierarchical Assignment for Clinical Automatic
Diagnosis [50.15205065710629]
Clinical diagnosis, which aims to assign diagnosis codes for a patient based on the clinical note, plays an essential role in clinical decision-making.
We propose a novel framework to combine the inheritance-guided hierarchical assignment and co-occurrence graph propagation for clinical automatic diagnosis.
arXiv Detail & Related papers (2021-01-27T13:16:51Z) - Topological Data Analysis of copy number alterations in cancer [70.85487611525896]
We explore the potential to capture information contained in cancer genomic information using a novel topology-based approach.
We find that this technique has the potential to extract meaningful low-dimensional representations in cancer somatic genetic data.
arXiv Detail & Related papers (2020-11-22T17:31:23Z) - Hierarchical Classification of Pulmonary Lesions: A Large-Scale
Radio-Pathomics Study [38.78350161086617]
Diagnosis of pulmonary lesions from computed tomography (CT) is important but challenging for clinical decision making in lung cancer related diseases.
Deep learning has achieved great success in computer aided diagnosis (CADx) area for lung cancer, whereas it suffers from label ambiguity due to the difficulty in the radiological diagnosis.
Considering that invasive pathological analysis serves as the clinical golden standard of lung cancer diagnosis, in this study, we solve the label ambiguity issue via a large-scale radio-pathomics dataset.
This retrospective dataset, named Pulmonary-RadPath, enables development and validation of accurate deep learning systems to predict invasive pathological labels with a non-
arXiv Detail & Related papers (2020-10-08T15:14:34Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called Prototypical Network, that is a simple yet effective meta learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - Trajectories, bifurcations and pseudotime in large clinical datasets:
applications to myocardial infarction and diabetes data [94.37521840642141]
We suggest a semi-supervised methodology for the analysis of large clinical datasets, characterized by mixed data types and missing values.
The methodology is based on application of elastic principal graphs which can address simultaneously the tasks of dimensionality reduction, data visualization, clustering, feature selection and quantifying the geodesic distances (pseudotime) in partially ordered sequences of observations.
arXiv Detail & Related papers (2020-07-07T21:04:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.