A Cytology Dataset for Early Detection of Oral Squamous Cell Carcinoma
- URL: http://arxiv.org/abs/2506.09661v1
- Date: Wed, 11 Jun 2025 12:29:24 GMT
- Title: A Cytology Dataset for Early Detection of Oral Squamous Cell Carcinoma
- Authors: Garima Jain, Sanghamitra Pati, Mona Duggal, Amit Sethi, Abhijeet Patil, Gururaj Malekar, Nilesh Kowe, Jitender Kumar, Jatin Kashyap, Divyajeet Rout, Deepali, Hitesh, Nishi Halduniya, Sharat Kumar, Heena Tabassum, Rupinder Singh Dhaliwal, Sucheta Devi Khuraijam, Sushma Khuraijam, Sharmila Laishram, Simmi Kharb, Sunita Singh, K. Swaminadtan, Ranjana Solanki, Deepika Hemranjani, Shashank Nath Singh, Uma Handa, Manveen Kaur, Surinder Singhal, Shivani Kalhan, Rakesh Kumar Gupta, Ravi. S, D. Pavithra, Sunil Kumar Mahto, Arvind Kumar, Deepali Tirkey, Saurav Banerjee, L. Sreelakshmi,
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Oral squamous cell carcinoma (OSCC) is a major global health burden, particularly in several regions across Asia, Africa, and South America, where it accounts for a significant proportion of cancer cases. Early detection dramatically improves outcomes, with stage I cancers achieving up to 90 percent survival. However, traditional diagnosis based on histopathology has limited accessibility in low-resource settings because it is invasive, resource-intensive, and reliant on expert pathologists. Brush-biopsy oral cytology, on the other hand, offers a minimally invasive and lower-cost alternative, provided that its remaining challenges, namely inter-observer variability and the unavailability of expert pathologists, can be addressed using artificial intelligence. Developing and validating robust AI solutions requires access to large, labeled, multi-source datasets to train high-capacity models that generalize across domain shifts. We introduce the first large, multicenter oral cytology dataset, comprising annotated slides stained with Papanicolaou (PAP) and May-Grünwald-Giemsa (MGG) protocols and collected from ten tertiary medical centers in India. The dataset is labeled and annotated by expert pathologists for cellular anomaly classification and detection, and is designed to advance AI-driven diagnostic methods. By filling the gap in publicly available oral cytology datasets, this resource aims to enhance automated detection, reduce diagnostic errors, and improve early OSCC diagnosis in resource-constrained settings, ultimately contributing to reduced mortality and better patient outcomes worldwide.
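Because the dataset pools slides from ten centers and two stains, one common way to probe the generalization across domain shifts that the abstract mentions is a leave-one-center-out split. The sketch below is an illustration only, not the authors' evaluation protocol; the record fields (`slide_id`, `center`, `stain`, `label`) and the function name `leave_one_center_out` are hypothetical and do not describe the dataset's actual release format.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical record layout (assumption, not the dataset's real schema):
# {"slide_id": ..., "center": ..., "stain": "PAP" or "MGG", "label": ...}
Slide = Dict[str, object]


def leave_one_center_out(
    slides: List[Slide],
) -> Iterator[Tuple[str, List[Slide], List[Slide]]]:
    """Yield (held_out_center, train_slides, test_slides), one split per center."""
    # Group slides by the hypothetical "center" field.
    by_center: Dict[str, List[Slide]] = defaultdict(list)
    for slide in slides:
        by_center[str(slide["center"])].append(slide)

    # Hold out each center in turn and train on all the others.
    for held_out in sorted(by_center):
        test = by_center[held_out]
        train = [
            slide
            for center, group in by_center.items()
            if center != held_out
            for slide in group
        ]
        yield held_out, train, test


if __name__ == "__main__":
    # Toy records with made-up IDs spread over three hypothetical centers.
    toy = [
        {"slide_id": i, "center": f"C{i % 3}", "stain": "PAP" if i % 2 else "MGG", "label": 0}
        for i in range(9)
    ]
    for center, train, test in leave_one_center_out(toy):
        print(center, len(train), len(test))
```

Holding out a whole center, rather than splitting slides at random, keeps slides from the same site out of both the training and test sets, so the held-out score reflects performance under a change of site rather than familiarity with site-specific staining or scanning.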
Related papers
- A High Magnifications Histopathology Image Dataset for Oral Squamous Cell Carcinoma Diagnosis and Prognosis
Multi-OSCC is a new histopathology image dataset comprising images from 1,325 oral squamous cell carcinoma patients.
Each patient is represented by six high-resolution histopathology images captured at x200, x400, and x1000 magnifications (two per magnification), covering both the core and edge tumor regions.
The dataset is richly annotated for six critical clinical tasks: recurrence prediction (REC), lymph node metastasis (LNM), tumor differentiation (TD), tumor invasion (TI) and perineural invasion (PI)
arXiv (2025-07-22T08:48:45Z)
- An Agentic System for Rare Disease Diagnosis with Traceable Reasoning
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM).
DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning.
The system demonstrates exceptional diagnostic performance among 2,919 diseases, achieving 100% accuracy for 1013 diseases.
arXiv (2025-06-25T13:42:26Z)
- ColonScopeX: Leveraging Explainable Expert Systems with Multimodal Data for Improved Early Diagnosis of Colorectal Cancer
Colorectal cancer (CRC) ranks as the second leading cause of cancer-related deaths and the third most prevalent malignant tumour worldwide.
Early detection of CRC remains problematic due to its non-specific and often embarrassing symptoms.
We propose ColonScopeX, a machine learning framework utilizing explainable AI (XAI) methodologies to enhance the early detection of CRC and pre-cancerous lesions.
arXiv (2025-04-09T20:45:11Z)
- A Retrospective Systematic Study on Hierarchical Sparse Query Transformer-assisted Ultrasound Screening for Early Hepatocellular Carcinoma
Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related mortality worldwide.
Recent advancements in AI technology offer promising solutions to bridge this gap.
HSQformer is a novel hybrid architecture that synergizes CNNs' local feature extraction with Vision Transformers' global contextual awareness.
arXiv (2025-02-06T04:17:02Z)
- TopOC: Topological Deep Learning for Ovarian and Breast Cancer Diagnosis
Topological data analysis offers a unique approach by extracting essential information through the evaluation of topological patterns across different color channels.
We show that the inclusion of topological features significantly improves the differentiation of tumor types in ovarian and breast cancers.
arXiv (2024-10-13T12:24:13Z)
- Large-scale cervical precancerous screening via AI-assisted cytology whole slide image analysis
Cervical cancer continues to be the leading gynecological malignancy, posing a persistent threat to women's health on a global scale.
Early screening via whole slide image (WSI) diagnosis is critical to prevent cancer progression and improve survival rates.
However, a single review by a pathologist inevitably produces false negatives because of the immense number of cells that must be examined within a WSI.
arXiv (2024-07-28T15:29:07Z)
- Let it shine: Autofluorescence of Papanicolaou-stain improves AI-based cytological oral cancer detection
Oral cancer is treatable if detected early, but it is often fatal in late stages.
Computer-assisted methods are essential for cost-effective and accurate cytological analysis.
This study aims to improve AI-based oral cancer detection using multimodal imaging and deep fusion.
arXiv (2024-07-02T01:05:35Z)
- A Survey of Artificial Intelligence in Gait-Based Neurodegenerative Disease Diagnosis
Neurodegenerative diseases (NDs) traditionally require extensive healthcare resources and human effort for medical diagnosis and monitoring.
As a crucial disease-related motor symptom, human gait can be exploited to characterize different NDs.
The current advances in artificial intelligence (AI) models enable automatic gait analysis for NDs identification and classification.
arXiv (2024-05-21T06:44:40Z)
- Cancer-Net PCa-Gen: Synthesis of Realistic Prostate Diffusion Weighted Imaging Data via Anatomic-Conditional Controlled Latent Diffusion
In Canada, prostate cancer is the most common form of cancer in men and accounted for 20% of new cancer cases for this demographic in 2022.
There has been significant interest in the development of deep neural networks for prostate cancer diagnosis, prognosis, and treatment planning using diffusion weighted imaging (DWI) data.
In this study, we explore the efficacy of latent diffusion for generating realistic prostate DWI data through the introduction of an anatomic-conditional controlled latent diffusion strategy.
arXiv (2023-11-30T15:11:03Z)
- A Pathologist-Informed Workflow for Classification of Prostate Glands in Histopathology
Pathologists diagnose and grade prostate cancer by examining tissue from needle biopsies on glass slides.
Cancer's severity and risk of metastasis are determined by the Gleason grade, a score based on the organization and morphology of prostate cancer glands.
This paper proposes an automated workflow that follows pathologists' modus operandi, isolating and classifying multi-scale patches of individual glands.
arXiv (2022-09-27T14:08:19Z)
- Multi-Scale Hybrid Vision Transformer for Learning Gastric Histology: AI-Based Decision Support System for Gastric Cancer Treatment
Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing GC-associated mortality rate.
We propose a practical AI system that enables five subclassifications of GC pathology, which can be directly matched to general GC treatment guidance.
arXiv (2022-02-17T08:33:52Z)
- Unsupervised deep learning techniques for powdery mildew recognition based on multispectral imaging
This paper presents a deep learning approach to automatically recognize powdery mildew on cucumber leaves.
We focus on unsupervised deep learning techniques applied to multispectral imaging data.
We propose the use of autoencoder architectures to investigate two strategies for disease detection.
arXiv (2021-12-20T13:29:13Z)
- Spatio-spectral deep learning methods for in-vivo hyperspectral laryngeal cancer detection
Early detection of head and neck tumors is crucial for patient survival.
Hyperspectral imaging (HSI) can be used for non-invasive detection of head and neck tumors.
We present multiple deep learning techniques for in-vivo laryngeal cancer detection based on HSI.
arXiv (2020-04-21T17:07:18Z)