Automated Cervical Os Segmentation for Camera-Guided, Speculum-Free Screening
- URL: http://arxiv.org/abs/2509.10593v1
- Date: Fri, 12 Sep 2025 14:19:27 GMT
- Title: Automated Cervical Os Segmentation for Camera-Guided, Speculum-Free Screening
- Authors: Aoife McDonald-Bowyer, Anjana Wijekoon, Ryan Laurance Love, Katie Allan, Scott Colvin, Aleksandra Gentry-Maharaj, Adeola Olaitan, Danail Stoyanov, Agostino Stilli, Sophia Bano
- Abstract summary: This study evaluates deep learning methods for real-time segmentation of the cervical os in transvaginal endoscopic images.
EndoViT/DPT, a vision transformer pre-trained on surgical video, achieved the highest DICE (0.50 ± 0.31) and detection rate (0.87 ± 0.33).
These results establish a foundation for integrating automated os recognition into speculum-free cervical screening devices to support non-expert use.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cervical cancer is highly preventable, yet persistent barriers to screening limit progress toward elimination goals. Speculum-free devices that integrate imaging and sampling could improve access, particularly in low-resource settings, but require reliable visual guidance. This study evaluates deep learning methods for real-time segmentation of the cervical os in transvaginal endoscopic images. Five encoder-decoder architectures were compared using 913 frames from 200 cases in the IARC Cervical Image Dataset, annotated by gynaecologists. Performance was assessed using IoU, DICE, detection rate, and distance metrics with ten-fold cross-validation. EndoViT/DPT, a vision transformer pre-trained on surgical video, achieved the highest DICE (0.50 ± 0.31) and detection rate (0.87 ± 0.33), outperforming CNN-based approaches. External validation with phantom data demonstrated robust segmentation under variable conditions at 21.5 FPS, supporting real-time feasibility. These results establish a foundation for integrating automated os recognition into speculum-free cervical screening devices to support non-expert use in both high- and low-resource contexts.
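The DICE and IoU metrics used in the evaluation above measure overlap between a predicted segmentation mask and the gynaecologist annotation. A minimal sketch of how the two scores are computed, using a hypothetical `dice_iou` helper over flat binary masks (illustrative only, not code from the paper):

```python
def dice_iou(pred, target):
    """DICE and IoU for two flat binary masks (lists of 0/1 values).

    DICE = 2|A∩B| / (|A| + |B|);  IoU = |A∩B| / |A∪B|.
    """
    inter = sum(p & t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    union = total - inter
    dice = 2 * inter / total if total else 1.0  # both masks empty -> perfect
    iou = inter / union if union else 1.0
    return dice, iou

# Toy 2x2 masks flattened to length-4 lists: one pixel overlaps.
print(dice_iou([1, 1, 0, 0], [1, 0, 1, 0]))  # (0.5, 0.3333...)
```

A detection-rate figure like the reported 0.87 ± 0.33 would then count a frame as a successful detection whenever the predicted mask sufficiently overlaps the annotation, e.g. IoU above some threshold; the exact criterion is not stated in the abstract.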
Related papers
- Detection-Gated Glottal Segmentation with Zero-Shot Cross-Dataset Transfer and Clinical Feature Extraction [0.0]
We propose a detection-gated pipeline that integrates a YOLOv8-based detector with a U-Net segmenter.
The model was trained on a limited subset of the GIRAFE dataset (600 frames) and evaluated via zero-shot transfer on the large-scale BAGLS dataset.
arXiv Detail & Related papers (2026-03-02T17:05:41Z) - FUGC: Benchmarking Semi-Supervised Learning Methods for Cervical Segmentation [63.7829089874007]
This paper introduces the Fetal Ultrasound Grand Challenge (FUGC), the first benchmark for semi-supervised learning in cervical segmentation.
FUGC provides a dataset of 890 TVS images, including 500 training images, 90 validation images, and 300 test images.
Methods were evaluated using the Dice Similarity Coefficient (DSC), Hausdorff Distance (HD), and runtime (RT), with a weighted combination of 0.4/0.4/0.2.
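The 0.4/0.4/0.2 weighting of DSC, HD, and RT can be sketched as a simple linear combination. This assumes (it is not stated in the summary) that HD and RT are first rescaled to [0, 1] so that higher is better; the function name and normalisation are illustrative:

```python
def fugc_score(dsc, hd_norm, rt_norm):
    """Weighted challenge score: 0.4*DSC + 0.4*HD + 0.2*RT.

    Assumes hd_norm and rt_norm are already normalised to [0, 1]
    with higher = better (raw Hausdorff distance and runtime are
    lower-is-better, so they would be inverted before weighting).
    """
    return 0.4 * dsc + 0.4 * hd_norm + 0.2 * rt_norm

print(fugc_score(1.0, 1.0, 1.0))  # 1.0 (perfect on all three metrics)
```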
arXiv Detail & Related papers (2026-01-22T01:34:39Z) - A Novel Attention-Augmented Wavelet YOLO System for Real-time Brain Vessel Segmentation on Transcranial Color-coded Doppler [49.03919553747297]
We propose an AI-powered, real-time CoW auto-segmentation system capable of efficiently capturing cerebral arteries.
No prior studies have explored AI-driven cerebrovascular segmentation using Transcranial Color-coded Doppler (TCCD).
The proposed AAW-YOLO demonstrated strong performance in segmenting both ipsilateral and contralateral CoW vessels.
arXiv Detail & Related papers (2025-08-19T14:41:22Z) - Real-Time Guidewire Tip Tracking Using a Siamese Network for Image-Guided Endovascular Procedures [27.037820619664654]
This paper focuses on guidewire tip tracking during image-guided therapy for cardiovascular diseases.
A novel tracking framework based on a Siamese network with dual attention mechanisms combines self- and cross-attention strategies for robust tip tracking.
The framework maintains an average processing speed of 57.2 frames per second, meeting the temporal demands of endovascular imaging.
arXiv Detail & Related papers (2025-06-25T02:34:00Z) - HMSViT: A Hierarchical Masked Self-Supervised Vision Transformer for Corneal Nerve Segmentation and Diabetic Neuropathy Diagnosis [3.8141400767898603]
Diabetic Peripheral Neuropathy (DPN) affects nearly half of diabetes patients, requiring early detection.
We propose HMSViT, a novel Hierarchical Masked Self-Supervised Vision Transformer.
HMSViT employs pooling-based hierarchical and dual attention mechanisms with absolute positional encoding, enabling efficient multi-scale feature extraction.
Experiments on clinical CCM datasets showed HMSViT achieves state-of-the-art performance, with 61.34% mIoU for nerve segmentation and 70.40% diagnostic accuracy.
arXiv Detail & Related papers (2025-06-24T10:00:23Z) - Lightweight Relational Embedding in Task-Interpolated Few-Shot Networks for Enhanced Gastrointestinal Disease Classification [0.0]
Colon cancer detection is crucial for increasing patient survival rates.
Colonoscopy depends on obtaining adequate, high-quality endoscopic images.
A Few-Shot Learning architecture enables our model to rapidly adapt to unseen fine-grained endoscopic image patterns.
Our model demonstrated superior performance, achieving an accuracy of 90.1%, precision of 0.845, recall of 0.942, and an F1 score of 0.891.
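The reported precision (0.845), recall (0.942), and F1 score (0.891) are self-consistent, which can be checked with the standard harmonic-mean formula:

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.845, 0.942), 3))  # 0.891
```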
arXiv Detail & Related papers (2025-05-30T16:54:51Z) - Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development [59.74920439478643]
In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS scenarios.
Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames.
We introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR).
arXiv Detail & Related papers (2024-08-19T15:04:42Z) - Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers [6.262161803642583]
We propose a novel approach to learn procedural features from a very large data cohort of over 16 million interventional X-ray frames.
Our approach is based on a masked image modeling technique that leverages frame-based reconstruction to learn fine inter-frame temporal correspondences.
Experiments show that our method achieves 66.31% reduction in maximum tracking error against reference solutions.
arXiv Detail & Related papers (2024-05-02T10:18:22Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - Data-Efficient Vision Transformers for Multi-Label Disease Classification on Chest Radiographs [55.78588835407174]
Vision Transformers (ViTs) have not been applied to this task despite their high classification performance on generic images.
ViTs do not rely on convolutions but on patch-based self-attention and in contrast to CNNs, no prior knowledge of local connectivity is present.
Our results show that while ViTs and CNNs perform on par, with a small benefit for ViTs, DeiTs outperform plain ViTs if a reasonably large data set is available for training.
arXiv Detail & Related papers (2022-08-17T09:07:45Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.