Multi-head automated segmentation by incorporating detection head into the contextual layer neural network
- URL: http://arxiv.org/abs/2602.02471v1
- Date: Mon, 02 Feb 2026 18:51:25 GMT
- Title: Multi-head automated segmentation by incorporating detection head into the contextual layer neural network
- Authors: Edwin Kys, Febian Febian
- Abstract summary: We propose a gated multi-head Transformer architecture based on Swin U-Net, augmented with inter-slice context integration. We show that the gated model substantially outperforms a non-gated segmentation-only baseline. These results indicate that detection-based gating enhances robustness and anatomical plausibility in automated segmentation applications.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning based auto segmentation is increasingly used in radiotherapy, but conventional models often produce anatomically implausible false positives, or hallucinations, in slices lacking target structures. We propose a gated multi-head Transformer architecture based on Swin U-Net, augmented with inter-slice context integration and a parallel detection head, which jointly performs slice-level structure detection via a multi-layer perceptron and pixel-level segmentation through a context-enhanced stream. Detection outputs gate the segmentation predictions to suppress false positives in anatomically invalid slices, and training uses slice-wise Tversky loss to address class imbalance. Experiments on the Prostate-Anatomical-Edge-Cases dataset from The Cancer Imaging Archive demonstrate that the gated model substantially outperforms a non-gated segmentation-only baseline, achieving a mean Dice loss of $0.013 \pm 0.036$ versus $0.732 \pm 0.314$, with detection probabilities strongly correlated with anatomical presence, effectively eliminating spurious segmentations. In contrast, the non-gated model exhibited higher variability and persistent false positives across all slices. These results indicate that detection-based gating enhances robustness and anatomical plausibility in automated segmentation applications, reducing hallucinated predictions without compromising segmentation quality in valid slices, and offers a promising approach for improving the reliability of clinical radiotherapy auto-contouring workflows.
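The gating and loss described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the detection output is applied, so the hard threshold of 0.5, the Tversky weights alpha=0.3 and beta=0.7, and the function names gate_slices, tversky_loss and slicewise_tversky are all assumptions made for illustration. The Tversky index itself is the standard TP / (TP + alpha*FP + beta*FN), which reduces to the Dice coefficient when alpha = beta = 0.5.

```python
from typing import List, Sequence

Slice = List[List[float]]  # per-pixel foreground probabilities for one 2D slice


def gate_slices(seg_probs: Sequence[Slice], det_probs: Sequence[float],
                threshold: float = 0.5) -> List[Slice]:
    """Zero the segmentation output of every slice the detector rejects.

    Hard thresholding is an assumption; the abstract only states that
    detection outputs gate the segmentation predictions.
    """
    gated = []
    for mask, p in zip(seg_probs, det_probs):
        keep = 1.0 if p >= threshold else 0.0
        gated.append([[keep * v for v in row] for row in mask])
    return gated


def tversky_loss(pred: Slice, target: Slice, alpha: float = 0.3,
                 beta: float = 0.7, eps: float = 1e-6) -> float:
    """1 - Tversky index for one slice, using soft TP/FP/FN counts.

    alpha weights false positives, beta weights false negatives
    (the default values here are illustrative, not from the paper).
    """
    tp = fp = fn = 0.0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            tp += p * t
            fp += p * (1.0 - t)
            fn += (1.0 - p) * t
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def slicewise_tversky(preds: Sequence[Slice], targets: Sequence[Slice],
                      **kwargs: float) -> float:
    """Average the per-slice loss so that empty slices count equally."""
    losses = [tversky_loss(p, t, **kwargs) for p, t in zip(preds, targets)]
    return sum(losses) / len(losses)


# Toy volume: slice 0 contains the structure, slice 1 does not.
seg = [[[0.9, 0.8], [0.1, 0.0]], [[0.7, 0.6], [0.0, 0.0]]]
det = [0.95, 0.02]  # hypothetical slice-level detection probabilities
target = [[[1, 1], [0, 0]], [[0, 0], [0, 0]]]

gated = gate_slices(seg, det)
print("ungated mean loss:", slicewise_tversky(seg, target))
print("gated mean loss:  ", slicewise_tversky(gated, target))
```

In this toy case the ungated prediction on the empty slice is pure false positive, so its per-slice loss is close to 1, while the gated prediction is all zeros and its loss is close to 0. The slice containing the structure is left unchanged by the gate.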
Related papers
- Region of interest detection for efficient aortic segmentation [2.172261472991099]
Thoracic aortic dissection and aneurysms are the most lethal diseases of the aorta. Aortic segmentation of the 3D image is often tedious and difficult. Deep-learning-based segmentation models are an ideal solution, but their inability to deliver usable outputs in difficult cases and their computational cost cause their clinical adoption to stay limited.
arXiv Detail & Related papers (2026-01-13T16:04:45Z) - PF-DAformer: Proximal Femur Segmentation via Domain Adaptive Transformer for Dual-Center QCT [8.358409792893278]
We develop a domain-adaptive transformer segmentation framework tailored for multi-institutional quantitative computed tomography (QCT). Our model is trained and validated on one of the largest hip fracture related research cohorts to date, comprising 1,024 QCT image scans from Tulane University and 384 scans from Rochester, Minnesota for proximal femur segmentation.
arXiv Detail & Related papers (2025-10-30T18:07:56Z) - Dimensionality Reduction and Nearest Neighbors for Improving Out-of-Distribution Detection in Medical Image Segmentation [1.2873975765521795]
This work applied the Mahalanobis distance (MD) post hoc to the bottleneck features of four Swin UNETR and nnU-net models that segmented the liver.
Images the models failed on were detected with high performance and minimal computational load.
arXiv Detail & Related papers (2024-08-05T18:24:48Z) - SINDER: Repairing the Singular Defects of DINOv2 [61.98878352956125]
Vision Transformer models trained on large-scale datasets often exhibit artifacts in the patch token they extract.
We propose a novel fine-tuning smooth regularization that rectifies structural deficiencies using only a small dataset.
arXiv Detail & Related papers (2024-07-23T20:34:23Z) - Multimodal Prototyping for cancer survival prediction [45.61869793509184]
Multimodal survival methods combining gigapixel histology whole-slide images (WSIs) and transcriptomic profiles are particularly promising for patient prognostication and stratification.
Current approaches involve tokenizing the WSIs into smaller patches (>10,000 patches) and transcriptomics into gene groups, which are then integrated using a Transformer for predicting outcomes.
This process generates many tokens, which leads to high memory requirements for computing attention and complicates post-hoc interpretability analyses.
Our framework outperforms state-of-the-art methods with much less computation while unlocking new interpretability analyses.
arXiv Detail & Related papers (2024-06-28T20:37:01Z) - Self-calibrated convolution towards glioma segmentation [45.74830585715129]
We evaluate self-calibrated convolutions in different parts of the nnU-Net network to demonstrate that self-calibrated modules in skip connections can significantly improve the enhanced-tumor and tumor-core segmentation accuracy.
arXiv Detail & Related papers (2024-02-07T19:51:13Z) - Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images [55.83984261827332]
In this paper, we propose a novel reliable multi-scale wavelet-enhanced transformer network.
We develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a multi-scale transformer module.
Our proposed method achieves better segmentation accuracy with a high degree of reliability as compared to other state-of-the-art segmentation approaches.
arXiv Detail & Related papers (2022-12-01T07:32:56Z) - SATr: Slice Attention with Transformer for Universal Lesion Detection [39.90420943500884]
Universal Lesion Detection (ULD) in computed tomography plays an essential role in computer-aided diagnosis.
We propose a novel Slice Attention Transformer (SATr) block which can be easily plugged into convolution-based ULD backbones.
Experiments with five state-of-the-art methods show that the proposed SATr block can provide an almost free boost to lesion detection accuracy.
arXiv Detail & Related papers (2022-03-13T03:37:27Z) - Hepatic vessel segmentation based on 3Dswin-transformer with inductive biased multi-head self-attention [46.46365941681487]
We propose a robust end-to-end vessel segmentation network called Inductive BIased Multi-Head Attention Vessel Net.
We introduce the voxel-wise embedding rather than patch-wise embedding to locate precise liver vessel voxels.
On the other hand, we propose inductive biased multi-head self-attention which learns inductive biased relative positional embedding from absolute position embedding.
arXiv Detail & Related papers (2021-11-05T10:17:08Z) - Collaborative Boundary-aware Context Encoding Networks for Error Map Prediction [65.44752447868626]
We propose collaborative boundary-aware context encoding networks called AEP-Net for the error prediction task.
Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks, and precise localization of error regions.
The AEP-Net achieves average DSCs of 0.8358 and 0.8164 for the error prediction task, and shows a high Pearson correlation coefficient of 0.9873.
arXiv Detail & Related papers (2020-06-25T12:42:01Z)