Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait Endoscopy
- URL: http://arxiv.org/abs/2512.03883v1
- Date: Wed, 03 Dec 2025 15:34:29 GMT
- Title: Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait Endoscopy
- Authors: Jorge Tapias Gomez, Despoina Kanata, Aneesh Rangnekar, Christina Lee, Julio Garcia-Aguilar, Joshua Jesse Smith, Harini Veeraraghavan
- Abstract summary: We developed a Siamese Swin Transformer with Dual Cross-Attention (SSDCA) to combine longitudinal endoscopic images at restaging and follow-up. SSDCA produced the best balanced accuracy (81.76\% $\pm$ 0.04), sensitivity (90.07\% $\pm$ 0.08), and specificity (72.86\% $\pm$ 0.05). Robustness analysis showed stable performance irrespective of artifacts including blood, stool, telangiectasia, and poor image quality.
- Score: 1.8302060641103655
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Increasing evidence supports watch-and-wait (WW) surveillance for patients with rectal cancer who show clinical complete response (cCR) at restaging following total neoadjuvant treatment (TNT). However, objective and accurate methods for early detection of local regrowth (LR) from follow-up endoscopy images during WW are essential to manage care and prevent distant metastases. Hence, we developed a Siamese Swin Transformer with Dual Cross-Attention (SSDCA) to combine longitudinal endoscopic images at restaging and follow-up and distinguish cCR from LR. SSDCA leverages pretrained Swin transformers to extract domain-agnostic features and enhance robustness to imaging variations. Dual cross-attention is implemented to emphasize features from the two scans without requiring any spatial alignment of images to predict response. SSDCA as well as Swin-based baselines were trained using image pairs from 135 patients and evaluated on a held-out set of image pairs from 62 patients. SSDCA produced the best balanced accuracy (81.76\% $\pm$ 0.04), sensitivity (90.07\% $\pm$ 0.08), and specificity (72.86\% $\pm$ 0.05). Robustness analysis showed stable performance irrespective of artifacts including blood, stool, telangiectasia, and poor image quality. UMAP clustering of extracted features showed maximal inter-cluster separation (1.45 $\pm$ 0.18) and minimal intra-cluster dispersion (1.07 $\pm$ 0.19) with SSDCA, confirming discriminative representation learning.
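The dual cross-attention fusion described in the abstract can be sketched in plain NumPy: tokens from one scan act as queries over the other scan's tokens and vice versa, so no spatial alignment between the two images is needed. This is an illustrative sketch, not the authors' implementation; the shared projection weights, token counts, and mean-pooling are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens, wq, wk, wv):
    """Scaled dot-product attention: q_tokens query into kv_tokens."""
    q, k, v = q_tokens @ wq, kv_tokens @ wk, kv_tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def dual_cross_attention(feat_restage, feat_follow, rng):
    """Fuse two token sets with symmetric cross-attention branches."""
    d = feat_restage.shape[-1]
    # Shared (Siamese-style) projections -- an assumption for this sketch.
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    # Branch 1: restaging tokens attend to follow-up tokens.
    a2b = cross_attention(feat_restage, feat_follow, wq, wk, wv)
    # Branch 2: follow-up tokens attend to restaging tokens.
    b2a = cross_attention(feat_follow, feat_restage, wq, wk, wv)
    # Mean-pool each branch and concatenate into one fused feature vector.
    return np.concatenate([a2b.mean(axis=0), b2a.mean(axis=0)])

rng = np.random.default_rng(0)
# Hypothetical 7x7 = 49 Swin tokens of dimension 96 per scan.
fused = dual_cross_attention(rng.standard_normal((49, 96)),
                             rng.standard_normal((49, 96)), rng)
```

Because each token set only ever indexes the other through attention weights, the fusion is permutation-tolerant with respect to token order, which is what removes the need for spatial registration of the two endoscopy images.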
Related papers
- Prompt-Free SAM-Based Multi-Task Framework for Breast Ultrasound Lesion Segmentation and Classification [0.4083182125683813]
This study presents a multi-task deep learning framework that jointly performs lesion segmentation and diagnostic classification. Our approach employs a prompt-free, fully supervised adaptation where high-dimensional SAM features are decoded through either a lightweight convolutional head or a UNet-inspired decoder for pixel-wise segmentation. Experiments on the PRECISE 2025 breast ultrasound dataset, split per class into 80 percent training and 20 percent testing, show that the proposed method achieves a Dice Similarity Coefficient (DSC) of 0.887 and an accuracy of 92.3 percent.
arXiv Detail & Related papers (2026-01-09T03:02:41Z) - Neural Discrete Representation Learning for Sparse-View CBCT Reconstruction: From Algorithm Design to Prospective Multicenter Clinical Evaluation [64.42236775544579]
Cone beam computed tomography (CBCT)-guided puncture has become an established approach for diagnosing and treating thoracic tumours. DeepPriorCBCT is a three-stage deep learning framework that achieves diagnostic-grade reconstruction using only one-sixth of the conventional radiation dose.
arXiv Detail & Related papers (2025-11-30T12:45:02Z) - Lightweight Relational Embedding in Task-Interpolated Few-Shot Networks for Enhanced Gastrointestinal Disease Classification [0.0]
Colon cancer detection is crucial for increasing patient survival rates. Colonoscopy is dependent on obtaining adequate and high-quality endoscopic images. A Few-Shot Learning architecture enables our model to rapidly adapt to unseen fine-grained endoscopic image patterns. Our model demonstrated superior performance, achieving an accuracy of 90.1%, precision of 0.845, recall of 0.942, and an F1 score of 0.891.
arXiv Detail & Related papers (2025-05-30T16:54:51Z) - Enhanced Denoising of Optical Coherence Tomography Images Using Residual U-Net [0.0]
We propose an enhanced denoising model using a Residual U-Net architecture that effectively diminishes noise and improves image clarity.
Peak Signal-to-Noise Ratio (PSNR) was 34.343 $\pm$ 1.113 for PS OCT images, and Structural Similarity Index Measure (SSIM) values were 0.885 $\pm$ 0.030.
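The reported PSNR values follow the standard definition, $\mathrm{PSNR} = 10 \log_{10}(L^2 / \mathrm{MSE})$ for peak value $L$; a minimal NumPy sketch (the function name and `data_range` parameter are illustrative, not from the paper):

```python
import numpy as np

def psnr(reference, denoised, data_range=1.0):
    """Peak Signal-to-Noise Ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((reference - denoised) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. 20 dB.
value = psnr(np.zeros((8, 8)), np.full((8, 8), 0.1))  # → 20.0
```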
arXiv Detail & Related papers (2024-07-18T01:35:03Z) - Swin transformers are robust to distribution and concept drift in endoscopy-based longitudinal rectal cancer assessment [3.0468533447146244]
Endoscopic images are used at various stages of rectal cancer treatment to assess response and toxicity from treatments. Subjective assessment is highly variable and can underestimate the degree of response in some patients. Advances in deep learning have shown the ability to produce consistent and objective response assessment for endoscopic images.
arXiv Detail & Related papers (2024-05-06T18:01:13Z) - Step-Calibrated Diffusion for Biomedical Optical Image Restoration [47.191704042917394]
Restorative Step-Calibrated Diffusion (RSCD) is an unpaired diffusion-based image restoration method. RSCD outperforms other widely used unpaired image restoration methods on both image quality and perceptual evaluation. RSCD improves performance on downstream clinical imaging tasks, including automated brain tumor diagnosis and deep tissue imaging.
arXiv Detail & Related papers (2024-03-20T15:38:53Z) - Longitudinal Multimodal Transformer Integrating Imaging and Latent Clinical Signatures From Routine EHRs for Pulmonary Nodule Classification [4.002181247287472]
We propose a transformer-based multimodal strategy to integrate repeat imaging with longitudinal clinical signatures from EHRs for solitary pulmonary nodule (SPN) classification.
We perform unsupervised disentanglement of latent clinical signatures and leverage time-distance scaled self-attention to jointly learn from clinical signatures expressions and chest computed tomography (CT) scans.
arXiv Detail & Related papers (2023-04-06T03:03:07Z) - Automated SSIM Regression for Detection and Quantification of Motion Artefacts in Brain MR Images [54.739076152240024]
Motion artefacts in magnetic resonance brain images are a crucial issue.
The assessment of MR image quality is fundamental before proceeding with the clinical diagnosis.
An automated image quality assessment based on the structural similarity index (SSIM) regression has been proposed here.
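The SSIM quantity being regressed can be computed for a whole image pair in a single window; a simplified NumPy sketch using the standard stabiliser constants $C_1$, $C_2$, without the sliding Gaussian window a full implementation would use (function name and `data_range` parameter are illustrative):

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over whole images: luminance, contrast and
    structure terms combined per the standard formula."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

An image compared against itself scores exactly 1.0, and the score decreases as motion-like perturbations are added, which is what makes SSIM a usable regression target for artefact severity.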
arXiv Detail & Related papers (2022-06-14T10:16:54Z) - OCT-GAN: Single Step Shadow and Noise Removal from Optical Coherence Tomography Images of the Human Optic Nerve Head [47.812972855826985]
We developed a single process that successfully removed both noise and retinal shadows from unseen single-frame B-scans within 10.4ms.
The proposed algorithm reduces the necessity for long image acquisition times, minimizes expensive hardware requirements and reduces motion artifacts in OCT images.
arXiv Detail & Related papers (2020-10-06T08:32:32Z) - Co-Heterogeneous and Adaptive Segmentation from Multi-Source and Multi-Phase CT Imaging Data: A Study on Pathological Liver and Lesion Segmentation [48.504790189796836]
We present a novel segmentation strategy, co-heterogenous and adaptive segmentation (CHASe)
We propose a versatile framework that fuses appearance based semi-supervision, mask based adversarial domain adaptation, and pseudo-labeling.
CHASe can further improve pathological liver mask Dice-Sorensen coefficients by ranges of $4.2\% \sim 9.4\%$.
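The Dice-Sorensen coefficient used to report those gains measures overlap between two binary masks; a minimal NumPy sketch (the `eps` smoothing term is an assumption for the empty-mask edge case):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-8):
    """Dice-Sorensen coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Masks overlapping in 1 of their 3 total foreground pixels: Dice = 2/3.
score = dice_coefficient(np.array([1, 1, 0, 0]), np.array([1, 0, 0, 0]))
```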
arXiv Detail & Related papers (2020-05-27T06:58:39Z) - A multicenter study on radiomic features from T$_2$-weighted images of a customized MR pelvic phantom setting the basis for robust radiomic models in clinics [47.187609203210705]
2D and 3D T$_2$-weighted images of a pelvic phantom were acquired on three scanners.
Repeatability and repositioning of radiomic features were assessed.
arXiv Detail & Related papers (2020-05-14T09:24:48Z) - Detecting Pancreatic Ductal Adenocarcinoma in Multi-phase CT Scans via Alignment Ensemble [77.5625174267105]
Pancreatic ductal adenocarcinoma (PDAC) is one of the most lethal cancers among the population.
Multiple phases provide more information than a single phase, but they are unaligned and inhomogeneous in texture.
We suggest an ensemble of all these alignments as a promising way to boost the performance of PDAC detection.
arXiv Detail & Related papers (2020-03-18T19:06:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.