Related papers: Self-Supervised Contrastive Embedding Adaptation for Endoscopic Image Matching

URL: http://arxiv.org/abs/2512.10379v1
Date: Thu, 11 Dec 2025 07:44:00 GMT
Title: Self-Supervised Contrastive Embedding Adaptation for Endoscopic Image Matching
Authors: Alberto Rota, Elena De Momi
Abstract summary: This research presents a novel Deep Learning pipeline for establishing feature correspondences in endoscopic image pairs. The proposed methodology leverages a novel-view synthesis pipeline to generate ground-truth inlier correspondences. Our pipeline surpasses state-of-the-art methodologies on the SCARED datasets, with improved matching precision and lower epipolar error.
Score: 7.674595072442547
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate spatial understanding is essential for image-guided surgery, augmented reality integration and context awareness. In minimally invasive procedures, where visual input is the sole intraoperative modality, establishing precise pixel-level correspondences between endoscopic frames is critical for 3D reconstruction, camera tracking, and scene interpretation. However, the surgical domain presents distinct challenges: weak perspective cues, non-Lambertian tissue reflections, and complex, deformable anatomy degrade the performance of conventional computer vision techniques. While Deep Learning models have shown strong performance in natural scenes, their features are not inherently suited for fine-grained matching in surgical images and require targeted adaptation to meet the demands of this domain. This research presents a novel Deep Learning pipeline for establishing feature correspondences in endoscopic image pairs, alongside a self-supervised optimization framework for model training. The proposed methodology leverages a novel-view synthesis pipeline to generate ground-truth inlier correspondences, subsequently utilized for mining triplets within a contrastive learning paradigm. Through this self-supervised approach, we augment the DINOv2 backbone with an additional Transformer layer, specifically optimized to produce embeddings that facilitate direct matching through cosine similarity thresholding. Experimental evaluation demonstrates that our pipeline surpasses state-of-the-art methodologies on the SCARED datasets, with improved matching precision and lower epipolar error compared to related work. The proposed framework constitutes a valuable contribution toward enabling more accurate high-level computer vision applications in surgical endoscopy.
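
Illustrative sketch (not the authors' code): the abstract mentions two mechanisms, triplet-based contrastive training and direct matching by cosine similarity thresholding. The NumPy snippet below shows one plausible reading of each under stated assumptions. The function names, the 0.8 threshold, the 0.2 margin, the nearest-neighbour selection step, and the use of cosine distance in the triplet loss are all hypothetical choices, and plain arrays stand in for the DINOv2-derived embeddings.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, eps)


def match_by_cosine_threshold(emb_a, emb_b, threshold=0.8):
    """Match each row of emb_a to its most similar row of emb_b.

    A pair is kept only if its cosine similarity reaches `threshold`
    (an assumed value, not one taken from the paper).
    Returns (matches, scores): matches is an (M, 2) array of index pairs.
    """
    sim = l2_normalize(emb_a) @ l2_normalize(emb_b).T  # (Na, Nb) similarities
    best_b = sim.argmax(axis=1)                        # best candidate in B
    best_sim = sim[np.arange(sim.shape[0]), best_b]
    keep = best_sim >= threshold
    idx_a = np.nonzero(keep)[0]
    return np.stack([idx_a, best_b[keep]], axis=1), best_sim[keep]


def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on cosine distance (1 - cosine similarity).

    Assumed formulation: pulls anchor/positive together and pushes
    anchor/negative apart by at least `margin`.
    """
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    d_pos = 1.0 - np.sum(a * p, axis=-1)
    d_neg = 1.0 - np.sum(a * n, axis=-1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real descriptors would be high-dimensional.
    emb_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    emb_b = np.array([[2.0, 0.0], [0.0, 3.0], [-1.0, -1.0]])
    matches, scores = match_by_cosine_threshold(emb_a, emb_b)
    print(matches, scores)
```

In this toy input, the third row of `emb_a` is equally close to two candidates at a similarity of about 0.71, so it falls below the assumed threshold and is left unmatched, which is the intended effect of thresholding ambiguous descriptors.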

Related papers

BronchOpt : Vision-Based Pose Optimization with Fine-Tuned Foundation Models for Accurate Bronchoscopy Navigation [6.915058920280426]
We propose a vision-based pose optimization framework for 2D-3D registration between intra-operative endoscopic views and pre-operative CT anatomy. A fine-tuned modality- and domain-invariant encoder enables direct similarity between real endoscopic RGB frames and CT-rendered depth maps. Our model achieves an average translational error of 2.65 mm and a rotational error of 0.19 rad, demonstrating accurate and stable localization.
arXiv Detail & Related papers (2025-11-12T15:58:05Z)
Efficient 3D Scene Reconstruction and Simulation from Sparse Endoscopic Views [0.7614628596146601]
We propose a Gaussian Splatting-based framework to reconstruct interactive surgical scenes from endoscopic data. A key challenge in this data-driven simulation paradigm is the restricted movement of endoscopic cameras. We show that our method can efficiently reconstruct and simulate surgical scenes from sparse endoscopic views.
arXiv Detail & Related papers (2025-09-21T10:40:36Z)
Accelerating 3D Photoacoustic Computed Tomography with End-to-End Physics-Aware Neural Operators [74.65171736966131]
Photoacoustic computed tomography (PACT) combines optical contrast with ultrasonic resolution, achieving deep-tissue imaging beyond the optical diffusion limit. Current implementations require dense transducer arrays and prolonged acquisition times, limiting clinical translation. We introduce Pano, an end-to-end physics-aware model that directly learns the inverse acoustic mapping from sensor measurements to volumetric reconstructions.
arXiv Detail & Related papers (2025-09-11T23:12:55Z)
EndoUFM: Utilizing Foundation Models for Monocular depth estimation of endoscopic images [7.350425834778092]
EndoUFM is an unsupervised monocular depth estimation framework. It enhances depth estimation performance by leveraging powerful pre-learned priors. This work contributes to augmenting surgeons' spatial perception during minimally invasive procedures.
arXiv Detail & Related papers (2025-08-25T11:33:05Z)
Landmark-Free Preoperative-to-Intraoperative Registration in Laparoscopic Liver Resection [50.388465935739376]
Liver registration by overlaying preoperative 3D models onto intraoperative 2D frames can help surgeons perceive the spatial anatomy of the liver clearly for a higher surgical success rate. Existing registration methods rely heavily on anatomical landmarks, an approach that encounters two major limitations. We propose a landmark-free preoperative-to-intraoperative registration framework utilizing effective self-supervised learning.
arXiv Detail & Related papers (2025-04-21T14:55:57Z)
From Real Artifacts to Virtual Reference: A Robust Framework for Translating Endoscopic Images [27.230439605570812]
In endoscopic imaging, combining pre-operative data with intra-operative imaging is important for surgical planning and navigation. Existing domain adaptation methods are hampered by distribution shift caused by in vivo artifacts. This paper presents an artifact-resilient image translation method and an associated benchmark for this purpose.
arXiv Detail & Related papers (2024-10-15T02:41:52Z)
Intraoperative Registration by Cross-Modal Inverse Neural Rendering [61.687068931599846]
We present a novel approach for 3D/2D intraoperative registration during neurosurgery via cross-modal inverse neural rendering. Our approach separates implicit neural representation into two components, handling anatomical structure preoperatively and appearance intraoperatively. We tested our method on retrospective patients' data from clinical cases, showing that our method outperforms state-of-the-art while meeting current clinical standards for registration.
arXiv Detail & Related papers (2024-09-18T13:40:59Z)
CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images. The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism. We validated our model on a test dataset, consisting of unseen synthetic data and images collected from silicon aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
Sim2Real within 5 Minutes: Efficient Domain Transfer with Stylized Gaussian Splatting for Endoscopic Images [28.802915155343964]
Endoluminal intervention is an emerging technique for both benign and malignant luminal lesions. In practice, aligning pre-operative and intra-operative domains is complicated by significant texture differences. This paper proposes an efficient domain transfer method based on stylized Gaussian splatting.
arXiv Detail & Related papers (2024-03-16T08:57:00Z)
Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery. We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks. We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z)
Adversarial Domain Feature Adaptation for Bronchoscopic Depth Estimation [111.89519571205778]
In this work, we propose an alternative domain-adaptive approach to depth estimation. Our novel two-step structure first trains a depth estimation network with labeled synthetic images in a supervised manner. The results of our experiments show that the proposed method improves the network's performance on real images by a considerable margin.
arXiv Detail & Related papers (2021-09-24T08:11:34Z)

This list is automatically generated from the titles and abstracts of the papers on this site.