Geo-RepNet: Geometry-Aware Representation Learning for Surgical Phase Recognition in Endoscopic Submucosal Dissection
- URL: http://arxiv.org/abs/2507.09294v1
- Date: Sat, 12 Jul 2025 14:07:44 GMT
- Title: Geo-RepNet: Geometry-Aware Representation Learning for Surgical Phase Recognition in Endoscopic Submucosal Dissection
- Authors: Rui Tang, Haochen Yin, Guankun Wang, Long Bai, An Wang, Huxin Gao, Jiazheng Wang, Hongliang Ren
- Abstract summary: Geo-RepNet is a geometry-aware convolutional framework that integrates RGB image and depth information to enhance recognition performance in complex surgical scenes. To evaluate the effectiveness of our approach, we construct a nine-phase ESD dataset with dense frame-level annotations from real-world ESD videos.
- Score: 10.386536115270294
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical phase recognition plays a critical role in developing intelligent assistance systems for minimally invasive procedures such as Endoscopic Submucosal Dissection (ESD). However, the high visual similarity across different phases and the lack of structural cues in RGB images pose significant challenges. Depth information offers valuable geometric cues that can complement appearance features by providing insights into spatial relationships and anatomical structures. In this paper, we pioneer the use of depth information for surgical phase recognition and propose Geo-RepNet, a geometry-aware convolutional framework that integrates RGB image and depth information to enhance recognition performance in complex surgical scenes. Built upon a re-parameterizable RepVGG backbone, Geo-RepNet incorporates the Depth-Guided Geometric Prior Generation (DGPG) module that extracts geometry priors from raw depth maps, and the Geometry-Enhanced Multi-scale Attention (GEMA) to inject spatial guidance through geometry-aware cross-attention and efficient multi-scale aggregation. To evaluate the effectiveness of our approach, we construct a nine-phase ESD dataset with dense frame-level annotations from real-world ESD videos. Extensive experiments on the proposed dataset demonstrate that Geo-RepNet achieves state-of-the-art performance while maintaining robustness and high computational efficiency under complex and low-texture surgical environments.
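The re-parameterizable RepVGG backbone mentioned in the abstract trains each block with parallel 3x3, 1x1 and identity branches (each followed by batch normalization) and folds them into a single 3x3 convolution for inference. The NumPy sketch below illustrates that generic RepVGG fusion and checks that both forms produce the same output. It assumes stride 1, equal input and output channels and inference-mode batch norm; all function names are hypothetical. It is not the Geo-RepNet code and omits the DGPG and GEMA modules.

```python
import numpy as np


def conv2d(x, w, b):
    """Stride-1 cross-correlation with zero 'same' padding (odd kernel size k).

    x: (C_in, H, W) input, w: (C_out, C_in, k, k) kernel, b: (C_out,) bias.
    """
    k = w.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, wd = x.shape
    out = np.empty((w.shape[0], h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out + b[:, None, None]


def batch_norm(z, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm applied per channel of a (C, H, W) tensor."""
    s = gamma / np.sqrt(var + eps)
    return s[:, None, None] * (z - mean[:, None, None]) + beta[:, None, None]


def fuse_bn(w, bn, eps=1e-5):
    """Fold BN(conv(x, w)) into one conv: kernel s * w, bias beta - s * mean."""
    gamma, beta, mean, var = bn
    s = gamma / np.sqrt(var + eps)
    return w * s[:, None, None, None], beta - s * mean


def multi_branch_forward(x, w3, bn3, w1, bn1, bn_id):
    """Training-time block: BN(3x3 conv) + BN(1x1 conv) + BN(identity), before ReLU."""
    no_bias = np.zeros(w3.shape[0])
    return (batch_norm(conv2d(x, w3, no_bias), *bn3)
            + batch_norm(conv2d(x, w1, no_bias), *bn1)
            + batch_norm(x, *bn_id))


def reparameterize(w3, bn3, w1, bn1, bn_id):
    """Merge the three branches into a single 3x3 kernel and bias for inference."""
    c = w3.shape[0]
    k3, b3 = fuse_bn(w3, bn3)
    # A 1x1 kernel equals a 3x3 kernel that is zero except at the centre.
    k1, b1 = fuse_bn(np.pad(w1, ((0, 0), (0, 0), (1, 1), (1, 1))), bn1)
    # The identity maps channel i to channel i through the kernel centre.
    w_id = np.zeros((c, c, 3, 3))
    w_id[np.arange(c), np.arange(c), 1, 1] = 1.0
    kid, bid = fuse_bn(w_id, bn_id)
    return k3 + k1 + kid, b3 + b1 + bid


rng = np.random.default_rng(0)
C, H, W = 4, 6, 6


def random_bn(c):
    """Random (gamma, beta, running_mean, running_var) with positive variance."""
    return (rng.uniform(0.5, 1.5, c), rng.standard_normal(c),
            rng.standard_normal(c), rng.uniform(0.5, 1.5, c))


x = rng.standard_normal((C, H, W))
w3 = rng.standard_normal((C, C, 3, 3))
w1 = rng.standard_normal((C, C, 1, 1))
bn3, bn1, bn_id = random_bn(C), random_bn(C), random_bn(C)

y_train = multi_branch_forward(x, w3, bn3, w1, bn1, bn_id)
k_fused, b_fused = reparameterize(w3, bn3, w1, bn1, bn_id)
y_infer = conv2d(x, k_fused, b_fused)
print(np.allclose(y_train, y_infer))  # True
```

Because the fusion is exact before the nonlinearity, the ReLU that follows the block is unchanged, and the deployed network runs a plain stack of single-path 3x3 convolutions, which is where RepVGG-style backbones get their inference-time efficiency.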
Related papers
- Geometry-Editable and Appearance-Preserving Object Compositon
General object composition (GOC) aims to seamlessly integrate a target object into a background scene with desired geometric properties.
Recent approaches derive semantic embeddings and integrate them into advanced diffusion models to enable geometry-editable generation.
We introduce a Disentangled Geometry-editable and Appearance-preserving Diffusion model that first leverages semantic embeddings to implicitly capture desired geometric transformations.
(arXiv, 2025-05-27T09:05:28Z)
- Scribble-Based Interactive Segmentation of Medical Hyperspectral Images
This work introduces a scribble-based interactive segmentation framework for medical hyperspectral images.
The proposed method utilizes deep learning for feature extraction and a geodesic distance map generated from user-provided scribbles.
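A geodesic distance map assigns each pixel the cost of the cheapest path to a scribble, where steps that cross strong appearance changes are expensive. The summary does not give the paper's exact cost function, so the sketch below assumes a common discrete form: Dijkstra's algorithm on a 4-connected grid with step cost 1 + lam * ||I(p) - I(q)||, where the norm runs over the spectral bands. The function name and the weight lam are hypothetical.

```python
import heapq

import numpy as np


def geodesic_distance_map(image, seeds, lam=1.0):
    """Geodesic distance from scribble seeds on a 4-connected pixel grid.

    image: (H, W) or (H, W, B) array, e.g. a hyperspectral cube with B bands.
    seeds: iterable of (row, col) scribble pixels, which get distance 0.
    lam:   weight of the appearance term in each step cost (hypothetical).
    """
    img = image if image.ndim == 3 else image[..., None]
    h, w = img.shape[:2]
    dist = np.full((h, w), np.inf)
    heap = []
    for r, c in seeds:
        dist[r, c] = 0.0
        heap.append((0.0, r, c))
    heapq.heapify(heap)
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale queue entry; a shorter path was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                # Unit spatial step plus a penalty for crossing an appearance edge.
                step = 1.0 + lam * np.linalg.norm(img[nr, nc] - img[r, c])
                if d + step < dist[nr, nc]:
                    dist[nr, nc] = d + step
                    heapq.heappush(heap, (d + step, nr, nc))
    return dist


img = np.zeros((3, 4))
img[:, 2:] = 1.0  # two flat regions separated by an intensity edge
d = geodesic_distance_map(img, [(0, 0)], lam=5.0)
print(d[2, 1], d[0, 2], d[2, 3])  # 3.0 7.0 10.0
```

Pixel (0, 2) is only two steps from the seed but lies across the intensity edge, so its distance (7.0) exceeds that of the farther pixel (2, 1) on the seed's side (3.0).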
(arXiv, 2024-08-05T12:33:07Z)
- Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection
Liver anatomical landmarks serve as important markers for 2D-3D alignment.
To facilitate the detection of laparoscopic liver landmarks, we collect a novel dataset called L3D.
We propose a depth-driven geometric prompt learning network, namely D2GPLand.
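Both this entry and Geo-RepNet's DGPG module derive geometric cues from depth, but neither summary specifies how. One simple, widely used depth-derived prior is a surface-normal map, and the sketch below computes one under that assumption; it is an illustration, not either paper's method. It treats depth as a height field with unit pixel spacing and ignores camera intrinsics; the function name is hypothetical.

```python
import numpy as np


def depth_to_normals(depth):
    """Approximate unit surface normals from a depth map: (H, W) -> (H, W, 3).

    Treats depth as a height field z(x, y) with unit pixel spacing and ignores
    camera intrinsics, so the result is only a rough geometric cue.
    """
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    dz_dy, dz_dx = np.gradient(depth.astype(float))
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(dz_dx)], axis=-1)
    return normals / np.linalg.norm(normals, axis=-1, keepdims=True)


xs = np.arange(5, dtype=float)
depth = np.tile(0.5 * xs, (4, 1))           # plane z = 0.5 * x, shape (4, 5)
normals = depth_to_normals(depth)
print(normals.shape)                        # (4, 5, 3)
print(np.allclose(normals, normals[0, 0]))  # True: a plane has a single normal
```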
(arXiv, 2024-06-25T18:02:11Z)
- Semantic segmentation of surgical hyperspectral images under geometric domain shifts
We present the first analysis of state-of-the-art semantic segmentation networks in the presence of geometric out-of-distribution (OOD) data.
We also address generalizability with a dedicated augmentation technique termed "Organ Transplantation".
Our scheme improves on the state-of-the-art (SOA) Dice similarity coefficient (DSC) by up to 67% (RGB) and 90% (HSI) and brings performance on real OOD test data on par with in-distribution performance.
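The Dice similarity coefficient measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). A minimal sketch for binary masks follows; returning 1.0 when both masks are empty is an assumed convention, and multi-class scores are usually obtained by averaging the per-class values.

```python
import numpy as np


def dice_coefficient(pred, target):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, target).sum() / denom


pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
print(round(dice_coefficient(pred, target), 4))  # 0.6667
```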
(arXiv, 2023-03-20T09:50:07Z)
- Semantic-SuPer: A Semantic-aware Surgical Perception Framework for Endoscopic Tissue Classification, Reconstruction, and Tracking
We present a novel surgical perception framework, Semantic-SuPer.
It integrates geometric and semantic information to facilitate data association, 3D reconstruction, and tracking of endoscopic scenes.
(arXiv, 2022-10-29T19:33:21Z)
- Recurrent Feature Propagation and Edge Skip-Connections for Automatic Abdominal Organ Segmentation
We propose a 3D network, trained end-to-end, with four main components: an encoder, an edge detector, a decoder with edge skip-connections, and a recurrent feature propagation head.
Experimental results show that the proposed network outperforms several state-of-the-art models.
(arXiv, 2022-01-02T08:33:19Z)
- Deep Unrolled Recovery in Sparse Biological Imaging
Deep algorithm unrolling is a model-based approach to develop deep architectures that combine the interpretability of iterative algorithms with the performance gains of supervised deep learning.
This framework is well-suited to applications in biological imaging, where physics-based models exist to describe the measurement process and the information to be recovered is often highly structured.
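Algorithm unrolling turns each iteration of an iterative solver into one network layer. The textbook example is ISTA for sparse recovery, which minimizes 0.5 * ||y - A x||^2 + lam * ||x||_1 by repeating a gradient step followed by soft thresholding; learned variants such as LISTA train the per-layer parameters instead of fixing them. The sketch below shows only the unrolled forward pass with fixed ISTA parameters. It is an illustrative assumption, not the architecture of the reviewed paper, and all names are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def unrolled_ista(y, A, thresholds, step):
    """Forward pass of ISTA unrolled into len(thresholds) layers.

    Layer k computes x <- soft(x + step * A^T (y - A x), thresholds[k]).
    In a learned (LISTA-style) network these per-layer quantities would be
    trained; here they are fixed, so the output equals truncated ISTA.
    """
    x = np.zeros(A.shape[1])
    for t in thresholds:
        x = soft_threshold(x + step * A.T @ (y - A @ x), t)
    return x


rng = np.random.default_rng(0)
m, n = 30, 60
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[[3, 17, 42]] = [1.5, -2.0, 1.0]  # 3-sparse signal
y = A @ x_true                           # noise-free measurements
lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, with L the squared largest singular value


def objective(x):
    """Sparse-recovery objective 0.5 * ||y - A x||^2 + lam * ||x||_1."""
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))


x_10 = unrolled_ista(y, A, [lam * step] * 10, step)    # 10-layer network
x_200 = unrolled_ista(y, A, [lam * step] * 200, step)  # 200-layer network
print(objective(np.zeros(n)), objective(x_10), objective(x_200))
```

With step 1/L, each ISTA layer cannot increase the objective, so a deeper unrolled network with these fixed parameters fits at least as well as a shallower one; learning the per-layer parameters aims to reach a comparable fit with far fewer layers.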
(arXiv, 2021-09-28T20:22:44Z)
- High-resolution Depth Maps Imaging via Attention-based Hierarchical Multi-modal Fusion
We propose a novel attention-based hierarchical multi-modal fusion network for guided depth super-resolution (DSR).
We show that our approach outperforms state-of-the-art methods in terms of reconstruction accuracy, running speed and memory efficiency.
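This entry and Geo-RepNet's GEMA module both use attention to let one modality guide another, but neither summary gives the exact formulation. The sketch below shows plain single-head scaled dot-product cross-attention, with queries from depth features and keys and values from RGB features, followed by a residual addition. The random projection matrices stand in for learned weights, and the names and shapes are assumptions.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along one axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    query_feats:   (N, C) tokens of the modality being refined (e.g. depth pixels).
    context_feats: (M, C) tokens of the guiding modality (e.g. RGB pixels).
    w_q, w_k, w_v: (C, D) projections; learned in practice, random here.
    Returns the attended values (N, D) and the (N, M) attention weights.
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return attn @ v, attn


rng = np.random.default_rng(0)
C, D, H, W = 8, 8, 4, 4
depth_feat = rng.standard_normal((C, H, W))
rgb_feat = rng.standard_normal((C, H, W))
# Flatten (C, H, W) feature maps into (H*W, C) token matrices.
q_tokens = depth_feat.reshape(C, -1).T
kv_tokens = rgb_feat.reshape(C, -1).T
w_q, w_k, w_v = (rng.standard_normal((C, D)) / np.sqrt(C) for _ in range(3))
out, attn = cross_attention(q_tokens, kv_tokens, w_q, w_k, w_v)
# Residual fusion, then back to a (D, H, W) map (valid because C == D here).
fused = (q_tokens + out).T.reshape(D, H, W)
```

Full attention over all H*W tokens costs O((HW)^2) memory, which is why practical designs typically restrict, downsample or aggregate it across scales.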
(arXiv, 2021-04-04T03:28:33Z)
- Light Field Reconstruction Using Convolutional Network on EPI and Extended Applications
A novel convolutional neural network (CNN)-based framework is developed for light field reconstruction from a sparse set of views.
We demonstrate the high performance and robustness of the proposed framework compared with state-of-the-art algorithms.
(arXiv, 2021-03-24T08:16:32Z)
- Pathological Retinal Region Segmentation From OCT Images Using Geometric Relation Based Augmentation
We propose improvements over previous GAN-based medical image synthesis methods by jointly encoding the intrinsic relationship of geometry and shape.
The proposed method outperforms state-of-the-art segmentation methods on the public RETOUCH dataset, whose images were captured with different acquisition procedures.
(arXiv, 2020-03-31T11:50:43Z)