Subsampled Randomized Fourier GaLore for Adapting Foundation Models in Depth-Driven Liver Landmark Segmentation
- URL: http://arxiv.org/abs/2511.03163v1
- Date: Wed, 05 Nov 2025 04:16:49 GMT
- Title: Subsampled Randomized Fourier GaLore for Adapting Foundation Models in Depth-Driven Liver Landmark Segmentation
- Authors: Yun-Chen Lin, Jiayuan Huang, Hanyuan Zhang, Sergi Kavtaradze, Matthew J. Clarkson, Mobarak I. Hoque,
- Abstract summary: We propose a depth-guided liver landmark segmentation framework integrating semantic and geometric cues via vision foundation encoders.<n>To efficiently adapt SAM2, we introduce SRFT-GaLore, a novel low-rank gradient projection method that replaces the computationally expensive SVD with a Subsampled Randomized Fourier Transform.<n>Our method achieves a 4.85% improvement in Dice Similarity Coefficient and a 11.78-point reduction in Average Symmetric Surface Distance compared to the D2GPLand.
- Score: 6.91206648866302
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate detection and delineation of anatomical structures in medical imaging are critical for computer-assisted interventions, particularly in laparoscopic liver surgery where 2D video streams limit depth perception and complicate landmark localization. While recent works have leveraged monocular depth cues for enhanced landmark detection, challenges remain in fusing RGB and depth features and in efficiently adapting large-scale vision models to surgical domains. We propose a depth-guided liver landmark segmentation framework integrating semantic and geometric cues via vision foundation encoders. We employ Segment Anything Model V2 (SAM2) encoder to extract RGB features and Depth Anything V2 (DA2) encoder to extract depth-aware features. To efficiently adapt SAM2, we introduce SRFT-GaLore, a novel low-rank gradient projection method that replaces the computationally expensive SVD with a Subsampled Randomized Fourier Transform (SRFT). This enables efficient fine-tuning of high-dimensional attention layers without sacrificing representational power. A cross-attention fusion module further integrates RGB and depth cues. To assess cross-dataset generalization, we also construct a new Laparoscopic Liver Surgical Dataset (LLSD) as an external validation benchmark. On the public L3D dataset, our method achieves a 4.85% improvement in Dice Similarity Coefficient and a 11.78-point reduction in Average Symmetric Surface Distance compared to the D2GPLand. To further assess generalization capability, we evaluate our model on LLSD dataset. Our model maintains competitive performance and significantly outperforms SAM-based baselines, demonstrating strong cross-dataset robustness and adaptability to unseen surgical environments. These results demonstrate that our SRFT-GaLore-enhanced dual-encoder framework enables scalable and precise segmentation under real-time, depth-constrained surgical settings.
Related papers
- Preoperative-to-intraoperative Liver Registration for Laparoscopic Surgery via Latent-Grounded Correspondence Constraints [51.7011449975586]
Land-Reg is a deformable registration framework that learns latent-grounded 2D-3D landmark correspondences.<n>For rigid registration, Land-Reg embraces a Cross-modal Latent Alignment module.<n>An Uncertainty-enhanced Overlap Landmark Detector with similarity matching is proposed to robustly estimate explicit 2D-3D landmark correspondences.
arXiv Detail & Related papers (2026-03-02T10:44:03Z) - MM-UNet: Morph Mamba U-shaped Convolutional Networks for Retinal Vessel Segmentation [21.90972169495466]
MM-UNet is a novel architecture tailored for efficient retinal vessel segmentation.<n>It incorporates Morph Mamba Convolution layers, which replace pointwise convolutions to enhance branching topological perception.<n>It achieves F1-score gains of 1.64 $%$ on DRIVE and 1.25 $%$ on STARE, demonstrating its effectiveness and advancement.
arXiv Detail & Related papers (2025-11-04T02:18:25Z) - A Dual-Feature Extractor Framework for Accurate Back Depth and Spine Morphology Estimation from Monocular RGB Images [15.19284295210246]
We propose a novel pipeline to accurately estimate the depth information of the unclothed back.<n>We then estimate spine morphology by integrating both depth and surface information.<n>This integrated approach enhances the accuracy of spine curve generation, achieving an impressive performance of up to 97%.
arXiv Detail & Related papers (2025-07-30T13:55:37Z) - Topology-Constrained Learning for Efficient Laparoscopic Liver Landmark Detection [46.2391319253146]
Liver landmarks provide crucial anatomical guidance to the surgeon during laparoscopic liver surgery.<n>TopoNet is a novel topology-constrained learning framework for laparoscopic liver landmark detection.<n>Our framework adopts a snake-CNN dual-path encoder to simultaneously capture detailed RGB texture information and depth-informed topological structures.
arXiv Detail & Related papers (2025-07-01T07:35:36Z) - Lightweight RGB-D Salient Object Detection from a Speed-Accuracy Tradeoff Perspective [54.91271106816616]
Current RGB-D methods usually leverage large-scale backbones to improve accuracy but sacrifice efficiency.<n>We propose a Speed-Accuracy Tradeoff Network (SATNet) for Lightweight RGB-D SOD from three fundamental perspectives.<n> Concerning depth quality, we introduce the Depth Anything Model to generate high-quality depth maps.<n>For modality fusion, we propose a Decoupled Attention Module (DAM) to explore the consistency within and between modalities.<n>For feature representation, we develop a Dual Information Representation Module (DIRM) with a bi-directional inverted framework.
arXiv Detail & Related papers (2025-05-07T19:37:20Z) - Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network via the combination of convolution neural network (CNN) and transformer layers.
The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network superior performance than the state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z) - Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection [43.600236988802465]
Liver anatomical landmarks serve as important markers for 2D-3D alignment.
To facilitate the detection of laparoscopic liver landmarks, we collect a novel dataset called L3D.
We propose a depth-driven geometric prompt learning network, namely D2GPLand.
arXiv Detail & Related papers (2024-06-25T18:02:11Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion
Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation [52.699139151447945]
We propose a novel adaptation method for transferring the segment anything model (SAM) from 2D to 3D for promptable medical image segmentation.
Our model can outperform domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, specifically by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, colon cancer segmentation, and achieve similar performance for liver tumor segmentation.
arXiv Detail & Related papers (2023-06-23T12:09:52Z) - Dual Swin-Transformer based Mutual Interactive Network for RGB-D Salient
Object Detection [67.33924278729903]
In this work, we propose Dual Swin-Transformer based Mutual Interactive Network.
We adopt Swin-Transformer as the feature extractor for both RGB and depth modality to model the long-range dependencies in visual inputs.
Comprehensive experiments on five standard RGB-D SOD benchmark datasets demonstrate the superiority of the proposed DTMINet method.
arXiv Detail & Related papers (2022-06-07T08:35:41Z) - PAENet: A Progressive Attention-Enhanced Network for 3D to 2D Retinal
Vessel Segmentation [0.0]
3D to 2D retinal vessel segmentation is a challenging problem in Optical Coherence Tomography Angiography ( OCTA) images.
We propose a Progressive Attention-Enhanced Network (PAENet) based on attention mechanisms to extract rich feature representation.
Our proposed algorithm achieves state-of-the-art performance compared with previous methods.
arXiv Detail & Related papers (2021-08-26T10:27:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.