Related papers: EndoControlMag: Robust Endoscopic Vascular Motion Magnification with Periodic Reference Resetting and Hierarchical Tissue-aware Dual-Mask Control

URL: http://arxiv.org/abs/2507.15292v4
Date: Thu, 24 Jul 2025 13:26:19 GMT
Title: EndoControlMag: Robust Endoscopic Vascular Motion Magnification with Periodic Reference Resetting and Hierarchical Tissue-aware Dual-Mask Control
Authors: An Wang, Rulin Zhou, Mengya Xu, Yiru Ye, Longfei Gou, Yiting Chang, Hao Chen, Chwee Ming Lim, Jiankun Wang, Hongliang Ren
Abstract summary: We introduce EndoControlMag, a training-free framework with mask-conditioned vascular motion magnification tailored to endoscopic environments. Our approach features two key modules: a Periodic Reference Resetting scheme that divides videos into short overlapping clips with dynamically updated reference frames to prevent error accumulation. We evaluate EndoControlMag on our EndoVMM24 dataset spanning four different surgery types and various challenging scenarios.
Score: 10.426745597034204
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Visualizing subtle vascular motions in endoscopic surgery is crucial for surgical precision and decision-making, yet remains challenging due to the complex and dynamic nature of surgical scenes. To address this, we introduce EndoControlMag, a training-free, Lagrangian-based framework with mask-conditioned vascular motion magnification tailored to endoscopic environments. Our approach features two key modules: a Periodic Reference Resetting (PRR) scheme that divides videos into short overlapping clips with dynamically updated reference frames to prevent error accumulation while maintaining temporal coherence, and a Hierarchical Tissue-aware Magnification (HTM) framework with dual-mode mask dilation. HTM first tracks vessel cores using a pretrained visual tracking model to maintain accurate localization despite occlusions and view changes. It then applies one of two adaptive softening strategies to surrounding tissues: motion-based softening that modulates magnification strength in proportion to observed tissue displacement, or distance-based exponential decay that simulates biomechanical force attenuation. This dual-mode approach accommodates diverse surgical scenarios: motion-based softening excels with complex tissue deformations, while distance-based softening provides stability when optical flow is unreliable. We evaluate EndoControlMag on our EndoVMM24 dataset spanning four different surgery types and various challenging scenarios, including occlusions, instrument disturbance, view changes, and vessel deformations. Quantitative metrics, visual assessments, and expert surgeon evaluations demonstrate that EndoControlMag significantly outperforms existing methods in both magnification accuracy and visual quality while maintaining robustness across challenging surgical conditions. The code, dataset, and video results are available at https://szupc.github.io/EndoControlMag/.
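
The abstract names three mechanisms that a few lines of code can make concrete: splitting a video into short overlapping clips (PRR), distance-based exponential decay, and motion-based softening. The sketch below is only an illustration written from the abstract's wording, not the authors' implementation. All function names and parameters (`clip_len`, `overlap`, `decay_length`, `core_disp`) are hypothetical, and the exact formulas are assumptions.

```python
import math


def split_into_clips(num_frames, clip_len, overlap):
    """Return (start, end) frame ranges, end exclusive, for overlapping clips.

    Assumption: the first frame of each clip acts as that clip's reference,
    so the reference is reset once per clip.
    """
    if not 0 <= overlap < clip_len:
        raise ValueError("overlap must satisfy 0 <= overlap < clip_len")
    stride = clip_len - overlap
    clips = []
    start = 0
    while start < num_frames:
        end = min(start + clip_len, num_frames)
        clips.append((start, end))
        if end == num_frames:
            break
        start += stride
    return clips


def distance_decay_factor(alpha, distance, decay_length):
    """Magnification factor at `distance` pixels from the vessel core.

    Assumption: plain exponential decay, alpha * exp(-distance / decay_length).
    """
    return alpha * math.exp(-distance / decay_length)


def motion_softened_factor(alpha, tissue_disp, core_disp):
    """Magnification factor scaled by observed tissue displacement.

    Assumption: the ratio of tissue displacement to vessel-core displacement,
    clamped to [0, 1], scales the base factor alpha.
    """
    if core_disp <= 0:
        return 0.0
    ratio = max(0.0, min(1.0, tissue_disp / core_disp))
    return alpha * ratio


if __name__ == "__main__":
    print(split_into_clips(num_frames=10, clip_len=4, overlap=1))
    print(distance_decay_factor(alpha=10.0, distance=5.0, decay_length=5.0))
    print(motion_softened_factor(alpha=10.0, tissue_disp=1.0, core_disp=4.0))
```

In this sketch, consecutive clips share `overlap` frames, which is one simple way to keep transitions between reference resets temporally coherent; the paper and project page give the actual scheme.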

Related papers

Dual-Encoder Transformer-Based Multimodal Learning for Ischemic Stroke Lesion Segmentation Using Diffusion MRI [5.332404648315838]
We study ischemic stroke lesion segmentation using multimodal diffusion MRI from the ISLES 2022 dataset. Several state-of-the-art convolutional and transformer-based architectures, including U-Net variants, Swin-UNet, and TransUNet, are benchmarked. Results show that transformer-based models outperform convolutional baselines, and the proposed dual-encoder TransUNet achieves the best performance, reaching a Dice score of 85.4% on the test set.
arXiv Detail & Related papers (2025-12-23T15:24:31Z)
MM-UNet: Morph Mamba U-shaped Convolutional Networks for Retinal Vessel Segmentation [21.90972169495466]
MM-UNet is a novel architecture tailored for efficient retinal vessel segmentation. It incorporates Morph Mamba Convolution layers, which replace pointwise convolutions to enhance branching topological perception. It achieves F1-score gains of 1.64% on DRIVE and 1.25% on STARE, demonstrating its effectiveness and advancement.
arXiv Detail & Related papers (2025-11-04T02:18:25Z)
MIORe & VAR-MIORe: Benchmarks to Push the Boundaries of Restoration [53.180212987726556]
We introduce MIORe and VAR-MIORe, two novel multi-task datasets that address critical limitations in current motion restoration benchmarks. Our datasets capture a broad spectrum of motion scenarios, which include complex ego-camera movements, dynamic multi-subject interactions, and depth-dependent blur effects.
arXiv Detail & Related papers (2025-09-08T15:34:31Z)
EndoVLA: Dual-Phase Vision-Language-Action Model for Autonomous Tracking in Endoscopy [26.132684811981143]
Vision-Language-Action (VLA) models integrate visual perception, language grounding, and motion planning within an end-to-end framework. EndoVLA performs three core tasks: (1) polyp tracking, (2) delineation and following of abnormal mucosal regions, and (3) adherence to circular markers during circumferential cutting.
arXiv Detail & Related papers (2025-05-21T07:35:00Z)
X$^{2}$-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction [45.31051025401413]
X$^{2}$-Gaussian is a novel framework for continuous-time 4DCT reconstruction. It integrates dynamic radiative splatting with self-supervised respiratory motion learning. It achieves a 9.93 dB PSNR gain over traditional methods and 2.25 dB improvement against prior splatting techniques.
arXiv Detail & Related papers (2025-03-27T17:59:57Z)
SurgSora: Object-Aware Diffusion Model for Controllable Surgical Video Generation [25.963369099780113]
SurgSora is a framework that generates high-fidelity, motion-controllable surgical videos from a single input frame and user-specified motion cues. By conditioning these enriched features within the Stable Video Diffusion, SurgSora achieves state-of-the-art visual authenticity and controllability.
arXiv Detail & Related papers (2024-12-18T16:34:51Z)
Serp-Mamba: Advancing High-Resolution Retinal Vessel Segmentation with Selective State-Space Model [45.682311387979944]
We propose the first Serpentine Mamba (Serp-Mamba) network to address this challenging task. We first devise a Serpentine Interwoven Adaptive (SIA) scan mechanism, which scans UWF-SLO images along curved vessel structures in a snake-like crawling manner. Second, we propose an Ambiguity-Driven Dual Recalibration module to address the category imbalance problem intensified by high-resolution images.
arXiv Detail & Related papers (2024-09-06T15:40:47Z)
PhysMamba: State Space Duality Model for Remote Physiological Measurement [18.423806804725032]
Remote Photoplethysmography (rPPG) enables non-contact physiological signal extraction from facial videos. This work lays a strong foundation for practical applications in non-contact health monitoring, including real-time remote patient care.
arXiv Detail & Related papers (2024-08-02T07:52:28Z)
Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals [4.519437028632205]
Deep learning has facilitated automatic multi-view fusion (MVF) of cardiovascular system (CVS) signals. MVF model architectures often amalgamate CVS signals from the same temporal step but different views into a unified representation. We introduce prompt techniques to aid pretrained MVF models in flexibly adapting to various missing-view scenarios.
arXiv Detail & Related papers (2024-06-13T08:58:59Z)
Real-time guidewire tracking and segmentation in intraoperative x-ray [52.51797358201872]
We propose a two-stage deep learning framework for real-time guidewire segmentation and tracking. In the first stage, a YOLOv5 detector is trained, using the original X-ray images as well as synthetic ones, to output the bounding boxes of possible target guidewires. In the second stage, a novel and efficient network is proposed to segment the guidewire in each detected bounding box.
arXiv Detail & Related papers (2024-04-12T20:39:19Z)
FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training. We address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue. We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch. This improves ease of use and allows reconstruction to scale in time to surgical videos of 5,000 frames and more, an improvement of more than ten times compared to the state of the art, while being agnostic to external tracking information.
arXiv Detail & Related papers (2024-03-18T19:13:02Z)
Deep Cardiac MRI Reconstruction with ADMM [7.694990352622926]
We present a deep learning (DL)-based method for accelerated cine and multi-contrast reconstruction in the context of cardiac imaging. Our method optimizes in both the image and k-space domains, allowing for high reconstruction fidelity.
arXiv Detail & Related papers (2023-10-10T13:46:11Z)
On Sensitivity and Robustness of Normalization Schemes to Input Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input. DL models are sensitive to varying artifacts, as these lead to changes in the input data distribution between the training and testing phases. We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z)
Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms. We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv Detail & Related papers (2022-11-12T05:39:17Z)
FetReg: Placental Vessel Segmentation and Registration in Fetoscopy Challenge Dataset [57.30136148318641]
Fetoscopy laser photocoagulation is a widely used procedure for the treatment of Twin-to-Twin Transfusion Syndrome (TTTS). This may lead to increased procedural time and incomplete ablation, resulting in persistent TTTS. Computer-assisted intervention may help overcome these challenges by expanding the fetoscopic field of view through video mosaicking and providing better visualization of the vessel network. We present a large-scale multi-centre dataset for the development of generalized and robust semantic segmentation and video mosaicking algorithms for the fetal environment with a focus on creating drift-free mosaics from long duration fetoscopy videos.
arXiv Detail & Related papers (2021-06-10T17:14:27Z)
Dueling Deep Q-Network for Unsupervised Inter-frame Eye Movement Correction in Optical Coherence Tomography Volumes [5.371290280449071]
In optical coherence tomography (OCT) volumes of the retina, the sequential acquisition of the individual slices makes this modality prone to motion artifacts. Speckle noise that is characteristic of this imaging modality leads to inaccuracies when traditional registration techniques are employed. In this paper, we tackle these issues by using deep reinforcement learning to correct inter-frame movements in an unsupervised manner.
arXiv Detail & Related papers (2020-07-03T07:14:30Z)

This list is automatically generated from the titles and abstracts of the papers on this site.