Related papers: Real-time Surgical Environment Enhancement for Robot-Assisted Minimally Invasive Surgery Based on Super-Resolution

Strategy-Supervised Autonomous Laparoscopic Camera Control via Event-Driven Graph Mining [15.995867664955348]
We present a strategy-grounded framework that couples high-level vision-language inference with low-level closed-loop control. Offline, raw surgical videos are parsed into camera-relevant temporal events and structured as attributed event graphs. Online, a fine-tuned Vision-Language Model (VLM) processes the live laparoscopic view to predict the dominant strategy and discrete image-based motion commands.
arXiv Detail & Related papers (2026-02-24T02:56:39Z)
GeoEyes: On-Demand Visual Focusing for Evidence-Grounded Understanding of Ultra-High-Resolution Remote Sensing Imagery [69.05066425853326]
The "thinking-with-images" paradigm enables multimodal large language models (MLLMs) to actively explore visual scenes via zoom-in tools. This is essential for ultra-high-resolution (UHR) remote sensing VQA, where task-relevant cues are sparse and tiny. We propose GeoEyes, a training framework consisting of (1) a cold-start SFT dataset, UHR Chain-of-Zoom (UHR-CoZ), which covers diverse zooming regimes, and (2) an agentic reinforcement learning method, AdaZoom-GRPO, that explicitly rewards evidence gain and answer improvement during zoom
arXiv Detail & Related papers (2026-02-15T15:50:55Z)
ZoomEarth: Active Perception for Ultra-High-Resolution Geospatial Vision-Language Tasks [49.99788276124186]
Existing dynamic resolution and token pruning methods are constrained by a passive perception paradigm. We present LRS-GRO, a large-scale benchmark dataset tailored for active perception in UHR RS processing. We propose ZoomEarth, an adaptive cropping-zooming framework with a novel Region-Guided reward that provides fine-grained guidance.
arXiv Detail & Related papers (2025-11-15T15:47:46Z)
Subsampled Randomized Fourier GaLore for Adapting Foundation Models in Depth-Driven Liver Landmark Segmentation [6.91206648866302]
We propose a depth-guided liver landmark segmentation framework integrating semantic and geometric cues via vision foundation encoders. To efficiently adapt SAM2, we introduce SRFT-GaLore, a novel low-rank gradient projection method that replaces the computationally expensive SVD with a Subsampled Randomized Fourier Transform. Our method achieves a 4.85% improvement in Dice Similarity Coefficient and an 11.78-point reduction in Average Symmetric Surface Distance compared to D2GPLand.
arXiv Detail & Related papers (2025-11-05T04:16:49Z)
MedVSR: Medical Video Super-Resolution with Cross State-Space Propagation [63.38824041721275]
Low-resolution (LR) medical videos present unique challenges for video super-resolution (VSR) models. We propose MedVSR, a tailored framework for medical VSR. We show that MedVSR significantly outperforms existing VSR models in reconstruction performance and efficiency.
arXiv Detail & Related papers (2025-09-25T14:56:59Z)
BCRNet: Enhancing Landmark Detection in Laparoscopic Liver Surgery via Bezier Curve Refinement [14.918845671238737]
BCRNet is a novel framework that significantly enhances landmark detection in laparoscopic liver surgery. The framework starts with a Multi-modal Feature Extraction (MFE) module designed to robustly capture semantic features. BCRNet outperforms state-of-the-art methods, achieving significant performance improvements.
arXiv Detail & Related papers (2025-06-18T09:00:08Z)
Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos. C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
arXiv Detail & Related papers (2025-05-16T14:02:24Z)
Seamless Augmented Reality Integration in Arthroscopy: A Pipeline for Articular Reconstruction and Guidance [4.8046407905667206]
Arthroscopy is a minimally invasive surgical procedure used to diagnose and treat joint problems. The arthroscope's restricted field of view and lack of depth perception pose challenges in navigating complex articular structures. We present a robust pipeline that incorporates simultaneous localization and mapping, depth estimation, and 3D Gaussian splatting. Our solution offers AR assistance for articular notch measurement and annotation anchoring in a human-in-the-loop manner.
arXiv Detail & Related papers (2024-10-01T04:15:49Z)
Deep intra-operative illumination calibration of hyperspectral cameras [73.08443963791343]
Hyperspectral imaging (HSI) is emerging as a promising novel imaging modality with various potential surgical applications. We show that dynamically changing lighting conditions in the operating room dramatically affect the performance of HSI applications. We propose a novel learning-based approach to automatically recalibrating hyperspectral images during surgery.
arXiv Detail & Related papers (2024-09-11T08:30:03Z)
FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training. We address the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue. We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch. This improves ease of use and allows reconstruction capabilities to scale in time to process surgical videos of 5,000 frames and more, an improvement of more than ten times compared to the state of the art, while being agnostic to external tracking information.
arXiv Detail & Related papers (2024-03-18T19:13:02Z)
Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery [3.8909273404657556]
We develop a method that permits direct 2D-to-3D registration of the microscope video to the pre-operative Computed Tomography (CT) scan without the need for external tracking equipment. Our results demonstrate the accuracy with an average rotation error of less than 25 degrees and a translation error of less than 2 mm, 3 mm, and 0.55% for the x, y, and z axes, respectively.
arXiv Detail & Related papers (2024-03-12T00:26:08Z)
Phase-Specific Augmented Reality Guidance for Microscopic Cataract Surgery Using Long-Short Spatiotemporal Aggregation Transformer [14.568834378003707]
Phacoemulsification cataract surgery (PCS) is a routine procedure using a surgical microscope. PCS guidance systems extract valuable information from surgical microscopic videos to enhance proficiency. Existing PCS guidance systems suffer from non-phase-specific guidance, leading to redundant visual information. We propose a novel phase-specific augmented reality (AR) guidance system, which offers tailored AR information corresponding to the recognized surgical phase.
arXiv Detail & Related papers (2023-09-11T02:56:56Z)
Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues [52.886545681833596]
LerPlane is a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting. LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields. LerPlane shares static fields, significantly reducing the workload of dynamic tissue modeling.
arXiv Detail & Related papers (2023-05-31T14:38:35Z)
LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT), for fusing short- and long-term temporal information. Our approach consistently outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets.
arXiv Detail & Related papers (2023-05-15T20:06:14Z)
Monitoring MBE substrate deoxidation via RHEED image-sequence analysis by deep learning [62.997667081978825]
We present an approach for automated surveillance of GaAs substrate deoxidation in MBE using deep-learning-based RHEED image-sequence classification. Our approach consists of a non-supervised auto-encoder (AE) for feature extraction, combined with a supervised convolutional network.
arXiv Detail & Related papers (2022-10-07T10:01:06Z)
Synthetic and Real Inputs for Tool Segmentation in Robotic Surgery [10.562627972607892]
We show that it may be possible to use robot kinematic data coupled with laparoscopic images to alleviate the labelling problem. We propose a new deep-learning-based model for parallel processing of both laparoscopic and simulation images.
arXiv Detail & Related papers (2020-07-17T16:33:33Z)
Searching for Efficient Architecture for Instrument Segmentation in Robotic Surgery [58.63306322525082]
Most applications rely on accurate real-time segmentation of high-resolution surgical images. We design a lightweight and highly efficient deep residual architecture which is tuned to perform real-time inference of high-resolution images.
arXiv Detail & Related papers (2020-07-08T21:38:29Z)
Towards Better Surgical Instrument Segmentation in Endoscopic Vision: Multi-Angle Feature Aggregation and Contour Supervision [22.253074722129053]
We propose a general embeddable approach to improve current deep neural network (DNN) segmentation models. The proposed method is validated with ablation experiments on the novel Sinus-Surgery datasets collected from surgeons' operations.
arXiv Detail & Related papers (2020-02-25T05:28:46Z)

This list is automatically generated from the titles and abstracts of the papers on this site.