Related papers: SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures

URL: http://arxiv.org/abs/2507.00209v2
Date: Mon, 07 Jul 2025 21:09:20 GMT
Title: SurgiSR4K: A High-Resolution Endoscopic Video Dataset for Robotic-Assisted Minimally Invasive Procedures
Authors: Fengyi Jiang, Xiaorui Zhang, Lingbo Jin, Ruixing Liang, Yuxin Chen, Adi Chola Venkatesh, Jason Culman, Tiantian Wu, Lirong Shao, Wenqing Sun, Cong Gao, Hallie McNamara, Jingpei Lu, Omid Mohareri,
Abstract summary: SurgiSR4K is the first publicly accessible surgical imaging and video dataset captured at a native 4K resolution.<n>This dataset opens up possibilities for a broad range of computer vision tasks that might benefit from high resolution data.
Score: 11.016055846317293
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: High-resolution imaging is crucial for enhancing visual clarity and enabling precise computer-assisted guidance in minimally invasive surgery (MIS). Despite the increasing adoption of 4K endoscopic systems, there remains a significant gap in publicly available native 4K datasets tailored specifically for robotic-assisted MIS. We introduce SurgiSR4K, the first publicly accessible surgical imaging and video dataset captured at a native 4K resolution, representing realistic conditions of robotic-assisted procedures. SurgiSR4K comprises diverse visual scenarios including specular reflections, tool occlusions, bleeding, and soft tissue deformations, meticulously designed to reflect common challenges faced during laparoscopic and robotic surgeries. This dataset opens up possibilities for a broad range of computer vision tasks that might benefit from high resolution data, such as super resolution (SR), smoke removal, surgical instrument detection, 3D tissue reconstruction, monocular depth estimation, instance segmentation, novel view synthesis, and vision-language model (VLM) development. SurgiSR4K provides a robust foundation for advancing research in high-resolution surgical imaging and fosters the development of intelligent imaging technologies aimed at enhancing performance, safety, and usability in image-guided robotic surgeries.

Related papers

4KAgent: Agentic Any Image to 4K Super-Resolution [62.99433518118836]
We present 4KAgent, a super-resolution generalist system designed to upscale any image to 4K resolution (and even higher, if applied iteratively)<n>4KAgent comprises three core components: (1) Profiling, a module that customizes the 4KAgent pipeline based on bespoke use cases; (2) A Perception Agent, which leverages vision-language models alongside image quality assessment experts to analyze the input image and make a tailored restoration plan; and (3) A Restoration Agent, which executes the plan, following a quality-driven mixture-of-expert policy to select the optimal output for each step.<n>We rigorously evaluate our 4KAgent
arXiv Detail & Related papers (2025-07-09T17:59:19Z)
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns [69.19631302047569]
We propose a unified MRI reconstruction model robust to various measurement undersampling patterns and image resolutions.<n>Our model improves SSIM by 11% and PSNR by 4 dB over a state-of-the-art CNN (End-to-End VarNet) with 600$times$ faster inference than diffusion methods.
arXiv Detail & Related papers (2024-10-05T20:03:57Z)
A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery [8.909938295090827]
NeRF-based techniques have recently garnered attention for the ability to reconstruct scenes implicitly. On the other hand, 3D-GS represents scenes explicitly using 3D Gaussians and projects them onto a 2D plane as a replacement for the complex volume rendering in NeRF. This work explores and reviews state-of-the-art (SOTA) approaches, discussing their innovations and implementation principles.
arXiv Detail & Related papers (2024-08-08T12:51:23Z)
Realistic Data Generation for 6D Pose Estimation of Surgical Instruments [4.226502078427161]
6D pose estimation of surgical instruments is critical to enable the automatic execution of surgical maneuvers. In household and industrial settings, synthetic data, generated with 3D computer graphics software, has been shown as an alternative to minimize annotation costs. We propose an improved simulation environment for surgical robotics that enables the automatic generation of large and diverse datasets.
arXiv Detail & Related papers (2024-06-11T14:59:29Z)
Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject. We show that our model captures the distinct functionalities of each region of human vision system. Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z)
Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery. We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery. We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z)
FLex: Joint Pose and Dynamic Radiance Fields Optimization for Stereo Endoscopic Videos [79.50191812646125]
Reconstruction of endoscopic scenes is an important asset for various medical applications, from post-surgery analysis to educational training. We adress the challenging setup of a moving endoscope within a highly dynamic environment of deforming tissue. We propose an implicit scene separation into multiple overlapping 4D neural radiance fields (NeRFs) and a progressive optimization scheme jointly optimizing for reconstruction and camera poses from scratch. This improves the ease-of-use and allows to scale reconstruction capabilities in time to process surgical videos of 5,000 frames and more; an improvement of more than ten times compared to the state of the art while being agnostic to external tracking information
arXiv Detail & Related papers (2024-03-18T19:13:02Z)
Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues [52.886545681833596]
LerPlane is a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting. LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields. LerPlane shares static fields, significantly reducing the workload of dynamic tissue modeling.
arXiv Detail & Related papers (2023-05-31T14:38:35Z)
Robotic Navigation Autonomy for Subretinal Injection via Intelligent Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection. Our method consists of an instrument pose estimation method, an online registration between the robotic and the i OCT system, and trajectory planning tailored for navigation to an injection target. Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z)
A Geometry-Informed Deep Learning Framework for Ultra-Sparse 3D Tomographic Image Reconstruction [13.44786774177579]
We establish a geometry-informed deep learning framework for ultra-sparse 3D tomographic image reconstruction. We demonstrate that the seamless inclusion of known priors is essential to enhance the performance of 3D volumetric computed tomography imaging.
arXiv Detail & Related papers (2021-05-25T06:20:03Z)
Surgical Visual Domain Adaptation: Results from the MICCAI 2020 SurgVisDom Challenge [9.986124942784969]
This work seeks to explore the potential for visual domain adaptation in surgery to overcome data privacy concerns. In particular, we propose to use video from virtual reality (VR) simulations of surgical exercises to develop algorithms to recognize tasks in a clinical-like setting. We present the performance of the different approaches to solve visual domain adaptation developed by challenge participants.
arXiv Detail & Related papers (2021-02-26T18:45:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.