Related papers: Monocular pose estimation of articulated surgical instruments in open surgery

Monocular pose estimation of articulated surgical instruments in open surgery

URL: http://arxiv.org/abs/2407.12138v1
Date: Tue, 16 Jul 2024 19:47:35 GMT
Title: Monocular pose estimation of articulated surgical instruments in open surgery
Authors: Robert Spektor, Tom Friedman, Itay Or, Gil Bolotin, Shlomi Laufer,
Abstract summary: This work presents a novel approach to monocular 6D pose estimation of surgical instruments in open surgery, addressing challenges such as object articulations, symmetries, and lack of annotated real-world data. The proposed approach consists of three main components: (1) synthetic data generation using 3D modeling of surgical tools with articulation rigging; (2) a tailored pose estimation framework combining object detection with pose estimation and a hybrid geometric fusion strategy; and (3) a training strategy that utilizes both synthetic and real unannotated data, employing domain adaptation on real video data using automatically generated pseudo-labels.
Score: 0.873811641236639
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work presents a novel approach to monocular 6D pose estimation of surgical instruments in open surgery, addressing challenges such as object articulations, symmetries, occlusions, and lack of annotated real-world data. The method leverages synthetic data generation and domain adaptation techniques to overcome these obstacles. The proposed approach consists of three main components: (1) synthetic data generation using 3D modeling of surgical tools with articulation rigging and physically-based rendering; (2) a tailored pose estimation framework combining object detection with pose estimation and a hybrid geometric fusion strategy; and (3) a training strategy that utilizes both synthetic and real unannotated data, employing domain adaptation on real video data using automatically generated pseudo-labels. Evaluations conducted on videos of open surgery demonstrate the good performance and real-world applicability of the proposed method, highlighting its potential for integration into medical augmented reality and robotic systems. The approach eliminates the need for extensive manual annotation of real surgical data.

Related papers

Landmark-Free Preoperative-to-Intraoperative Registration in Laparoscopic Liver Resection [50.388465935739376]
Liver registration by overlaying preoperative 3D models onto intraoperative 2D frames can assist surgeons in perceiving the spatial anatomy of the liver clearly for a higher surgical success rate. Existing registration methods rely heavily on anatomical landmark-based, which encounter two major limitations. We propose a landmark-free preoperative-to-intraoperative registration framework utilizing effective self-supervised learning.
arXiv Detail & Related papers (2025-04-21T14:55:57Z)
A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery [8.909938295090827]
NeRF-based techniques have recently garnered attention for the ability to reconstruct scenes implicitly. On the other hand, 3D-GS represents scenes explicitly using 3D Gaussians and projects them onto a 2D plane as a replacement for the complex volume rendering in NeRF. This work explores and reviews state-of-the-art (SOTA) approaches, discussing their innovations and implementation principles.
arXiv Detail & Related papers (2024-08-08T12:51:23Z)
Enhanced Knee Kinematics: Leveraging Deep Learning and Morphing Algorithms for 3D Implant Modeling [2.752817022620644]
This study proposes a novel approach using machine learning algorithms and morphing techniques for precise 3D reconstruction of implanted knee models. A convolutional neural network is trained to automatically segment the femur contour of the implanted components. A morphing algorithm generates a personalized 3D model of the implanted knee joint.
arXiv Detail & Related papers (2024-08-02T20:11:04Z)
Realistic Surgical Image Dataset Generation Based On 3D Gaussian Splatting [3.5351922399745166]
This research introduces a novel method that employs 3D Gaussian Splatting to generate synthetic surgical datasets. We developed a data recording system capable of acquiring images alongside tool and camera poses in a surgical scene. Using this pose data, we synthetically replicate the scene, thereby enabling direct comparisons of the synthetic image quality.
arXiv Detail & Related papers (2024-07-20T11:20:07Z)
Surgical Triplet Recognition via Diffusion Model [59.50938852117371]
Surgical triplet recognition is an essential building block to enable next-generation context-aware operating rooms. We propose Difft, a new generative framework for surgical triplet recognition employing the diffusion model. Experiments on the CholecT45 and CholecT50 datasets show the superiority of the proposed method in achieving a new state-of-the-art performance for surgical triplet recognition.
arXiv Detail & Related papers (2024-06-19T04:43:41Z)
Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery. We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery. We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z)
Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation [66.3814684757376]
This work presents Zero123-6D, the first work to demonstrate the utility of Diffusion Model-based novel-view-synthesizers in enhancing RGB 6D pose estimation at category-level. The outlined method shows reduction in data requirements, removal of the necessity of depth information in zero-shot category-level 6D pose estimation task, and increased performance, quantitatively demonstrated through experiments on the CO3D dataset.
arXiv Detail & Related papers (2024-03-21T10:38:18Z)
Domain adaptation strategies for 3D reconstruction of the lumbar spine using real fluoroscopy data [9.21828361691977]
This study tackles key obstacles in adopting surgical navigation in orthopedic surgeries. It shows an approach for generating 3D anatomical models of the spine from only a few fluoroscopic images. It achieved an 84% F1 score, matching the accuracy of our previous synthetic data-based research.
arXiv Detail & Related papers (2024-01-29T10:22:45Z)
CholecTriplet2022: Show me a tool and tell me the triplet -- an endoscopic vision challenge for surgical action triplet detection [41.66666272822756]
This paper presents the CholecTriplet2022 challenge, which extends surgical action triplet modeling from recognition to detection. It includes weakly-supervised bounding box localization of every visible surgical instrument (or tool) as the key actors, and the modeling of each tool-activity in the form of instrument, verb, target> triplet.
arXiv Detail & Related papers (2023-02-13T11:53:14Z)
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
Towards unconstrained joint hand-object reconstruction from RGB videos [81.97694449736414]
Reconstructing hand-object manipulations holds a great potential for robotics and learning from human demonstrations. We first propose a learning-free fitting approach for hand-object reconstruction which can seamlessly handle two-hand object interactions.
arXiv Detail & Related papers (2021-08-16T12:26:34Z)
Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views. We then introduce the Multimodal Semantic Graph Scene (MSSG) which aims at providing unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.