Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks
- URL: http://arxiv.org/abs/2503.12531v1
- Date: Sun, 16 Mar 2025 14:51:12 GMT
- Title: Towards Suturing World Models: Learning Predictive Models for Robotic Surgical Tasks
- Authors: Mehmet Kerem Turkcan, Mattia Ballo, Filippo Filicori, Zoran Kostic
- Abstract summary: We introduce diffusion-based temporal models that capture the dynamics of fine-grained robotic sub-stitch actions. We fine-tune two state-of-the-art video diffusion models to generate high-fidelity surgical action sequences at $\ge$768x512 resolution and $\ge$49 frames. Our experimental results demonstrate that these world models can effectively capture the dynamics of suturing, potentially enabling improved training, skill assessment tools, and autonomous surgical systems.
- Score: 0.35087986342428684
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce specialized diffusion-based generative models that capture the spatiotemporal dynamics of fine-grained robotic surgical sub-stitch actions through supervised learning on annotated laparoscopic surgery footage. The proposed models form a foundation for data-driven world models capable of simulating the biomechanical interactions and procedural dynamics of surgical suturing with high temporal fidelity. Annotating a dataset of $\sim2K$ clips extracted from simulation videos, we categorize surgical actions into fine-grained sub-stitch classes including ideal and non-ideal executions of needle positioning, targeting, driving, and withdrawal. We fine-tune two state-of-the-art video diffusion models, LTX-Video and HunyuanVideo, to generate high-fidelity surgical action sequences at $\ge$768x512 resolution and $\ge$49 frames. For training our models, we explore both Low-Rank Adaptation (LoRA) and full-model fine-tuning approaches. Our experimental results demonstrate that these world models can effectively capture the dynamics of suturing, potentially enabling improved training simulators, surgical skill assessment tools, and autonomous surgical systems. The models also display the capability to differentiate between ideal and non-ideal technique execution, providing a foundation for building surgical training and evaluation systems. We release our models for testing and as a foundation for future research. Project Page: https://mkturkcan.github.io/suturingmodels/
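The abstract contrasts Low-Rank Adaptation (LoRA) with full-model fine-tuning but does not describe either in detail. As a rough illustration only, the sketch below shows the standard LoRA formulation for a single linear layer, y = W x + (alpha / rank) * B (A x), and compares trainable parameter counts. The layer sizes, rank, and alpha are hypothetical values chosen for illustration; they are not taken from the paper, LTX-Video, or HunyuanVideo.

```python
# Minimal pure-Python sketch of LoRA on one linear layer.
# All sizes, ranks and scaling values here are hypothetical, not the paper's settings.

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def full_finetune_params(d_in, d_out):
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    """Trainable weights for LoRA: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)

def lora_forward(x, W, A, B, alpha, rank):
    """Frozen base output plus the scaled low-rank update B(Ax)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

# Toy example: d_in = 3, d_out = 2, rank = 1.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen pretrained weights
A = [[1.0, 1.0, 1.0]]                   # trainable, rank x d_in
B = [[0.5], [0.0]]                      # trainable, d_out x rank
y = lora_forward([1.0, 2.0, 3.0], W, A, B, alpha=2.0, rank=1)
```

For a hypothetical 4096 x 4096 projection with rank 16, `lora_params` gives 131,072 trainable weights against 16,777,216 for full fine-tuning, which is the usual motivation for trying LoRA before updating every weight.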
Related papers
- Surgical Vision World Model [7.227638707410672]
We propose the first surgical vision world model to generate action-controllable surgical data. The proposed model can generate action-controllable surgical data and the architecture design is verified.
arXiv Detail & Related papers (2025-03-03T10:55:52Z)
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators. To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module. Experiments demonstrate that DWS can be versatilely applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- SurGen: Text-Guided Diffusion Model for Surgical Video Generation [0.6551407780976953]
SurGen is a text-guided diffusion model tailored for surgical video synthesis.
We validate the visual and temporal quality of the outputs using standard image and video generation metrics.
Our results demonstrate the potential of diffusion models to serve as valuable educational tools for surgical trainees.
arXiv Detail & Related papers (2024-08-26T05:38:27Z)
- SimEndoGS: Efficient Data-driven Scene Simulation using Robotic Surgery Videos via Physics-embedded 3D Gaussians [19.590481146949685]
We introduce 3D Gaussians as a learnable representation of the surgical scene, learned from stereo endoscopic video.
We apply the Material Point Method, which is integrated with physical properties, to the 3D Gaussians to achieve realistic scene deformations.
Results show that it can reconstruct and simulate surgical scenes from endoscopic videos efficiently, taking only a few minutes to reconstruct the surgical scene.
arXiv Detail & Related papers (2024-05-02T02:34:19Z)
- Interactive Generation of Laparoscopic Videos with Diffusion Models [1.5488613349551188]
We show how to generate realistic laparoscopic images and videos by specifying a surgical action through text.
We demonstrate the performance of our approach using the publicly available Cholec dataset family.
We achieve an FID of 38.097 and an F1-score of 0.71.
arXiv Detail & Related papers (2024-04-23T12:36:07Z)
- Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery.
We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery.
We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z)
- Endora: Video Generation Models as Endoscopy Simulators [53.72175969751398]
This paper introduces Endora, an innovative approach to generate medical videos that simulate clinical endoscopy scenes.
We also pioneer the first public benchmark for endoscopy simulation with video generation models.
Endora marks a notable breakthrough in the deployment of generative AI for clinical endoscopy research.
arXiv Detail & Related papers (2024-03-17T00:51:59Z)
- Neural LerPlane Representations for Fast 4D Reconstruction of Deformable Tissues [52.886545681833596]
LerPlane is a novel method for fast and accurate reconstruction of surgical scenes under a single-viewpoint setting.
LerPlane treats surgical procedures as 4D volumes and factorizes them into explicit 2D planes of static and dynamic fields.
LerPlane shares static fields, significantly reducing the workload of dynamic tissue modeling.
arXiv Detail & Related papers (2023-05-31T14:38:35Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims to provide a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- One to Many: Adaptive Instrument Segmentation via Meta Learning and Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z)
- Recurrent and Spiking Modeling of Sparse Surgical Kinematics [0.8458020117487898]
A growing number of studies have used machine learning to analyze video and kinematic data captured from surgical robots.
In this study, we explore the possibility of using only kinematic data to predict surgeons of similar skill levels.
We report that it is possible to identify surgical fellows receiving near perfect scores in the simulation exercises based on their motion characteristics alone.
arXiv Detail & Related papers (2020-05-12T15:41:45Z)
- Hybrid modeling: Applications in real-time diagnosis [64.5040763067757]
We outline a novel hybrid modeling approach that combines machine learning inspired models and physics-based models.
We are using such models for real-time diagnosis applications.
arXiv Detail & Related papers (2020-03-04T00:44:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.