A Robotic 3D Perception System for Operating Room Environment Awareness
- URL: http://arxiv.org/abs/2003.09487v2
- Date: Mon, 30 Mar 2020 17:20:29 GMT
- Title: A Robotic 3D Perception System for Operating Room Environment Awareness
- Authors: Zhaoshuo Li, Amirreza Shaban, Jean-Gabriel Simard, Dinesh Rabindran,
Simon DiMaio, Omid Mohareri
- Abstract summary: We describe a 3D multi-view perception system for the da Vinci surgical system to enable operating room (OR) scene understanding and context awareness.
Based on this architecture, a multi-view 3D scene semantic segmentation algorithm is created.
Our proposed architecture has acceptable registration error ($3.3\%\pm1.4\%$ of object-camera distance) and can robustly improve scene segmentation performance.
- Score: 3.830091185868436
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Purpose: We describe a 3D multi-view perception system for the da Vinci
surgical system to enable operating room (OR) scene understanding and context
awareness.
Methods: Our proposed system consists of four Time-of-Flight (ToF)
cameras rigidly attached to strategic locations on the da Vinci Xi patient side
cart (PSC). The cameras are registered to the robot's kinematic chain by
performing a one-time calibration routine and therefore, information from all
cameras can be fused and represented in one common coordinate frame. Based on
this architecture, a multi-view 3D scene semantic segmentation algorithm is
created to enable recognition of common and salient objects/equipment and
surgical activities in a da Vinci OR. Our proposed 3D semantic segmentation
method has been trained and validated on a novel densely annotated dataset that
has been captured from clinical scenarios.
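Once each camera is registered to the robot's kinematic chain, fusing the views reduces to applying a per-camera rigid transform and stacking the points in the common frame. The sketch below illustrates that step only; the function names and the toy transforms are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def fuse_point_clouds(clouds, base_T_cam):
    """Express each camera's points in the common robot-base frame and stack them.

    clouds     : list of (N_i, 3) arrays, points in each camera's own frame
    base_T_cam : list of 4x4 calibration transforms, camera frame -> robot base frame
    """
    fused = []
    for pts, T in zip(clouds, base_T_cam):
        hom = np.hstack([pts, np.ones((pts.shape[0], 1))])  # to homogeneous coordinates
        fused.append((hom @ T.T)[:, :3])                    # apply rigid transform
    return np.vstack(fused)

# Toy example: two cameras, the second offset 1 m along x from the base frame.
cloud_a = np.array([[0.0, 0.0, 1.0]])
cloud_b = np.array([[0.0, 0.0, 1.0]])
T_a = make_transform(np.eye(3), np.zeros(3))
T_b = make_transform(np.eye(3), np.array([1.0, 0.0, 0.0]))
merged = fuse_point_clouds([cloud_a, cloud_b], [T_a, T_b])
print(merged)  # [[0. 0. 1.] [1. 0. 1.]]
```

In the paper's setup the transforms come from the one-time hand-eye calibration against the PSC kinematics rather than being specified by hand as here.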
Results: The results show that our proposed architecture has acceptable
registration error ($3.3\%\pm1.4\%$ of object-camera distance) and can robustly
improve scene segmentation performance (mean Intersection over Union, mIoU)
for less frequently appearing classes ($\ge 0.013$ mIoU gain) compared to a
single-view method.
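For reference, the mIoU metric used above averages per-class Intersection over Union across classes. A minimal sketch of that computation on integer label maps (the function name and toy labels are illustrative, not from the paper):

```python
import numpy as np

def per_class_iou(pred, gt, num_classes):
    """Intersection over Union for each class, given integer label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        # A class absent from both prediction and ground truth is skipped via NaN.
        ious.append(inter / union if union > 0 else float("nan"))
    return ious

# Toy 2x2 label maps with two classes.
gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
ious = per_class_iou(pred, gt, num_classes=2)
miou = np.nanmean(ious)
print(ious, miou)  # [0.5, 0.666...] 0.5833...
```

Because mIoU weights every class equally, gains on the rarely appearing classes, as reported here, move the mean even when those classes cover few pixels.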
Conclusion: We present the first dynamic multi-view perception system with a
novel segmentation architecture, which can be used as a building block
technology for applications such as surgical workflow analysis, automation of
surgical sub-tasks and advanced guidance systems.
Related papers
- Kinematics-based 3D Human-Object Interaction Reconstruction from Single View [10.684643503514849]
Existing methods merely predict body poses, relying only on network training on some indoor datasets.
We propose a kinematics-based method that can drive the joints of human body to the human-object contact regions accurately.
arXiv Detail & Related papers (2024-07-19T05:44:35Z)
- Self-supervised Learning via Cluster Distance Prediction for Operating Room Context Awareness [44.15562068190958]
In the Operating Room, semantic segmentation is at the core of creating robots aware of clinical surroundings.
State-of-the-art semantic segmentation and activity recognition approaches are fully supervised, which is not scalable.
We propose a new 3D self-supervised task for OR scene understanding utilizing OR scene images captured with ToF cameras.
arXiv Detail & Related papers (2024-07-07T17:17:52Z)
- PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation [45.39981876226129]
We study camera-based 3D panoptic segmentation, aiming to achieve a unified occupancy representation for camera-only 3D scene understanding.
We introduce a novel method called PanoOcc, which utilizes voxel queries to aggregate semantic information from multi-frame and multi-view images.
Our approach achieves new state-of-the-art results for camera-based segmentation and panoptic segmentation on the nuScenes dataset.
arXiv Detail & Related papers (2023-06-16T17:59:33Z)
- Scene as Occupancy [66.43673774733307]
OccNet is a vision-centric pipeline with a cascade and temporal voxel decoder to reconstruct 3D occupancy.
We propose OpenOcc, the first dense high-quality 3D occupancy benchmark built on top of nuScenes.
arXiv Detail & Related papers (2023-06-05T13:01:38Z)
- Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
We present a multi-camera capture setup consisting of static and head-mounted cameras.
Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre.
Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z)
- Neural Implicit Dense Semantic SLAM [83.04331351572277]
We propose a novel RGBD vSLAM algorithm that learns a memory-efficient, dense 3D geometry, and semantic segmentation of an indoor scene in an online manner.
Our pipeline combines classical 3D vision-based tracking and loop closing with neural fields-based mapping.
Our proposed algorithm can greatly enhance scene perception and assist with a range of robot control problems.
arXiv Detail & Related papers (2023-04-27T23:03:52Z)
- Semantic-SuPer: A Semantic-aware Surgical Perception Framework for Endoscopic Tissue Classification, Reconstruction, and Tracking [21.133420628173067]
We present a novel surgical perception framework, Semantic-SuPer.
It integrates geometric and semantic information to facilitate data association, 3D reconstruction, and tracking of endoscopic scenes.
arXiv Detail & Related papers (2022-10-29T19:33:21Z)
- Neural Groundplans: Persistent Neural Scene Representations from a Single Image [90.04272671464238]
We present a method to map 2D image observations of a scene to a persistent 3D scene representation.
We propose conditional neural groundplans as persistent and memory-efficient scene representations.
arXiv Detail & Related papers (2022-07-22T17:41:24Z)
- Towards Panoptic 3D Parsing for Single Image in the Wild [35.98539308998578]
This paper presents an integrated system that performs holistic image segmentation, object detection, instance segmentation, depth estimation, and object instance 3D reconstruction for indoor and outdoor scenes from a single RGB image.
Our proposed panoptic 3D parsing framework points to a promising direction in computer vision.
It can be applied to various applications, including autonomous driving, mapping, robotics, design, computer graphics, human-computer interaction, and augmented reality.
arXiv Detail & Related papers (2021-11-04T17:45:04Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Graph Scene (MSSG) which aims at providing unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.