Self-Supervised Depth Estimation in Laparoscopic Image using 3D
Geometric Consistency
- URL: http://arxiv.org/abs/2208.08407v2
- Date: Tue, 20 Jun 2023 22:33:48 GMT
- Title: Self-Supervised Depth Estimation in Laparoscopic Image using 3D
Geometric Consistency
- Authors: Baoru Huang, Jian-Qing Zheng, Anh Nguyen, Chi Xu, Ioannis Gkouzionis,
Kunal Vyas, David Tuch, Stamatia Giannarou, Daniel S. Elson
- Abstract summary: We present M3Depth, a self-supervised depth estimator that leverages the 3D geometric structural information hidden in stereo pairs.
Our method outperforms previous self-supervised approaches on both a public dataset and a newly acquired dataset by a large margin.
- Score: 7.902636435901286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Depth estimation is a crucial step for image-guided intervention in robotic
surgery and laparoscopic imaging systems. Since per-pixel depth ground truth is
difficult to acquire for laparoscopic image data, it is rarely possible to
apply supervised depth estimation in surgical applications. As an alternative,
self-supervised methods have been introduced to train depth estimators using
only synchronized stereo image pairs. However, most recent work has focused on
left-right consistency in 2D and ignored the valuable inherent 3D information
about the object in real-world coordinates, meaning that the left-right 3D
geometric structural consistency is not fully utilized. To overcome this
limitation, we present M3Depth, a self-supervised depth estimator that
leverages the 3D geometric structural information hidden in stereo pairs while
retaining monocular inference. The method also removes, via masking, the
influence of border regions that are unseen in at least one of the stereo
images, to enhance the correspondences between the left and right images in
overlapping areas. Extensive experiments show that our method outperforms
previous self-supervised approaches on both a public dataset and a newly
acquired dataset by a large margin, indicating good generalization across
different samples and laparoscopes. Code and data are available at
https://github.com/br0202/M3Depth.
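The authors' implementation lives in the linked repository; as a rough sketch of the two ideas in the abstract (left-right 3D geometric consistency plus masking of pixels seen by only one camera), the PyTorch snippet below back-projects both depth predictions to point clouds in the left camera frame, pairs them through the stereo disparity, and averages a masked L1 distance. The function names, rectified-stereo setup, and pinhole intrinsics (fx, fy, cx, cy, baseline) are our illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def lift(u, v, z, fx, fy, cx, cy):
    """Pinhole back-projection of pixel grids (B,H,W) with depth z -> (B,3,H,W)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return torch.stack([x, y, z], dim=1)

def stereo_3d_consistency_loss(depth_l, depth_r, fx, fy, cx, cy, baseline):
    """Masked left-right 3D geometric consistency for rectified stereo.

    depth_l, depth_r: predicted depth maps of shape (B,1,H,W).
    Pixels whose correspondence falls outside the right image are masked
    out, which removes border regions seen by only one camera.
    """
    B, _, H, W = depth_l.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth_l.dtype, device=depth_l.device),
        torch.arange(W, dtype=depth_l.dtype, device=depth_l.device),
        indexing="ij",
    )
    u, v = u.expand(B, H, W), v.expand(B, H, W)
    z_l = depth_l[:, 0]

    # 3D points of the left prediction, in the left camera frame.
    pts_l = lift(u, v, z_l, fx, fy, cx, cy)

    # Disparity gives the corresponding right-image column: u_r = u - fx*b/z.
    u_r = u - fx * baseline / z_l.clamp(min=1e-6)
    mask = ((u_r >= 0) & (u_r <= W - 1)).unsqueeze(1).to(depth_l.dtype)
    grid = torch.stack([2 * u_r / (W - 1) - 1, 2 * v / (H - 1) - 1], dim=-1)

    # Sample the right depth at the matched pixels, lift to 3D in the right
    # camera frame, then translate by the baseline into the left frame.
    z_r = F.grid_sample(depth_r, grid, align_corners=True)[:, 0]
    pts_r = lift(u_r, v, z_r, fx, fy, cx, cy)
    pts_r = pts_r + torch.tensor(
        [baseline, 0.0, 0.0], dtype=depth_l.dtype, device=depth_l.device
    ).view(1, 3, 1, 1)

    return (mask * (pts_l - pts_r).abs()).sum() / (3 * mask.sum() + 1e-6)
```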
Related papers
- GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception.
Our approach achieves state-of-the-art performance on the Occ3D-nuScenes dataset while requiring the lowest image resolution and the most lightweight image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
- Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering [5.617649111108429]
We present DiffPose, a self-supervised approach that leverages patient-specific simulation and differentiable physics-based rendering to achieve accurate 2D/3D registration without relying on manually labeled data.
DiffPose achieves sub-millimeter accuracy across surgical datasets at intraoperative speeds, improving upon existing unsupervised methods by an order of magnitude and even outperforming supervised baselines.
arXiv Detail & Related papers (2023-12-11T13:05:54Z)
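DiffPose differentiates through physics-based X-ray rendering; a full DRR pipeline is beyond a listing page, but the core idea of registration by gradient descent through a differentiable projector can be sketched with point landmarks. Everything below (the landmark stand-in for the renderer, the made-up poses, the optimizer settings) is an illustrative assumption, not DiffPose's actual method.

```python
import torch

def rodrigues(w):
    """Axis-angle vector (3,) -> rotation matrix (3,3), differentiably."""
    theta = w.norm() + 1e-8
    k = w / theta
    K = torch.zeros(3, 3, dtype=w.dtype)
    K[0, 1], K[0, 2] = -k[2], k[1]
    K[1, 0], K[1, 2] = k[2], -k[0]
    K[2, 0], K[2, 1] = -k[1], k[0]
    return torch.eye(3, dtype=w.dtype) + torch.sin(theta) * K \
        + (1 - torch.cos(theta)) * (K @ K)

def project(X, w, t, f=1000.0):
    """Rigid transform + pinhole projection: (N,3) points -> (N,2) pixels."""
    Xc = X @ rodrigues(w).T + t
    return f * Xc[:, :2] / Xc[:, 2:3]

# Synthetic target: 3D landmarks observed under an unknown pose.
torch.manual_seed(0)
X = torch.randn(20, 3) + torch.tensor([0.0, 0.0, 5.0])
obs = project(X, torch.tensor([0.10, -0.20, 0.05]), torch.tensor([0.3, -0.1, 0.4]))

# Registration: descend on the reprojection error through the projector.
# Small nonzero init keeps the axis-angle gradient well-defined at the start.
w = torch.tensor([0.01, 0.01, 0.01], requires_grad=True)
t = torch.zeros(3, requires_grad=True)
opt = torch.optim.Adam([w, t], lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = (project(X, w, t) - obs).pow(2).mean()
    loss.backward()
    opt.step()
print(loss.item())  # should approach zero as the pose is recovered
```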
- LFM-3D: Learnable Feature Matching Across Wide Baselines Using 3D Signals [9.201550006194994]
Learnable matchers often underperform when only small regions of co-visibility exist between image pairs.
We propose LFM-3D, a Learnable Feature Matching framework that uses models based on graph neural networks.
We show that the resulting improved correspondences lead to much higher relative posing accuracy for in-the-wild image pairs.
arXiv Detail & Related papers (2023-03-22T17:46:27Z)
- RiCS: A 2D Self-Occlusion Map for Harmonizing Volumetric Objects [68.85305626324694]
Ray-marching in Camera Space (RiCS) is a new method that represents the self-occlusions of 3D foreground objects as a 2D self-occlusion map.
We show that our representation map allows us not only to enhance the image quality but also to model temporally coherent complex shadow effects.
arXiv Detail & Related papers (2022-05-14T05:35:35Z)
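The summary above compresses the whole mechanism into one line, so a toy version may help: given a voxel occupancy of the object in camera space and the depth of the visible surface at each pixel, marching from each surface point toward the light and accumulating occupancy hits yields a 2D self-occlusion map. The grid layout, step sizes, and light direction below are our own assumptions, not RiCS's actual parameterization.

```python
import numpy as np

def self_occlusion_map(occ, depth, light_dir, n_steps=64, step=0.02):
    """Toy camera-space ray-march: how shadowed is each visible pixel?

    occ:   (D,H,W) binary occupancy of the object, with x, y in [0,1]
           image coordinates and z in [0,1] normalized depth.
    depth: (H,W) normalized depth of the visible surface (np.inf = empty).
    """
    D, H, W = occ.shape
    v, u = np.mgrid[0:H, 0:W]
    x, y = u / (W - 1), v / (H - 1)
    z = np.where(np.isfinite(depth), depth, 10.0)  # park empty pixels far away
    d = np.asarray(light_dir, dtype=np.float64)
    d /= np.linalg.norm(d)
    shade = np.zeros((H, W), dtype=np.float64)
    for i in range(1, n_steps + 1):  # start one step off the surface
        px, py, pz = x + i * step * d[0], y + i * step * d[1], z + i * step * d[2]
        inside = (px >= 0) & (px <= 1) & (py >= 0) & (py <= 1) & (pz >= 0) & (pz <= 1)
        iu = np.clip((px * (W - 1)).astype(int), 0, W - 1)
        iv = np.clip((py * (H - 1)).astype(int), 0, H - 1)
        iz = np.clip((pz * (D - 1)).astype(int), 0, D - 1)
        shade += np.where(inside, occ[iz, iv, iu], 0.0)
    return np.clip(shade / n_steps, 0.0, 1.0)  # 0 = fully lit, 1 = occluded
```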
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed that exploits both 2D and 3D information.
Our method outperforms the other state-of-the-art methods by a large margin on the KITTI 3D dataset.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- Neural Radiance Fields Approach to Deep Multi-View Photometric Stereo [103.08512487830669]
We present a modern solution to the multi-view photometric stereo (MVPS) problem.
We procure the surface orientation using a photometric stereo (PS) image formation model and blend it with a multi-view neural radiance field representation to recover the object's surface geometry.
Our method performs neural rendering of multi-view images while utilizing surface normals estimated by a deep photometric stereo network.
arXiv Detail & Related papers (2021-10-11T20:20:03Z)
- Learning Stereopsis from Geometric Synthesis for 6D Object Pose Estimation [11.999630902627864]
Current monocular-based 6D object pose estimation methods generally achieve less competitive results than RGBD-based methods.
This paper proposes a pose estimation method based on a 3D geometric volume in a short-baseline two-view setting.
Experiments show that our method outperforms state-of-the-art monocular-based methods and is robust across different objects and scenes.
arXiv Detail & Related papers (2021-09-25T02:55:05Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method remarkably improves the detection performance of the state-of-the-art monocular-based method by 2.80% on the moderate test setting, without extra data.
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
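The "projective modeling" in the entry above rests on the standard pinhole relation: an object of physical height H imaged with focal length f (in pixels) spans h = f * H / z pixels, so a predicted 3D height plus a detected 2D box height imply the depth. A worked example with made-up numbers:

```python
# Pinhole relation: h_px = f_px * H_m / z_m, hence z_m = f_px * H_m / h_px.
def depth_from_heights(f_px: float, h3d_m: float, h2d_px: float) -> float:
    return f_px * h3d_m / h2d_px

# A 1.5 m-tall car whose 2D box is 60 px tall under a 720 px focal length:
print(depth_from_heights(720.0, 1.5, 60.0))  # -> 18.0 (meters)
```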
- Monocular Differentiable Rendering for Self-Supervised 3D Object Detection [21.825158925459732]
3D object detection from monocular images is an ill-posed problem due to the projective entanglement of depth and scale.
We present a novel self-supervised method for textured 3D shape reconstruction and pose estimation of rigid objects.
Our method predicts the 3D location and meshes of each object in an image using differentiable rendering and a self-supervised objective.
arXiv Detail & Related papers (2020-09-30T09:21:43Z)
- Center3D: Center-based Monocular 3D Object Detection with Joint Depth Understanding [3.4806267677524896]
Center3D is a one-stage anchor-free approach to efficiently estimate 3D location and depth.
By exploiting the difference between 2D and 3D centers, we are able to estimate depth consistently.
Compared with state-of-the-art detectors, Center3D achieves the best speed-accuracy trade-off in real-time monocular object detection.
arXiv Detail & Related papers (2020-05-27T15:29:09Z)
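The 2D box center and the image projection of the 3D center generally differ; once the network predicts that projected 3D center together with a depth, the 3D center follows by inverting the pinhole projection. A minimal sketch with hypothetical intrinsics and predictions (Center3D's actual depth heads are described in the paper):

```python
# Invert the pinhole projection: a predicted projected 3D center (u, v) plus
# depth z and intrinsics (fx, fy, cx, cy) give the metric 3D center.
def backproject_center(u, v, z, fx, fy, cx, cy):
    return ((u - cx) * z / fx, (v - cy) * z / fy, z)

# Hypothetical prediction: center projected to (650, 200) px at 18 m depth.
print(backproject_center(650.0, 200.0, 18.0, 720.0, 720.0, 620.0, 187.0))
# -> (0.75, 0.325, 18.0): 0.75 m right of and 0.325 m below the optical axis
```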
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)