SENDD: Sparse Efficient Neural Depth and Deformation for Tissue Tracking
- URL: http://arxiv.org/abs/2305.06477v2
- Date: Mon, 25 Sep 2023 19:36:15 GMT
- Title: SENDD: Sparse Efficient Neural Depth and Deformation for Tissue Tracking
- Authors: Adam Schmidt, Omid Mohareri, Simon DiMaio, Septimiu E. Salcudean
- Abstract summary: Real-time estimation of 3D tissue motion is essential for robotically assisted surgery.
Our model, Sparse Efficient Neural Depth and Deformation (SENDD), extends prior 2D tracking work to estimate flow in 3D space.
SENDD does this by using graph neural networks over sparse keypoint matches to estimate both depth and 3D flow anywhere.
- Score: 7.282909831316735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deformable tracking and real-time estimation of 3D tissue motion are essential
to enable automation and image guidance applications in robotically assisted
surgery. Our model, Sparse Efficient Neural Depth and Deformation (SENDD),
extends prior 2D tracking work to estimate flow in 3D space. SENDD introduces
novel contributions of learned detection and sparse per-point depth and 3D
flow estimation, all with fewer than half a million parameters. SENDD does this
by using graph neural networks over sparse keypoint matches to estimate both
depth and 3D flow anywhere. We quantify and benchmark SENDD on a
comprehensively labelled tissue dataset, and compare it to an equivalent 2D
flow model. SENDD performs comparably while enabling applications that 2D flow
cannot. SENDD can track points and estimate depth at 10 fps on an NVIDIA RTX
4000 for 1280 tracked (query) points, and its cost scales linearly with the
number of points. SENDD enables multiple downstream
applications that require estimation of 3D motion in stereo endoscopy.
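The abstract describes estimating depth and 3D flow at arbitrary query points from sparse keypoint matches, with cost linear in the number of queries. A minimal sketch of that idea, substituting inverse-distance-weighted interpolation for SENDD's learned graph neural network (the function and its parameters are illustrative, not from the paper):

```python
import numpy as np

def interpolate_3d_flow(keypoints, keypoint_flow, queries, k=4, eps=1e-8):
    """Estimate 3D flow at arbitrary query points from sparse keypoint flow.

    Stand-in for SENDD's learned GNN: each query blends the flow of its
    k nearest keypoint matches with inverse-distance weights. The loop
    makes the linear cost in the number of query points explicit.
    """
    flows = np.empty_like(queries)
    for i, q in enumerate(queries):                 # O(#queries)
        d = np.linalg.norm(keypoints - q, axis=1)   # distance to each keypoint
        nn = np.argsort(d)[:k]                      # k nearest matches
        w = 1.0 / (d[nn] + eps)
        flows[i] = (w[:, None] * keypoint_flow[nn]).sum(0) / w.sum()
    return flows

# Toy example: 100 sparse matches carrying a constant rightward flow.
rng = np.random.default_rng(0)
kps = rng.uniform(0, 1, size=(100, 3))
flow = np.tile([0.01, 0.0, 0.0], (100, 1))
queries = rng.uniform(0, 1, size=(5, 3))
est = interpolate_3d_flow(kps, flow, queries)
print(np.allclose(est, [0.01, 0.0, 0.0]))  # a constant field is reproduced exactly
```

The learned model replaces the fixed weighting with message passing over the keypoint graph, but the query-time structure (sparse matches in, per-point 3D flow out) is the same.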
Related papers
- TAPVid-3D: A Benchmark for Tracking Any Point in 3D [63.060421798990845]
We introduce a new benchmark, TAPVid-3D, for evaluating the task of Tracking Any Point in 3D.
This benchmark will serve as a guidepost to improve our ability to understand precise 3D motion and surface deformation from monocular video.
arXiv Detail & Related papers (2024-07-08T13:28:47Z)
- NDC-Scene: Boost Monocular 3D Semantic Scene Completion in Normalized Device Coordinates Space [77.6067460464962]
Monocular 3D Semantic Scene Completion (SSC) has garnered significant attention in recent years due to its potential to predict complex semantics and geometry shapes from a single image, requiring no 3D inputs.
We identify several critical issues in current state-of-the-art methods, including the Feature Ambiguity of projected 2D features in the ray to the 3D space, the Pose Ambiguity of the 3D convolution, and the Imbalance in the 3D convolution across different depth levels.
We devise a novel Normalized Device Coordinates scene completion network (NDC-Scene) that directly extends the 2
arXiv Detail & Related papers (2023-09-26T02:09:52Z)
- 3D-Aware Neural Body Fitting for Occlusion Robust 3D Human Pose Estimation [28.24765523800196]
We propose 3D-aware Neural Body Fitting (3DNBF) for 3D human pose estimation.
In particular, we propose a generative model of deep features based on a volumetric human representation with Gaussian ellipsoidal kernels emitting 3D pose-dependent feature vectors.
The neural features are trained with contrastive learning to become 3D-aware and hence to overcome the 2D-3D ambiguity.
arXiv Detail & Related papers (2023-08-19T22:41:00Z)
- Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation [18.964403296437027]
Act3D represents the robot's workspace using a 3D feature field with adaptive resolutions dependent on the task at hand.
It samples 3D point grids in a coarse to fine manner, featurizes them using relative-position attention, and selects where to focus the next round of point sampling.
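The coarse-to-fine sampling loop described above can be sketched as follows, with a plain scoring callable standing in for Act3D's relative-position attention (the target location and grid sizes are illustrative assumptions):

```python
import numpy as np

def coarse_to_fine_sample(score_fn, center, half_extent, levels=3, n=5):
    """Coarse-to-fine 3D point sampling in the spirit of Act3D.

    At each level, sample a regular n^3 grid inside the current cube,
    score every point, then recentre a smaller cube on the best point
    to decide where to focus the next round of sampling.
    """
    for _ in range(levels):
        axes = [np.linspace(c - half_extent, c + half_extent, n) for c in center]
        grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
        scores = np.array([score_fn(p) for p in grid])
        center = grid[np.argmax(scores)]   # focus next round here
        half_extent /= n                   # shrink the search volume
    return center

# Toy score: peaks at a hypothetical target location.
target = np.array([0.3, -0.1, 0.7])
found = coarse_to_fine_sample(lambda p: -np.linalg.norm(p - target),
                              center=np.zeros(3), half_extent=1.0)
print(np.linalg.norm(found - target) < 0.05)
```

Each level multiplies the effective resolution by n while keeping the number of scored points per level constant, which is the appeal of adaptive-resolution sampling.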
arXiv Detail & Related papers (2023-06-30T17:34:06Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed as Homography Loss, is proposed to achieve the goal, which exploits both 2D and 3D information.
Our method outperforms the other state-of-the-art approaches by a large margin on the KITTI 3D datasets.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
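Localizing a known 3D point in image space, as the SEDM supervision loop requires, reduces to pinhole projection. A small sketch with illustrative camera parameters (K, R, t are made-up values, not from the paper):

```python
import numpy as np

def project_to_pixel(X_world, K, R, t):
    """Project a 3D point into pixel coordinates with a pinhole model.

    Once a repeatable 3D point is found, its projection gives the
    detector a 2D training target in each image.
    """
    X_cam = R @ X_world + t   # world -> camera frame
    x = K @ X_cam             # camera frame -> homogeneous pixels
    return x[:2] / x[2]       # perspective divide

K = np.array([[500.0, 0.0, 320.0],   # focal 500 px, principal point (320, 240)
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])   # camera 2 m in front of the origin
uv = project_to_pixel(np.array([0.1, -0.2, 0.0]), K, R, t)
print(uv)  # pixel slightly right of and above the principal point
```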
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- PLUME: Efficient 3D Object Detection from Stereo Images [95.31278688164646]
Existing methods tackle the problem in two steps: first, depth estimation is performed and a pseudo-LiDAR point cloud representation is computed from the depth estimates; then object detection is performed in 3D space.
We propose a model that unifies these two tasks in the same metric space.
Our approach achieves state-of-the-art performance on the challenging KITTI benchmark, with significantly reduced inference time compared with existing methods.
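The intermediate pseudo-LiDAR step that PLUME unifies away is a simple back-projection of a depth map through the camera intrinsics. A sketch with illustrative intrinsics (fx, fy, cx, cy are assumed values):

```python
import numpy as np

def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project a depth map into a pseudo-LiDAR point cloud.

    Every pixel (u, v) with depth z becomes the 3D point
    ((u - cx) * z / fx, (v - cy) * z / fy, z).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]       # per-pixel row/column indices
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# Toy 2x2 depth map: every pixel 5 m away.
cloud = depth_to_pseudo_lidar(np.full((2, 2), 5.0), fx=500.0, fy=500.0, cx=1.0, cy=1.0)
print(cloud.shape)  # (4, 3): one 3D point per pixel
```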
arXiv Detail & Related papers (2021-01-17T05:11:38Z)
- Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [93.51773847125014]
We propose a novel deep neural network that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D sensor.
Our approach performs 3D convolutions across space and time over a bird's eye view representation of the 3D world.
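A bird's-eye-view representation of this kind is typically built by rasterizing the 3D sensor data into a top-down grid with height discretized into channels. A minimal sketch; the ranges and resolution are illustrative choices, not values from the paper:

```python
import numpy as np

def to_bev_grid(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                z_range=(-2.0, 2.0), res=0.5, z_bins=8):
    """Rasterize a 3D point cloud into a bird's-eye-view occupancy grid.

    Height becomes the channel axis, so convolutions over the grid
    reason about the scene top-down.
    """
    nx = int((x_range[1] - x_range[0]) / res)
    ny = int((y_range[1] - y_range[0]) / res)
    grid = np.zeros((z_bins, nx, ny), dtype=np.float32)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    iz = ((points[:, 2] - z_range[0]) / (z_range[1] - z_range[0]) * z_bins).astype(int)
    keep = ((0 <= ix) & (ix < nx) & (0 <= iy) & (iy < ny)
            & (0 <= iz) & (iz < z_bins))
    grid[iz[keep], ix[keep], iy[keep]] = 1.0   # mark occupied cells
    return grid

pts = np.array([[10.0, 0.0, 0.0],    # two nearby points share one cell
                [10.2, 0.1, 0.1],
                [100.0, 0.0, 0.0]])  # out of range, dropped
bev = to_bev_grid(pts)
print(bev.shape, int(bev.sum()))  # (8, 80, 80) with a single occupied cell
```

Stacking such grids over consecutive frames gives the space-time volume that the paper's 3D convolutions operate on.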
arXiv Detail & Related papers (2020-12-22T22:43:35Z)
- DeepTracking-Net: 3D Tracking with Unsupervised Learning of Continuous Flow [12.690471276907445]
This paper deals with the problem of 3D tracking, i.e., to find dense correspondences in a sequence of time-varying 3D shapes.
We propose a novel unsupervised 3D shape framework named DeepTracking-Net, which uses deep neural networks (DNNs) as auxiliary functions.
In addition, we contribute a new synthetic 3D dataset, named SynMotions, to the 3D tracking and recognition community.
arXiv Detail & Related papers (2020-06-24T16:20:48Z)
- Pointwise Attention-Based Atrous Convolutional Neural Networks [15.499267533387039]
A pointwise attention-based atrous convolutional neural network architecture is proposed to efficiently deal with a large number of points.
The proposed model has been evaluated on the two most important 3D point cloud datasets for the 3D semantic segmentation task.
It achieves a reasonable performance compared to state-of-the-art models in terms of accuracy, with a much smaller number of parameters.
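The atrous (dilated) idea on point clouds is usually realized by skipping neighbours when gathering a point's local context, which enlarges the receptive field without adding parameters. A sketch of that sampling step only, not of the full attention model (the function name and parameters are illustrative):

```python
import numpy as np

def atrous_neighbors(points, query_idx, k=4, dilation=2):
    """Dilated (atrous) neighbour sampling on a point cloud.

    Instead of the k nearest points, take every `dilation`-th neighbour
    among the k * dilation nearest, widening the receptive field while
    keeping k points per query.
    """
    d = np.linalg.norm(points - points[query_idx], axis=1)
    order = np.argsort(d)[1:k * dilation + 1]   # skip the query point itself
    return order[::dilation]                    # every dilation-th neighbour

rng = np.random.default_rng(1)
pts = rng.uniform(0, 1, size=(64, 3))
idx = atrous_neighbors(pts, query_idx=0)
print(len(idx))  # still k = 4 neighbours, drawn from a wider radius
```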
arXiv Detail & Related papers (2019-12-27T13:12:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided (including all content) and is not responsible for any consequences arising from its use.