MultiViPerFrOG: A Globally Optimized Multi-Viewpoint Perception Framework for Camera Motion and Tissue Deformation
- URL: http://arxiv.org/abs/2408.04367v1
- Date: Thu, 8 Aug 2024 10:55:55 GMT
- Title: MultiViPerFrOG: A Globally Optimized Multi-Viewpoint Perception Framework for Camera Motion and Tissue Deformation
- Authors: Guido Caccianiga, Julian Nubert, Cesar Cadena, Marco Hutter, Katherine J. Kuchenbecker,
- Abstract summary: We propose a framework that can flexibly integrate the output of low-level perception modules with kinematic and scene-modeling priors.
Overall, our method shows robustness to combined noisy input measures and can process hundreds of points in a few milliseconds.
- Score: 18.261678529996104
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing the 3D shape of a deformable environment from the information captured by a moving depth camera is highly relevant to surgery. The underlying challenge is the fact that simultaneously estimating camera motion and tissue deformation in a fully deformable scene is an ill-posed problem, especially from a single arbitrarily moving viewpoint. Current solutions are often organ-specific and lack the robustness required to handle large deformations. Here we propose a multi-viewpoint global optimization framework that can flexibly integrate the output of low-level perception modules (data association, depth, and relative scene flow) with kinematic and scene-modeling priors to jointly estimate multiple camera motions and absolute scene flow. We use simulated noisy data to show three practical examples that successfully constrain the convergence to a unique solution. Overall, our method shows robustness to combined noisy input measures and can process hundreds of points in a few milliseconds. MultiViPerFrOG builds a generalized learning-free scaffolding for spatio-temporal encoding that can unlock advanced surgical scene representations and will facilitate the development of the computer-assisted-surgery technologies of the future.
Related papers
- One Diffusion to Generate Them All [54.82732533013014]
OneDiffusion is a versatile, large-scale diffusion model that supports bidirectional image synthesis and understanding.
It enables conditional generation from inputs such as text, depth, pose, layout, and semantic maps.
OneDiffusion allows for multi-view generation, camera pose estimation, and instant personalization using sequential image inputs.
arXiv Detail & Related papers (2024-11-25T12:11:05Z) - MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUST3R's representation, previously only used for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z) - Learning Robust Multi-Scale Representation for Neural Radiance Fields
from Unposed Images [65.41966114373373]
We present an improved solution to the neural image-based rendering problem in computer vision.
The proposed approach could synthesize a realistic image of the scene from a novel viewpoint at test time.
arXiv Detail & Related papers (2023-11-08T08:18:23Z) - Consistent Depth of Moving Objects in Video [52.72092264848864]
We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera.
We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction over the entire input video.
We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars) as well as camera motion.
arXiv Detail & Related papers (2021-08-02T20:53:18Z) - DeepMultiCap: Performance Capture of Multiple Characters Using Sparse
Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time varying surface details without the need of using pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z) - A Pose-only Solution to Visual Reconstruction and Navigation [23.86386627769292]
Large-scale scenes and critical camera motions are great challenges facing the research community to achieve this goal.
We raised a pose-only imaging geometry framework and algorithms that can help solve these challenges.
arXiv Detail & Related papers (2021-03-02T07:21:08Z) - Learning to Segment Rigid Motions from Two Frames [72.14906744113125]
We propose a modular network, motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field.
It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations.
Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
arXiv Detail & Related papers (2021-01-11T04:20:30Z) - Event-based Stereo Visual Odometry [42.77238738150496]
We present a solution to the problem of visual odometry from the data acquired by a stereo event-based camera rig.
We seek to maximize thetemporal consistency of stereo event-based data while using a simple and efficient representation.
arXiv Detail & Related papers (2020-07-30T15:53:28Z) - DFVS: Deep Flow Guided Scene Agnostic Image Based Visual Servoing [11.000164408890635]
Existing deep learning based visual servoing approaches regress the relative camera pose between a pair of images.
We consider optical flow as our visual features, which are predicted using a deep neural network.
We show convergence for over 3m and 40 degrees while maintaining precise positioning of under 2cm and 1 degree.
arXiv Detail & Related papers (2020-03-08T11:42:36Z) - Multi-object Monocular SLAM for Dynamic Environments [12.537311048732017]
The term multibody, implies that we track the motion of the camera, as well as that of other dynamic participants in the scene.
Existing approaches solve restricted variants of the problem, but the solutions suffer relative scale ambiguity.
We propose a multi pose-graph optimization formulation, to resolve the relative and absolute scale factor ambiguities involved.
arXiv Detail & Related papers (2020-02-10T03:49:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.