Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos
- URL: http://arxiv.org/abs/2012.04641v1
- Date: Tue, 8 Dec 2020 18:57:45 GMT
- Title: Vid2CAD: CAD Model Alignment using Multi-View Constraints from Videos
- Authors: Kevis-Kokitsi Maninis, Stefan Popov, Matthias Nießner, Vittorio Ferrari
- Abstract summary: We address the task of aligning CAD models to a video sequence of a complex scene containing multiple objects.
Our method is able to process arbitrary videos and fully automatically recover the 9 DoF pose for each object appearing in it, thus aligning them in a common 3D coordinate frame.
- Score: 48.69114433364771
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address the task of aligning CAD models to a video sequence of a complex
scene containing multiple objects. Our method is able to process arbitrary
videos and fully automatically recover the 9 DoF pose for each object appearing
in it, thus aligning them in a common 3D coordinate frame. The core idea of our
method is to integrate neural network predictions from individual frames with a
temporally global, multi-view constraint optimization formulation. This
integration process resolves the scale and depth ambiguities in the per-frame
predictions, and generally improves the estimate of all pose parameters. By
leveraging multi-view constraints, our method also resolves occlusions and
handles objects that are out of view in individual frames, thus reconstructing
all objects into a single globally consistent CAD representation of the scene.
In comparison to the state-of-the-art single-frame method Mask2CAD that we
build on, we achieve substantial improvements on Scan2CAD (from 11.6% to 30.2%
class average accuracy).
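The core idea described above, resolving the scale and depth ambiguity of per-frame predictions by combining rays from multiple views, can be illustrated with a minimal triangulation sketch. This is a hedged illustration of the general principle only, not the authors' implementation: camera poses are assumed known, and all names and values are hypothetical.

```python
import numpy as np

def triangulate_center(origins, dirs):
    """Least-squares 3D point closest to a set of viewing rays.

    A single-frame prediction constrains an object center to lie on a
    ray (camera origin + t * direction) but leaves its depth ambiguous;
    intersecting rays from multiple frames resolves that ambiguity.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for o, d in zip(origins, dirs):
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)  # projector onto the plane orthogonal to the ray
        A += P
        b += P @ o
    return np.linalg.solve(A, b)

# Hypothetical example: three cameras whose rays all pass through (1, 2, 5).
target = np.array([1.0, 2.0, 5.0])
origins = [np.array([0.0, 0.0, 0.0]),
           np.array([3.0, 0.0, 0.0]),
           np.array([0.0, 4.0, 1.0])]
dirs = [target - o for o in origins]
center = triangulate_center(origins, dirs)
# With depth fixed, a metric object scale follows from the observed 2D extent.
```

With noisy per-frame rays, the same linear system yields the least-squares center, which is one way such multi-view constraints can sharpen all pose parameters at once.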
Related papers
- Zero-Shot Multi-Object Scene Completion [59.325611678171974]
We present a 3D scene completion method that recovers the complete geometry of multiple unseen objects in complex scenes from a single RGB-D image.
Our method outperforms the current state-of-the-art on both synthetic and real-world datasets.
arXiv Detail & Related papers (2024-03-21T17:59:59Z)
- Sparse Multi-Object Render-and-Compare [33.97243145891282]
Reconstructing 3D shape and pose of static objects from a single image is an essential task for various industries.
Directly predicting 3D shapes produces unrealistic, overly smoothed or tessellated shapes.
Retrieving CAD models ensures realistic shapes but requires robust and accurate alignment.
arXiv Detail & Related papers (2023-10-17T12:01:32Z)
- Tracking by 3D Model Estimation of Unknown Objects in Videos [122.56499878291916]
We argue that this representation is limited and instead propose to guide and improve 2D tracking with an explicit object representation.
Our representation tackles a complex long-term dense correspondence problem between all 3D points on the object for all video frames.
The proposed optimization minimizes a novel loss function to estimate the best 3D shape, texture, and 6DoF pose.
arXiv Detail & Related papers (2023-04-13T11:32:36Z)
- Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
- WALDO: Future Video Synthesis using Object Layer Decomposition and Parametric Flow Prediction [82.79642869586587]
WALDO is a novel approach to the prediction of future video frames from past ones.
Individual images are decomposed into multiple layers combining object masks and a small set of control points.
The layer structure is shared across all frames in each video to build dense inter-frame connections.
arXiv Detail & Related papers (2022-11-25T18:59:46Z)
- SPARC: Sparse Render-and-Compare for CAD model alignment in a single RGB image [21.77811443143683]
Estimating 3D shapes and poses of static objects from a single image has important applications for robotics, augmented reality and digital content creation.
We demonstrate that a sparse, iterative, render-and-compare approach is more accurate and robust than relying on normalised object coordinates.
Our alignment procedure converges after just 3 iterations, improving the state-of-the-art performance on the challenging real-world dataset ScanNet.
arXiv Detail & Related papers (2022-10-03T16:02:10Z)
- RayTran: 3D pose estimation and shape reconstruction of multiple objects from videos with ray-traced transformers [41.499325832227626]
We propose a transformer-based neural network architecture for multi-object 3D reconstruction from RGB videos.
We exploit knowledge about the image formation process to significantly sparsify the attention weight matrix.
Compared to previous methods, our architecture is single-stage and end-to-end trainable.
arXiv Detail & Related papers (2022-03-24T18:49:12Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- SceneCAD: Predicting Object Alignments and Layouts in RGB-D Scans [24.06640371472068]
We present a novel approach to reconstructing lightweight, CAD-based representations of scanned 3D environments from commodity RGB-D sensors.
Our key idea is to jointly optimize for both CAD model alignments as well as layout estimations of the scanned scene.
arXiv Detail & Related papers (2020-03-27T20:17:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.