Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild
- URL: http://arxiv.org/abs/2111.12728v1
- Date: Wed, 24 Nov 2021 19:00:05 GMT
- Title: Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild
- Authors: Jianglong Ye, Yuntao Chen, Naiyan Wang, Xiaolong Wang
- Abstract summary: We introduce a novel and unified framework which utilizes a DeepSDF model to simultaneously track and reconstruct 3D objects in the wild.
We show significant improvements over state-of-the-art methods for both tracking and shape reconstruction.
- Score: 22.19769576901151
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Tracking and reconstructing 3D objects from cluttered scenes are key components of computer vision, robotics, and autonomous driving systems. While recent progress in implicit functions (e.g., DeepSDF) has shown encouraging results on high-quality 3D shape reconstruction, it is still very challenging to generalize to cluttered and partially observable LiDAR data. In this paper, we propose to leverage the temporal continuity of video data. We introduce a novel and unified framework which uses a DeepSDF model to simultaneously track and reconstruct 3D objects in the wild. We adapt the DeepSDF model online over the video, iteratively improving the shape reconstruction, which in turn improves the tracking, and vice versa. We experiment on both the Waymo and KITTI datasets and show significant improvements over state-of-the-art methods for both tracking and shape reconstruction.
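The abstract's core loop is easy to illustrate. Below is a minimal sketch of joint online adaptation, not the authors' implementation: it assumes a pretrained DeepSDF-style decoder with signature decoder(latent, xyz) -> sdf, per-frame LiDAR points already associated with the tracked object, and a 3-DoF pose (translation plus yaw) per frame. All names and hyperparameters are illustrative.

```python
# Minimal sketch of joint online tracking + shape adaptation (illustrative,
# not the paper's implementation). Assumes a pretrained DeepSDF-style
# decoder: decoder(latent [D], points [N, 3]) -> signed distances [N].
import torch

def adapt_online(decoder, latent_init, frames, lr=1e-2, steps=50):
    """Jointly refine one shape code and a per-frame pose over a video clip.

    frames: list of [N, 3] LiDAR point tensors (world coordinates) that
            belong to the tracked object.
    """
    latent = latent_init.clone().requires_grad_(True)
    trans = torch.zeros(len(frames), 3, requires_grad=True)  # per-frame translation
    yaw = torch.zeros(len(frames), requires_grad=True)       # per-frame heading
    opt = torch.optim.Adam([latent, trans, yaw], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        loss = torch.zeros(())
        for t, pts in enumerate(frames):
            # Build the world-to-canonical yaw rotation for frame t.
            c, s = torch.cos(yaw[t]), torch.sin(yaw[t])
            zero = torch.zeros(())
            R = torch.stack([
                torch.stack([c, -s, zero]),
                torch.stack([s, c, zero]),
                torch.tensor([0.0, 0.0, 1.0]),
            ])
            canonical = (pts - trans[t]) @ R.T  # row-vector convention
            # Observed surface points should have (near-)zero signed distance,
            # so |SDF| penalizes both a bad pose and a bad shape code.
            loss = loss + decoder(latent, canonical).abs().mean()
        loss = loss + 1e-3 * latent.pow(2).sum()  # shape-prior regularizer
        loss.backward()
        opt.step()
    return latent.detach(), trans.detach(), yaw.detach()
```

Because pose and shape share one objective, a better shape code tightens the pose fit, and better poses align the points that supervise the shape, which is the mutual improvement the abstract describes.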
Related papers
- LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset [17.530432165466507]
We present a novel Cross-Modal Shape Reconstruction (DisCo) method and an Occupancy-Guided 3D Object Detection (OccGOD) method.
Our methods achieve state-of-the-art performance in both instance-level scene reconstruction and 3D object detection tasks.
arXiv Detail & Related papers (2023-12-19T18:50:10Z)
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
- AutoRecon: Automated 3D Object Discovery and Reconstruction [41.60050228813979]
We propose a novel framework named AutoRecon for the automated discovery and reconstruction of an object from multi-view images.
We demonstrate that foreground objects can be robustly located and segmented from SfM point clouds by leveraging self-supervised 2D vision transformer features.
Experiments on the DTU, BlendedMVS and CO3D-V2 datasets demonstrate the effectiveness and robustness of AutoRecon.
arXiv Detail & Related papers (2023-05-15T17:16:46Z)
- gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction [94.46581592405066]
We exploit the hand structure and use it as guidance for SDF-based shape reconstruction.
We predict kinematic chains of pose transformations and align SDFs with highly articulated hand poses.
arXiv Detail & Related papers (2023-04-24T10:05:48Z)
- Single-view 3D Mesh Reconstruction for Seen and Unseen Categories [69.29406107513621]
Single-view 3D mesh reconstruction is a fundamental computer vision task that aims at recovering 3D shapes from single-view RGB images.
This paper tackles single-view 3D mesh reconstruction to study model generalization to unseen categories.
We propose an end-to-end two-stage network, GenMesh, to break the category boundaries in reconstruction.
arXiv Detail & Related papers (2022-08-04T14:13:35Z)
- Class-agnostic Reconstruction of Dynamic Objects from Videos [127.41336060616214]
We introduce REDO, a class-agnostic framework to REconstruct the Dynamic Objects from RGBD or calibrated videos.
We develop two novel modules. First, we introduce a canonical 4D implicit function which is pixel-aligned with aggregated temporal visual cues.
Second, we develop a 4D transformation module which captures object dynamics to support temporal propagation and aggregation.
arXiv Detail & Related papers (2021-12-03T18:57:47Z)
- SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction from Video Data [124.2624568006391]
We present SAIL-VOS 3D: a synthetic video dataset with frame-by-frame mesh annotations.
We also develop first baselines for the reconstruction of 3D meshes from video data via temporal models.
arXiv Detail & Related papers (2021-05-18T15:42:37Z)
- Unsupervised Learning of 3D Object Categories from Videos in the Wild [75.09720013151247]
We focus on learning a model from multiple views of a large collection of object instances.
We propose a new neural network design, called warp-conditioned ray embedding (WCR), which significantly improves reconstruction.
Our evaluation demonstrates performance improvements over several deep monocular reconstruction baselines on existing benchmarks.
arXiv Detail & Related papers (2021-03-30T17:57:01Z)
- Learning monocular 3D reconstruction of articulated categories from motion [39.811816510186475]
Video self-supervision enforces consistency between consecutive 3D reconstructions via a motion-based cycle loss.
We introduce an interpretable model of 3D template deformations that controls a 3D surface through the displacement of a small number of local, learnable handles.
We obtain state-of-the-art reconstructions with diverse shapes, viewpoints and textures for multiple articulated object categories.
arXiv Detail & Related papers (2021-03-30T13:50:27Z)
- SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images [44.78174845839193]
Recent efforts have turned to learning 3D reconstruction without 3D supervision, from RGB images with annotated 2D silhouettes.
These techniques still require multi-view annotations of the same object instance during training.
We propose SDF-SRN, an approach that requires only a single view of objects at training time. (See the SDF decoder sketch after this list.)
arXiv Detail & Related papers (2020-10-20T17:59:47Z)
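Several entries above (gSDF, SDF-SRN), like the DeepSDF model in the main paper, represent shape as a latent-conditioned signed distance function. The sketch below shows that interface with a deliberately tiny decoder; the architecture and sizes are placeholder assumptions for illustration, not any paper's exact network.

```python
# Illustrative latent-conditioned SDF decoder in the spirit of DeepSDF.
# Architecture and sizes are placeholder assumptions.
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    def __init__(self, latent_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Tanh(),  # bounded signed distance
        )

    def forward(self, latent, xyz):
        # latent: [D], xyz: [N, 3] -> signed distance per query point: [N]
        z = latent.expand(xyz.shape[0], -1)
        return self.net(torch.cat([z, xyz], dim=-1)).squeeze(-1)

decoder = SDFDecoder()
code = torch.randn(256)                # one latent code = one shape
queries = torch.rand(1024, 3) * 2 - 1  # query points in [-1, 1]^3
sdf = decoder(code, queries)           # zero-level set = object surface
```

A mesh can be recovered by evaluating the decoder on a dense grid and extracting the zero-level set (e.g., with marching cubes); note this decoder's signature matches the one assumed in the tracking sketch above.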
This list is automatically generated from the titles and abstracts of the papers on this site.