4DContrast: Contrastive Learning with Dynamic Correspondences for 3D
Scene Understanding
- URL: http://arxiv.org/abs/2112.02990v1
- Date: Mon, 6 Dec 2021 13:09:07 GMT
- Title: 4DContrast: Contrastive Learning with Dynamic Correspondences for 3D
Scene Understanding
- Authors: Yujin Chen, Matthias Nießner, Angela Dai
- Abstract summary: We present a new approach to instill 4D dynamic object priors into learned 3D representations by unsupervised pre-training.
We propose a new data augmentation scheme leveraging synthetic 3D shapes moving in static 3D environments.
Experiments demonstrate that our unsupervised representation learning yields improvements on downstream 3D semantic segmentation, object detection, and instance segmentation tasks.
- Score: 22.896937940702642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new approach to instill 4D dynamic object priors into learned 3D
representations by unsupervised pre-training. We observe that the dynamic
movement of an object through an environment provides important cues about its
objectness, and thus propose to imbue learned 3D representations with such
dynamic understanding, which can then be effectively transferred to improve
performance on downstream 3D semantic scene understanding tasks. We propose a
new data augmentation scheme leveraging synthetic 3D shapes moving in static 3D
environments, and employ contrastive learning under 3D-4D constraints that
encode 4D invariances into the learned 3D representations. Experiments
demonstrate that our unsupervised representation learning yields improvements
on downstream 3D semantic segmentation, object detection, and instance
segmentation tasks, and moreover notably improves performance in data-scarce
scenarios.
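To make the contrastive setup concrete, below is a minimal sketch of a point-level InfoNCE-style loss over matched 3D and 4D point features, the kind of objective commonly used to encode such correspondence constraints. The function name, tensor shapes, and temperature value are illustrative assumptions, not the paper's exact formulation or released code.

```python
# Sketch (assumed, not the authors' released code) of a point-level InfoNCE
# loss enforcing 3D-4D correspondence constraints: features of the same point,
# extracted by a 3D backbone and a 4D (spatio-temporal) backbone, are pulled
# together, while features of other points in the batch act as negatives.
import torch
import torch.nn.functional as F

def point_info_nce(feats_3d: torch.Tensor,
                   feats_4d: torch.Tensor,
                   temperature: float = 0.07) -> torch.Tensor:
    """feats_3d, feats_4d: (N, D) features for N matched points.

    Row i of feats_3d and row i of feats_4d describe the same physical
    point (a dynamic correspondence); all other rows serve as negatives.
    """
    z3 = F.normalize(feats_3d, dim=1)           # unit-length embeddings
    z4 = F.normalize(feats_4d, dim=1)
    logits = z3 @ z4.t() / temperature          # (N, N) cosine similarities
    targets = torch.arange(z3.size(0), device=z3.device)
    # Symmetric cross-entropy: each 3D feature must retrieve its 4D match
    # among all candidates, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example with random stand-in features for 1024 corresponding points:
if __name__ == "__main__":
    f3 = torch.randn(1024, 96)
    f4 = torch.randn(1024, 96)
    print(point_info_nce(f3, f4).item())
```

In this framing, the paper's synthetic augmentation scheme (shapes moving through static scenes) would supply the point correspondences, since the object's trajectory is known by construction.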
Related papers
- Learning 3D Representations from Procedural 3D Programs [6.915871213703219]
Self-supervised learning has emerged as a promising approach for acquiring transferable 3D representations from unlabeled 3D point clouds.
We propose learning 3D representations from procedural 3D programs that automatically generate 3D shapes using simple primitives and augmentations.
arXiv Detail & Related papers (2024-11-25T18:59:57Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- H4D: Human 4D Modeling by Learning Neural Compositional Representation [75.34798886466311]
This work presents a novel framework that can effectively learn a compact and compositional representation for dynamic humans.
A simple yet effective linear motion model is proposed to provide a rough and regularized motion estimation.
Experiments demonstrate that our method is not only effective in recovering dynamic humans with accurate motion and detailed geometry, but is also amenable to various 4D human-related tasks.
arXiv Detail & Related papers (2022-03-02T17:10:49Z)
- Spatio-temporal Self-Supervised Representation Learning for 3D Point Clouds [96.9027094562957]
We introduce a spatio-temporal representation learning framework, capable of learning from unlabeled 3D point clouds.
Inspired by how infants learn from visual data in the wild, we explore rich cues derived from the 3D data.
STRL takes two temporally-related frames from a 3D point cloud sequence as input, transforms them with spatial data augmentation, and learns an invariant representation in a self-supervised manner.
arXiv Detail & Related papers (2021-09-01T04:17:11Z)
- RandomRooms: Unsupervised Pre-training from Synthetic Shapes and Randomized Layouts for 3D Object Detection [138.2892824662943]
A promising solution is to make better use of synthetic datasets, which consist of CAD object models, to boost learning on real datasets.
Recent work on 3D pre-training exhibits failure when transferring features learned on synthetic objects to real-world applications.
In this work, we put forward a new method called RandomRooms to accomplish this objective.
arXiv Detail & Related papers (2021-08-17T17:56:12Z)
- Info3D: Representation Learning on 3D Objects using Mutual Information Maximization and Contrastive Learning [8.448611728105513]
We propose to extend the InfoMax and contrastive learning principles on 3D shapes.
We show that we can maximize the mutual information between 3D objects and their "chunks" to improve the representations in aligned datasets.
arXiv Detail & Related papers (2020-06-04T00:30:26Z)