3D-LFM: Lifting Foundation Model
- URL: http://arxiv.org/abs/2312.11894v2
- Date: Fri, 26 Apr 2024 05:54:28 GMT
- Title: 3D-LFM: Lifting Foundation Model
- Authors: Mosam Dabhi, Laszlo A. Jeni, Simon Lucey
- Abstract summary: Deep learning has expanded our capability to reconstruct a wide range of object classes.
Our approach harnesses the inherent permutation equivariance of transformers to manage a varying number of points per 3D data instance.
We demonstrate state-of-the-art performance across 2D-3D lifting task benchmarks.
- Score: 29.48835001900286
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The lifting of 3D structure and camera from 2D landmarks is at the cornerstone of the entire discipline of computer vision. Traditional methods have been confined to specific rigid objects, such as those in Perspective-n-Point (PnP) problems, but deep learning has expanded our capability to reconstruct a wide range of object classes (e.g. C3DPO and PAUL) with resilience to noise, occlusions, and perspective distortions. All these techniques, however, have been limited by the fundamental need to establish correspondences across the 3D training data -- significantly limiting their utility to applications where one has an abundance of "in-correspondence" 3D data. Our approach harnesses the inherent permutation equivariance of transformers to manage varying number of points per 3D data instance, withstands occlusions, and generalizes to unseen categories. We demonstrate state of the art performance across 2D-3D lifting task benchmarks. Since our approach can be trained across such a broad class of structures we refer to it simply as a 3D Lifting Foundation Model (3D-LFM) -- the first of its kind.
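The permutation-equivariance property mentioned in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single-head self-attention layer without positional encodings and a boolean padding mask for instances with fewer points, and all names (`self_attention`, `wq`, `wk`, `wv`, `mask`) are hypothetical.

```python
import numpy as np


def self_attention(x, mask, wq, wk, wv):
    """Single-head self-attention with no positional encoding (illustrative only).

    x:    (n, d) per-point features, e.g. embedded 2D landmarks
    mask: (n,) boolean array, True for real points, False for padding
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    # Padded points are excluded as keys, so they receive zero attention weight.
    scores = np.where(mask[None, :], scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
d = 4
wq, wk, wv = (rng.standard_normal((d, d)) for _ in range(3))

# An instance with 5 real points, zero-padded to a fixed length of 8.
n_real, n_max = 5, 8
x = np.zeros((n_max, d))
x[:n_real] = rng.standard_normal((n_real, d))
mask = np.arange(n_max) < n_real

# Reordering the input points reorders the outputs in the same way.
perm = rng.permutation(n_max)
out = self_attention(x, mask, wq, wk, wv)
out_perm = self_attention(x[perm], mask[perm], wq, wk, wv)
equivariant = np.allclose(out[perm], out_perm)

# Outputs for the real points do not depend on how much padding is added.
out_nopad = self_attention(x[:n_real], np.ones(n_real, dtype=bool), wq, wk, wv)
padding_ignored = np.allclose(out[:n_real], out_nopad)
```

Both `equivariant` and `padding_ignored` evaluate to `True` in this sketch. That is the sense in which one layer can process instances with different point counts and no fixed point ordering.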
Related papers
- Learning 3D Representations from Procedural 3D Programs [6.915871213703219]
Self-supervised learning has emerged as a promising approach for acquiring transferable 3D representations from unlabeled 3D point clouds.
We propose learning 3D representations from procedural 3D programs that automatically generate 3D shapes using simple primitives and augmentations.
arXiv Detail & Related papers (2024-11-25T18:59:57Z)
- Enhancing Generalizability of Representation Learning for Data-Efficient 3D Scene Understanding [50.448520056844885]
We propose a generative Bayesian network to produce diverse synthetic scenes with real-world patterns.
A series of experiments robustly display our method's consistent superiority over existing state-of-the-art pre-training approaches.
arXiv Detail & Related papers (2024-06-17T07:43:53Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- NeurOCS: Neural NOCS Supervision for Monocular 3D Object Localization [80.3424839706698]
We present NeurOCS, a framework that uses instance masks and 3D boxes as input to learn 3D object shapes by means of differentiable rendering.
Our approach rests on insights in learning a category-level shape prior directly from real driving scenes.
We make critical design choices to learn object coordinates more effectively from an object-centric view.
arXiv Detail & Related papers (2023-05-28T16:18:41Z)
- Multiview Compressive Coding for 3D Reconstruction [77.95706553743626]
We introduce a simple framework that operates on 3D points of single objects or whole scenes.
Our model, Multiview Compressive Coding, learns to compress the input appearance and geometry to predict the 3D structure.
arXiv Detail & Related papers (2023-01-19T18:59:52Z)
- Unsupervised Learning of Visual 3D Keypoints for Control [104.92063943162896]
Learning sensorimotor control policies from high-dimensional images crucially relies on the quality of the underlying visual representations.
We propose a framework to learn such a 3D geometric structure directly from images in an end-to-end unsupervised manner.
These discovered 3D keypoints tend to meaningfully capture robot joints as well as object movements in a consistent manner across both time and 3D space.
arXiv Detail & Related papers (2021-06-14T17:59:59Z)
- Info3D: Representation Learning on 3D Objects using Mutual Information Maximization and Contrastive Learning [8.448611728105513]
We propose to extend the InfoMax and contrastive learning principles on 3D shapes.
We show that we can maximize the mutual information between 3D objects and their "chunks" to improve the representations in aligned datasets.
arXiv Detail & Related papers (2020-06-04T00:30:26Z)
- Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild [44.78174845839193]
We present a strategy for improving learning-based NRSfM methods to tackle the above issues.
Our approach, Deep NRSfM++, achieves state-of-the-art performance across numerous large-scale benchmarks.
arXiv Detail & Related papers (2020-01-27T21:14:07Z)