Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild
- URL: http://arxiv.org/abs/2001.10090v2
- Date: Wed, 31 Mar 2021 02:18:27 GMT
- Title: Deep NRSfM++: Towards Unsupervised 2D-3D Lifting in the Wild
- Authors: Chaoyang Wang and Chen-Hsuan Lin and Simon Lucey
- Abstract summary: We present a generalized strategy for improving learning-based NRSfM methods to handle perspective cameras and missing/occluded points.
Our approach, Deep NRSfM++, achieves state-of-the-art performance across numerous large-scale benchmarks.
- Score: 44.78174845839193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recovery of 3D shape and pose from 2D landmarks stemming from a large
ensemble of images can be viewed as a non-rigid structure from motion (NRSfM)
problem. Classical NRSfM approaches, however, are problematic as they rely on
heuristic priors on the 3D structure (e.g. low rank) that do not scale well to
large datasets. Learning-based methods are showing the potential to reconstruct
a much broader set of 3D structures than classical methods -- dramatically
expanding the importance of NRSfM to atemporal unsupervised 2D to 3D lifting.
Hitherto, these learning approaches have not been able to effectively model
perspective cameras or handle missing/occluded points -- limiting their
applicability to in-the-wild datasets. In this paper, we present a generalized
strategy for improving learning-based NRSfM methods to tackle the above issues.
Our approach, Deep NRSfM++, achieves state-of-the-art performance across
numerous large-scale benchmarks, outperforming both classical and
learning-based 2D-3D lifting methods.
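The two obstacles the abstract names, perspective cameras and missing/occluded points, can both be made concrete in a few lines. The NumPy sketch below is illustrative only (not the Deep NRSfM++ architecture): it shows a pinhole measurement model plus a visibility-masked reprojection objective of the kind such a method must optimize; all function names are hypothetical.

```python
import numpy as np

def perspective_project(S, f=1.0):
    """Pinhole projection of N 3D points S (N, 3) with focal length f."""
    return f * S[:, :2] / S[:, 2:3]  # divide x, y by depth z

def masked_reprojection_loss(S_pred, W_obs, vis, f=1.0):
    """Mean 2D reprojection error over visible landmarks only.

    S_pred: (N, 3) predicted 3D shape in camera coordinates
    W_obs:  (N, 2) observed 2D landmarks (occluded entries are ignored)
    vis:    (N,)  boolean visibility mask
    """
    err = np.linalg.norm(perspective_project(S_pred, f) - W_obs, axis=1)
    return (err * vis).sum() / max(vis.sum(), 1)
```

An orthographic model would simply drop the division by depth; keeping it, together with the visibility mask, is what an in-the-wild 2D-3D lifting objective has to get right.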
Related papers
- HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation [61.32714172038278]
We propose a novel two-stage generative densification method, named Hierarchical Pose AutoRegressive Transformer (HiPART), to generate dense 2D poses from the original sparse 2D pose.
Specifically, we first develop a multi-scale skeleton tokenization module to quantize the highly dense 2D pose into hierarchical tokens and propose a Skeleton-aware Alignment to strengthen token connections.
With generated hierarchical poses as inputs for 2D-to-3D lifting, the proposed method shows strong robustness in occluded scenarios and achieves state-of-the-art performance on single-frame-based 3D human pose estimation.
arXiv Detail & Related papers (2025-03-30T06:15:36Z)
- DINO in the Room: Leveraging 2D Foundation Models for 3D Segmentation [51.43837087865105]
Vision foundation models (VFMs) trained on large-scale image datasets provide high-quality features that have significantly advanced 2D visual recognition.
Their potential in 3D vision remains largely untapped, despite the common availability of 2D images alongside 3D point cloud datasets.
We introduce DITR, a simple yet effective approach that extracts 2D foundation model features, projects them to 3D, and finally injects them into a 3D point cloud segmentation model.
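The project-then-inject step described above reduces to projecting each 3D point into the image with the camera intrinsics and gathering the 2D feature at that pixel. A minimal NumPy sketch under an assumed pinhole intrinsic matrix K (illustrative, not the DITR API):

```python
import numpy as np

def lift_2d_features(points, feat_map, K):
    """Gather a 2D foundation-model feature for each 3D point.

    points:   (N, 3) points in camera coordinates (z > 0)
    feat_map: (H, W, C) dense features from a 2D foundation model
    K:        (3, 3) pinhole intrinsics
    """
    uvw = points @ K.T                       # project to homogeneous pixels
    uv = (uvw[:, :2] / uvw[:, 2:3]).round().astype(int)
    H, W, _ = feat_map.shape
    u = uv[:, 0].clip(0, W - 1)              # clamp to image bounds
    v = uv[:, 1].clip(0, H - 1)
    return feat_map[v, u]                    # (N, C) per-point features
```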
arXiv Detail & Related papers (2025-03-24T17:59:11Z)
- Learning A Zero-shot Occupancy Network from Vision Foundation Models via Self-supervised Adaptation [41.98740330990215]
This work proposes a novel approach that bridges 2D vision foundation models with 3D tasks.
We leverage the zero-shot capabilities of vision-language models for image semantics.
We project the semantics into 3D space using the reconstructed metric depth, thereby providing 3D supervision.
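Projecting semantics into 3D space using reconstructed metric depth is the inverse operation of the projection above: each pixel is backprojected to a 3D point, which then carries the pixel's semantic label as 3D supervision. A minimal sketch, again assuming pinhole intrinsics K:

```python
import numpy as np

def backproject(depth, K):
    """Lift a metric depth map (H, W) to one 3D point per pixel."""
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]                       # pixel grid
    pix = np.stack([u, v, np.ones_like(u)], -1)     # homogeneous pixels
    rays = pix.reshape(-1, 3) @ np.linalg.inv(K).T  # camera rays
    return rays * depth.reshape(-1, 1)              # (H*W, 3) points
```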
arXiv Detail & Related papers (2025-03-10T09:54:40Z)
- FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models [62.663113296987085]
Few-shot class-incremental learning aims to mitigate the catastrophic forgetting issue when a model is incrementally trained on limited data.
We introduce two novel components: the Redundant Feature Eliminator (RFE) and the Spatial Noise Compensator (SNC).
Considering the imbalance in existing 3D datasets, we also propose new evaluation metrics that offer a more nuanced assessment of a 3D FSCIL model.
arXiv Detail & Related papers (2023-12-28T14:52:07Z)
- 3D-LFM: Lifting Foundation Model [29.48835001900286]
Deep learning has expanded our capability to reconstruct a wide range of object classes.
Our approach harnesses the inherent permutation equivariance of transformers to manage a varying number of points per 3D data instance.
We demonstrate state-of-the-art performance across 2D-3D lifting task benchmarks.
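The two properties the summary highlights, permutation equivariance and a variable number of points per instance, fall out of a transformer encoder that uses no positional encoding and a padding mask. A minimal PyTorch sketch (illustrative, not the actual 3D-LFM architecture):

```python
import torch
import torch.nn as nn

class PointLifter(nn.Module):
    """Lift a variable-size set of 2D keypoints to 3D.

    No positional encoding is added, so permuting the input points
    permutes the output the same way (permutation equivariance).
    """
    def __init__(self, d=128, heads=4, layers=2):
        super().__init__()
        self.embed = nn.Linear(2, d)
        enc_layer = nn.TransformerEncoderLayer(d_model=d, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(d, 3)

    def forward(self, kp2d, pad_mask):
        """kp2d: (B, N, 2) padded keypoints; pad_mask: (B, N) True at padding."""
        h = self.encoder(self.embed(kp2d), src_key_padding_mask=pad_mask)
        return self.head(h)  # (B, N, 3) lifted points
```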
arXiv Detail & Related papers (2023-12-19T06:38:18Z)
- FrozenRecon: Pose-free 3D Scene Reconstruction with Frozen Depth Models [67.96827539201071]
We propose a novel test-time optimization approach for 3D scene reconstruction.
Our method achieves state-of-the-art cross-dataset reconstruction on five zero-shot testing datasets.
arXiv Detail & Related papers (2023-08-10T17:55:02Z)
- Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge [36.65402869749077]
We propose a novel method to extract weak 3D information directly from 2D images without 3D pose supervision.
We propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image.
WSP achieves state-of-the-art results on two widely-used benchmarks.
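Learning the depth relationship between two points without metric supervision is typically cast as pairwise ordinal ranking: the label says only which point is farther. A minimal sketch of that standard loss (illustrative, not the WSP code):

```python
import torch
import torch.nn.functional as F

def ordinal_depth_loss(z_a, z_b, label):
    """Pairwise ordinal loss on predicted depths.

    z_a, z_b: (B,) predicted depths of the two points in each pair
    label:    (B,) +1 if point a is farther than point b, -1 if closer
    """
    # Logistic ranking loss: penalized unless the predicted ordering
    # label * (z_a - z_b) is confidently positive.
    return F.softplus(-label * (z_a - z_b)).mean()
```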
arXiv Detail & Related papers (2022-11-22T03:35:15Z)
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection [70.71934539556916]
We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions in the monocular 3D object detection network is devised.
Our method improves the detection performance of the state-of-the-art monocular method by 2.80% on the moderate test setting, without using extra data.
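At its core, a geometry formula with projective modeling rests on the pinhole relation tying an object's physical height, its image height, and its depth. As a hedged illustration of that relation (not necessarily the paper's exact formulation), with focal length f, 3D height H, and 2D image height h:

```latex
% a 3D height H at depth z images to height h under focal length f,
% so a height prior plus the 2D box height recovers the depth:
h = \frac{f\,H}{z}
\quad\Longrightarrow\quad
z = \frac{f\,H}{h}
```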
arXiv Detail & Related papers (2021-07-29T12:30:39Z)
- GDRNPP: A Geometry-guided and Fully Learning-based Object Pose Estimator [51.89441403642665]
6D pose estimation of rigid objects is a long-standing and challenging task in computer vision.
Recently, the emergence of deep learning reveals the potential of Convolutional Neural Networks (CNNs) to predict reliable 6D poses.
This paper introduces a fully learning-based object pose estimator.
arXiv Detail & Related papers (2021-02-24T09:11:31Z)
- Secrets of 3D Implicit Object Shape Reconstruction in the Wild [92.5554695397653]
Reconstructing high-fidelity 3D objects from sparse, partial observation is crucial for various applications in computer vision, robotics, and graphics.
Recent neural implicit modeling methods show promising results on synthetic or dense datasets, but they perform poorly on real-world data that is sparse and noisy.
This paper analyzes the root cause of such deficient performance of a popular neural implicit model.
arXiv Detail & Related papers (2021-01-18T03:24:48Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate the 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
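One plausible reading of the depth-to-scale (D2S) projection, offered here as an assumption since the summary states no formula, is that perspective projection assigns each joint j its own 2D scale from its depth offset relative to the root, rather than the single scale of weak perspective:

```latex
% weak perspective uses one scale s = f / z_root for every joint;
% a per-joint variant folds in the depth difference \Delta z_j:
s_j = \frac{f}{z_{\mathrm{root}} + \Delta z_j},
\qquad
\mathbf{x}^{\mathrm{2D}}_j = s_j \, \mathbf{X}^{xy}_j
```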
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Procrustean Regression Networks: Learning 3D Structure of Non-Rigid Objects from 2D Annotations [42.476537776831314]
We propose a novel framework for training neural networks which is capable of learning 3D information of non-rigid objects.
The proposed framework shows superior reconstruction performance to the state-of-the-art method on the Human3.6M, 300-VW, and SURREAL datasets.
arXiv Detail & Related papers (2020-07-21T17:29:20Z)