Towards Learning Monocular 3D Object Localization From 2D Labels using
the Physical Laws of Motion
- URL: http://arxiv.org/abs/2310.17462v2
- Date: Wed, 29 Nov 2023 14:33:28 GMT
- Title: Towards Learning Monocular 3D Object Localization From 2D Labels using
the Physical Laws of Motion
- Authors: Daniel Kienzle, Julian Lorenz, Katja Ludwig, Rainer Lienhart
- Abstract summary: We present a novel method for precise 3D object localization in single images from a single calibrated camera using only 2D labels.
Instead of using 3D labels, our model is trained with easy-to-annotate 2D labels along with the physical knowledge of the object's motion.
- Score: 15.15687944002438
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a novel method for precise 3D object localization in single images
from a single calibrated camera using only 2D labels. No expensive 3D labels
are needed. Instead, our model is trained with
easy-to-annotate 2D labels along with the physical knowledge of the object's
motion. Given this information, the model can infer the latent third dimension,
even though it has never seen this information during training. Our method is
evaluated on both synthetic and real-world datasets, and we are able to achieve
a mean distance error of just 6 cm in our experiments on real data. The results
indicate the method's potential as a step towards learning 3D object location
estimation in settings where collecting 3D data for training is not feasible.
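To make this concrete, below is a minimal sketch of the kind of training objective the abstract describes: a reprojection loss against the 2D labels combined with a physics prior that the predicted 3D track follows constant-acceleration (gravity-driven) motion. The function names, shapes, and exact loss forms are illustrative assumptions, not the authors' implementation.

```python
import torch

def reprojection_loss(pred_xyz, labels_2d, K):
    """Project predicted 3D points (N, 3) with camera intrinsics K (3, 3)
    and compare against the annotated 2D pixel locations (N, 2)."""
    proj = pred_xyz @ K.T                  # homogeneous image coordinates
    uv = proj[:, :2] / proj[:, 2:3]        # perspective divide
    return torch.mean((uv - labels_2d) ** 2)

def physics_loss(pred_xyz, dt, g=(0.0, -9.81, 0.0)):
    """Penalize deviation from constant-acceleration motion: the
    finite-difference acceleration of the 3D track should equal gravity."""
    g = torch.tensor(g, dtype=pred_xyz.dtype)
    vel = (pred_xyz[1:] - pred_xyz[:-1]) / dt    # (N-1, 3) velocities
    acc = (vel[1:] - vel[:-1]) / dt              # (N-2, 3) accelerations
    return torch.mean((acc - g) ** 2)

# Combined objective (lambda_phys is a hypothetical weighting factor):
# loss = reprojection_loss(xyz, uv_labels, K) + lambda_phys * physics_loss(xyz, dt)
```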
Related papers
- Cross-Dimensional Medical Self-Supervised Representation Learning Based on a Pseudo-3D Transformation [68.60747298865394]
We propose a new cross-dimensional SSL framework based on a pseudo-3D transformation (CDSSL-P3D).
Specifically, we introduce an image transformation based on the im2col algorithm, which converts 2D images into a format consistent with 3D data.
This transformation enables seamless integration of 2D and 3D data, and facilitates cross-dimensional self-supervised learning for 3D medical image analysis.
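The im2col rearrangement itself is standard; as a hedged sketch, the code below extracts every k x k patch of an image into a column and views the result as a pseudo-3D stack. How CDSSL-P3D actually lays out these columns is an assumption based only on the abstract.

```python
import numpy as np

def im2col(img, k):
    """Rearrange all k x k patches of a 2D image (H, W) into columns,
    producing an array of shape (k*k, num_patches)."""
    H, W = img.shape
    out_h, out_w = H - k + 1, W - k + 1
    cols = np.empty((k * k, out_h * out_w), dtype=img.dtype)
    idx = 0
    for i in range(out_h):
        for j in range(out_w):
            cols[:, idx] = img[i:i + k, j:j + k].ravel()
            idx += 1
    return cols

img = np.arange(25, dtype=np.float32).reshape(5, 5)
# Viewing the patch columns as a k x k x N stack gives a volumetric,
# 3D-like tensor; the exact layout used by CDSSL-P3D is an assumption.
pseudo_3d = im2col(img, 3).reshape(3, 3, -1)
```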
arXiv Detail & Related papers (2024-06-03T02:57:25Z)
- Weakly Supervised 3D Object Detection via Multi-Level Visual Guidance [72.6809373191638]
We propose a framework to study how to leverage constraints between 2D and 3D domains without requiring any 3D labels.
First, we design a feature-level constraint to align LiDAR and image features based on object-aware regions.
Second, an output-level constraint enforces overlap between the 2D and projected 3D box estimates.
Third, a training-level constraint produces accurate and consistent 3D pseudo-labels that align with the visual data.
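As an illustration of the output-level constraint, the sketch below projects the corners of a 3D box estimate into the image and penalizes weak overlap (IoU) with the detected 2D box; variable names and the exact loss form are assumptions, not the paper's implementation.

```python
import torch

def project_box3d(corners_xyz, K):
    """Project the 8 corners (8, 3) of a 3D box with intrinsics K (3, 3)
    and take the tight axis-aligned 2D box around them, as (x1, y1, x2, y2)."""
    proj = corners_xyz @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    return torch.cat([uv.min(dim=0).values, uv.max(dim=0).values])

def overlap_loss(box3d_corners, box2d, K):
    """1 - IoU between the detected 2D box and the projected 3D box."""
    box_a = project_box3d(box3d_corners, K)
    lt = torch.maximum(box_a[:2], box2d[:2])
    rb = torch.minimum(box_a[2:], box2d[2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[0] * wh[1]
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    return 1.0 - inter / (area(box_a) + area(box2d) - inter + 1e-6)
```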
arXiv Detail & Related papers (2023-12-12T18:57:25Z)
- Tracking Objects with 3D Representation from Videos [57.641129788552675]
With 3D object representation learning from pseudo-3D object labels in monocular videos, we propose a new 2D Multiple Object Tracking (MOT) paradigm, called P3DTrack.
arXiv Detail & Related papers (2023-06-08T17:58:45Z)
- Data Efficient 3D Learner via Knowledge Transferred from 2D Model [30.077342050473515]
We deal with the data scarcity challenge of 3D tasks by transferring knowledge from strong 2D models via RGB-D images.
We utilize a strong and well-trained semantic segmentation model for 2D images to augment RGB-D images with pseudo-labels.
Our method already outperforms existing state-of-the-art methods tailored for 3D label efficiency.
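A minimal sketch of how such 2D-to-3D pseudo-labeling might look, assuming a frozen segmentation model with (1, C, H, W) logits and a depth map aligned to the RGB view; names and shapes are illustrative, not the paper's code.

```python
import torch

def backproject(depth, K):
    """Lift a depth map (H, W) to camera-space 3D points (H*W, 3)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    uv1 = torch.stack([u, v, torch.ones_like(u)], dim=-1).float()  # (H, W, 3)
    rays = uv1.reshape(-1, 3) @ torch.inverse(K).T
    return rays * depth.reshape(-1, 1)

@torch.no_grad()
def pseudo_label_points(seg_model, rgb, depth, K):
    """Attach per-pixel predictions of a frozen 2D segmentation model
    to the 3D points obtained by back-projecting the depth map."""
    labels = seg_model(rgb.unsqueeze(0)).argmax(dim=1).reshape(-1)  # (H*W,)
    points = backproject(depth, K)
    return points, labels  # pseudo-labeled points to supervise a 3D network
```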
arXiv Detail & Related papers (2022-03-16T09:14:44Z)
- 3D object reconstruction and 6D-pose estimation from 2D shape for robotic grasping of objects [2.330913682033217]
We propose a method for 3D object reconstruction and 6D-pose estimation from 2D images.
Computing the transformation parameters directly from the 2D images reduces the number of free parameters required during the registration process.
In robot experiments, successful grasping of objects demonstrates the method's usability in real-world environments.
arXiv Detail & Related papers (2022-03-02T11:58:35Z)
- SAT: 2D Semantics Assisted Training for 3D Visual Grounding [95.84637054325039]
3D visual grounding aims at grounding a natural language description about a 3D scene, usually represented in the form of 3D point clouds, to the targeted object region.
Point clouds are sparse, noisy, and contain limited semantic information compared with 2D images.
We propose 2D Semantics Assisted Training (SAT) that utilizes 2D image semantics in the training stage to ease point-cloud-language joint representation learning.
arXiv Detail & Related papers (2021-05-24T17:58:36Z)
- FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection [78.00922683083776]
It is non-trivial to adapt a general 2D detector to this 3D task.
In this technical report, we study this problem with a practice built on a fully convolutional single-stage detector.
Our solution achieves 1st place among all vision-only methods in the nuScenes 3D detection challenge at NeurIPS 2020.
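As a rough illustration of a fully convolutional one-stage monocular 3D head, the sketch below regresses class scores and 3D box attributes at every feature-map location; the channel layout and attribute set are assumptions rather than FCOS3D's exact configuration.

```python
import torch.nn as nn

class Mono3DHead(nn.Module):
    """Hypothetical per-pixel head: every feature-map location predicts
    class scores plus the 3D attributes of the object centered there."""
    def __init__(self, in_ch=256, num_classes=10):
        super().__init__()
        self.cls = nn.Conv2d(in_ch, num_classes, 3, padding=1)
        self.offset = nn.Conv2d(in_ch, 2, 3, padding=1)   # 2D center offset
        self.depth = nn.Conv2d(in_ch, 1, 3, padding=1)    # object depth
        self.size = nn.Conv2d(in_ch, 3, 3, padding=1)     # 3D box dimensions
        self.rot = nn.Conv2d(in_ch, 1, 3, padding=1)      # yaw angle

    def forward(self, feat):
        return (self.cls(feat), self.offset(feat), self.depth(feat),
                self.size(feat), self.rot(feat))
```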
arXiv Detail & Related papers (2021-04-22T09:35:35Z)
- Learning from 2D: Pixel-to-Point Knowledge Transfer for 3D Pretraining [21.878815180924832]
We present a novel 3D pretraining method by leveraging 2D networks learned from rich 2D datasets.
Our experiments show that 3D models pretrained with 2D knowledge boost performance across various real-world 3D downstream tasks.
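One common way to realize such pixel-to-point transfer is a contrastive (InfoNCE) objective over matched pixel/point feature pairs; the sketch below assumes the correspondences are given and is not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def pixel_to_point_nce(point_feats, pixel_feats, tau=0.07):
    """InfoNCE over matched pairs: row i of point_feats (N, D) corresponds
    to row i of pixel_feats (N, D); all other rows serve as negatives."""
    p = F.normalize(point_feats, dim=1)
    q = F.normalize(pixel_feats, dim=1)
    logits = p @ q.T / tau                    # (N, N) cosine similarities
    targets = torch.arange(p.size(0))         # positives on the diagonal
    return F.cross_entropy(logits, targets)
```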
arXiv Detail & Related papers (2021-04-10T05:40:42Z)
- 3D Registration for Self-Occluded Objects in Context [66.41922513553367]
We introduce the first deep learning framework capable of effectively handling 3D registration of self-occluded objects.
Our method consists of an instance segmentation module followed by a pose estimation module.
It allows us to perform 3D registration in a one-shot manner, without requiring an expensive iterative procedure.
arXiv Detail & Related papers (2020-11-23T08:05:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.