Multi-Modal Dataset Acquisition for Photometrically Challenging Objects
- URL: http://arxiv.org/abs/2308.10621v1
- Date: Mon, 21 Aug 2023 10:38:32 GMT
- Title: Multi-Modal Dataset Acquisition for Photometrically Challenging Objects
- Authors: HyunJun Jung, Patrick Ruhkamp, Nassir Navab, Benjamin Busam
- Abstract summary: This paper addresses the limitations of current datasets for 3D vision tasks in terms of accuracy, size, realism, and suitable imaging modalities for photometrically challenging objects.
We propose a novel annotation and acquisition pipeline that enhances existing 3D perception and 6D object pose datasets.
- Score: 56.30027922063559
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the limitations of current datasets for 3D vision tasks
in terms of accuracy, size, realism, and suitable imaging modalities for
photometrically challenging objects. We propose a novel annotation and
acquisition pipeline that enhances existing 3D perception and 6D object pose
datasets. Our approach integrates robotic forward-kinematics, external infrared
trackers, and improved calibration and annotation procedures. We present a
multi-modal sensor rig, mounted on a robotic end-effector, and demonstrate how
it is integrated into the creation of highly accurate datasets. Additionally,
we introduce a freehand procedure for wider viewpoint coverage. Both approaches
yield high-quality 3D data with accurate object and camera pose annotations.
Our methods overcome the limitations of existing datasets and provide valuable
resources for 3D vision research.
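To make the acquisition geometry concrete, here is a minimal sketch of the pose-chaining idea behind robot-mounted capture: the camera pose in the world frame is the composition of the robot's forward kinematics with a fixed hand-eye calibration. All frame names and example numbers are illustrative, not taken from the paper's released code.
```python
import numpy as np

def transform(rotation_deg_z: float, translation) -> np.ndarray:
    """Build a 4x4 homogeneous transform from a z-axis rotation and a translation."""
    theta = np.deg2rad(rotation_deg_z)
    T = np.eye(4)
    T[:3, :3] = np.array([
        [np.cos(theta), -np.sin(theta), 0.0],
        [np.sin(theta),  np.cos(theta), 0.0],
        [0.0,            0.0,           1.0],
    ])
    T[:3, 3] = translation
    return T

# T_world_base: robot base in the world frame (e.g. located by an external tracker).
T_world_base = transform(0.0, [0.0, 0.0, 0.1])
# T_base_ee: end-effector pose from the robot's forward kinematics.
T_base_ee = transform(30.0, [0.4, 0.1, 0.5])
# T_ee_cam: fixed camera offset on the end-effector from hand-eye calibration.
T_ee_cam = transform(-5.0, [0.0, 0.05, 0.02])

# Chain the transforms to annotate the camera pose for this frame.
T_world_cam = T_world_base @ T_base_ee @ T_ee_cam
print(T_world_cam)
```
Presumably, the freehand procedure swaps the forward-kinematics term for a pose reported by the external infrared tracker while leaving the structure of the chain unchanged, though the paper's exact formulation may differ.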
Related papers
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits, as the dataset enables us to: (1) learn token locations for transformer models; (2) directly regress the 3D camera poses of 2D images with respect to NeRF models.
This in turn improves performance on all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
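As a toy illustration of benefit (2) above, the following sketch recovers a camera pose by gradient descent against a differentiable scene model. The renderer here is a deliberately trivial stand-in for a trained NeRF, and every name and number is made up; only the optimization pattern reflects the idea.
```python
import torch

def render(pose_t: torch.Tensor) -> torch.Tensor:
    """Placeholder differentiable 'renderer': an 8x8 Gaussian blob whose center
    shifts with the camera translation. A real pipeline would render from a NeRF."""
    ys, xs = torch.meshgrid(torch.arange(8.0), torch.arange(8.0), indexing="ij")
    cx, cy = 3.5 + pose_t[0], 3.5 + pose_t[1]
    return torch.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / 4.0)

target = render(torch.tensor([1.0, -0.5])).detach()  # image at the unknown pose
pose = torch.zeros(2, requires_grad=True)            # initial pose guess
opt = torch.optim.Adam([pose], lr=0.1)

for step in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(render(pose), target)
    loss.backward()
    opt.step()

print(pose.detach())  # converges toward [1.0, -0.5]
```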
- 3DiffTection: 3D Object Detection with Geometry-Aware Diffusion Features [70.50665869806188]
3DiffTection is a state-of-the-art method for 3D object detection from single images.
We fine-tune a diffusion model to perform novel view synthesis conditioned on a single image.
We further train the model on target data with detection supervision.
arXiv Detail & Related papers (2023-11-07T23:46:41Z)
- Perspective-aware Convolution for Monocular 3D Object Detection [2.33877878310217]
We propose a novel perspective-aware convolutional layer that captures long-range dependencies in images.
By enforcing convolutional kernels to extract features along the depth axis of every image pixel, we incorporate perspective information into the network architecture.
We demonstrate improved performance on the KITTI3D dataset, achieving 23.9% average precision on the easy benchmark.
arXiv Detail & Related papers (2023-08-24T17:25:36Z)
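To picture the depth-axis idea: under a pinhole model, lines parallel to the optical axis vanish at the principal point, so the image projection of every pixel's depth axis points toward it. The sketch below is one plausible, purely illustrative realization that gathers features along that direction with bilinear sampling; it is not the paper's exact layer.
```python
import torch
import torch.nn.functional as F

def depth_axis_sample(feat: torch.Tensor, cx: float, cy: float, k: int = 3,
                      step: float = 2.0) -> torch.Tensor:
    """feat: (B, C, H, W). Returns (B, C * k, H, W): features gathered at k
    offsets along each pixel's direction toward the principal point."""
    B, C, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32), indexing="ij")
    d = torch.stack([cx - xs, cy - ys], dim=-1)       # (H, W, 2) toward principal pt
    d = d / (d.norm(dim=-1, keepdim=True) + 1e-6)     # unit direction per pixel
    outs = []
    for i in range(k):
        px = xs + i * step * d[..., 0]
        py = ys + i * step * d[..., 1]
        # Normalize pixel coordinates to [-1, 1] as grid_sample expects.
        gx = 2.0 * px / (W - 1) - 1.0
        gy = 2.0 * py / (H - 1) - 1.0
        grid = torch.stack([gx, gy], dim=-1).expand(B, H, W, 2)
        outs.append(F.grid_sample(feat, grid, align_corners=True))
    return torch.cat(outs, dim=1)

x = torch.randn(1, 16, 32, 32)
y = depth_axis_sample(x, cx=15.5, cy=15.5)  # shape (1, 48, 32, 32)
```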
- Weakly Supervised Multi-Modal 3D Human Body Pose Estimation for Autonomous Driving [0.5735035463793008]
3D human pose estimation is crucial for enabling autonomous vehicles (AVs) to make informed decisions and respond proactively in critical road scenarios.
We present a simple yet efficient weakly supervised approach to 3D HPE in the AV context, employing high-level sensor fusion between camera and LiDAR data.
Our approach outperforms state-of-the-art results by up to ~13% on the Open dataset in the weakly supervised setting.
arXiv Detail & Related papers (2023-07-27T14:28:50Z)
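A simplified sketch of what such high-level camera-LiDAR fusion can look like when lifting 2D joints to 3D: project the LiDAR points into the image, then give each 2D joint the depth of its nearest projected point and back-project it. The intrinsics, data, and nearest-point heuristic are illustrative assumptions, not the paper's exact method.
```python
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0,   0.0,   1.0]])  # pinhole intrinsics (made up)
# Synthetic LiDAR points already expressed in the camera frame.
lidar = np.random.uniform([-2, -1, 4], [2, 1, 8], size=(500, 3))

# Project the LiDAR points onto the image plane.
uvw = (K @ lidar.T).T
uv = uvw[:, :2] / uvw[:, 2:3]

def lift(joint_2d: np.ndarray) -> np.ndarray:
    """Back-project a 2D joint using the depth of its nearest projected LiDAR point."""
    idx = np.argmin(np.linalg.norm(uv - joint_2d, axis=1))
    z = lidar[idx, 2]
    x = (joint_2d[0] - K[0, 2]) / K[0, 0] * z
    y = (joint_2d[1] - K[1, 2]) / K[1, 1] * z
    return np.array([x, y, z])

print(lift(np.array([300.0, 250.0])))
```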
- AutoDecoding Latent 3D Diffusion Models [95.7279510847827]
We present a novel approach to the generation of static and articulated 3D assets that has a 3D autodecoder at its core.
The 3D autodecoder framework embeds properties learned from the target dataset in the latent space.
We then identify the appropriate intermediate volumetric latent space and introduce robust normalization and de-normalization operations.
arXiv Detail & Related papers (2023-07-07T17:59:14Z)
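The abstract does not spell out those operations; as a hedged illustration, one common form of robust normalization centers a latent volume channel-wise by its median and scales by the interquartile range, which resists outliers better than mean/std statistics and has an exact inverse.
```python
import torch

def robust_normalize(z: torch.Tensor):
    """Normalize a latent tensor channel-wise with median/IQR statistics."""
    flat = z.flatten(1)                                   # (C, -1)
    med = flat.median(dim=1).values.view(-1, 1, 1, 1)
    q1 = torch.quantile(flat, 0.25, dim=1).view(-1, 1, 1, 1)
    q3 = torch.quantile(flat, 0.75, dim=1).view(-1, 1, 1, 1)
    iqr = (q3 - q1).clamp_min(1e-6)
    return (z - med) / iqr, (med, iqr)

def robust_denormalize(z_norm: torch.Tensor, stats):
    """Exact inverse of robust_normalize."""
    med, iqr = stats
    return z_norm * iqr + med

latent = torch.randn(8, 16, 16, 16)                       # (C, D, H, W) latent volume
z_norm, stats = robust_normalize(latent)
assert torch.allclose(robust_denormalize(z_norm, stats), latent, atol=1e-5)
```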
- 3D Data Augmentation for Driving Scenes on Camera [50.41413053812315]
We propose a 3D data augmentation approach, termed Drive-3DAug, that augments camera-based driving scenes in 3D space.
We first use Neural Radiance Fields (NeRF) to reconstruct 3D models of the background and foreground objects.
Augmented driving scenes are then obtained by placing the 3D objects, with adapted locations and orientations, in pre-defined valid regions of the backgrounds.
arXiv Detail & Related papers (2023-03-18T05:51:05Z)
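A minimal sketch of the placement step, assuming a bird's-eye-view grid with a boolean mask of valid ground cells (the mask, resolution, and frame conventions are all illustrative): sample a valid cell and a random yaw, then build the object-to-world pose used to drop a reconstructed object into the background.
```python
import numpy as np

rng = np.random.default_rng(0)
valid = np.zeros((100, 100), dtype=bool)   # bird's-eye-view grid, 1 cell = 0.5 m
valid[40:60, 20:80] = True                 # assumed drivable region

def sample_placement() -> np.ndarray:
    """Pick a valid BEV cell and a yaw; return a 4x4 object-to-world pose."""
    ys, xs = np.nonzero(valid)
    i = rng.integers(len(xs))
    x, y = xs[i] * 0.5, ys[i] * 0.5        # cell indices -> metres
    yaw = rng.uniform(0.0, 2.0 * np.pi)
    T = np.eye(4)
    T[:2, :2] = [[np.cos(yaw), -np.sin(yaw)], [np.sin(yaw), np.cos(yaw)]]
    T[:3, 3] = [x, y, 0.0]
    return T

print(sample_placement())
```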
- DETR4D: Direct Multi-View 3D Object Detection with Sparse Attention [50.11672196146829]
3D object detection with surround-view images is an essential task for autonomous driving.
We propose DETR4D, a Transformer-based framework that explores sparse attention and direct feature query for 3D object detection in multi-view images.
arXiv Detail & Related papers (2022-12-15T14:18:47Z)
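A hedged sketch of the direct feature query idea: project each object query's 3D reference point into the camera views and bilinearly sample the image features there. The single shared camera frame and intrinsics below are simplifications; a real multi-view setup would apply per-camera extrinsics first.
```python
import torch
import torch.nn.functional as F

def query_features(feats: torch.Tensor, ref_points: torch.Tensor,
                   K: torch.Tensor) -> torch.Tensor:
    """feats: (V, C, H, W) per-view feature maps; ref_points: (Q, 3) in the
    camera frame of each view (simplified); K: (3, 3) intrinsics."""
    V, C, H, W = feats.shape
    uvw = ref_points @ K.T                          # (Q, 3) homogeneous pixels
    uv = uvw[:, :2] / uvw[:, 2:3].clamp_min(1e-6)   # pixel coordinates
    # Normalize to [-1, 1] and sample every view at the projected locations.
    gx = 2.0 * uv[:, 0] / (W - 1) - 1.0
    gy = 2.0 * uv[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).view(1, -1, 1, 2).expand(V, -1, 1, 2)
    sampled = F.grid_sample(feats, grid, align_corners=True)  # (V, C, Q, 1)
    return sampled.squeeze(-1).mean(dim=0)          # (C, Q): averaged over views

K = torch.tensor([[100.0, 0.0, 32.0], [0.0, 100.0, 32.0], [0.0, 0.0, 1.0]])
feats = torch.randn(6, 64, 64, 64)                  # 6 surround-view cameras
ref = torch.rand(10, 3) * torch.tensor([2.0, 2.0, 1.0]) + torch.tensor([-1.0, -1.0, 4.0])
print(query_features(feats, ref, K).shape)          # torch.Size([64, 10])
```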