VCVW-3D: A Virtual Construction Vehicles and Workers Dataset with 3D
Annotations
- URL: http://arxiv.org/abs/2305.17927v1
- Date: Mon, 29 May 2023 07:42:10 GMT
- Title: VCVW-3D: A Virtual Construction Vehicles and Workers Dataset with 3D
Annotations
- Authors: Yuexiong Ding, Xiaowei Luo
- Abstract summary: This study creates and releases a virtual dataset with 3D annotations named VCVW-3D.
The dataset is characterized by multi-scene, multi-category, multi-viewpoint, multi-annotation, and binocular vision.
Several typical 2D and monocular 3D object detection models are then trained and evaluated on the VCVW-3D dataset.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Currently, object detection applications in construction are almost
entirely based on pure 2D data (both images and annotations are 2D), so the
resulting artificial intelligence (AI) applications are applicable only to
scenarios that require no more than 2D information. However, most advanced
applications require AI agents to perceive 3D spatial information, which limits
the further development of computer vision (CV) in construction. The lack of
3D-annotated datasets for construction object detection worsens the situation.
Therefore, this study creates and releases a virtual dataset with 3D
annotations named VCVW-3D, which covers 15 construction scenes and involves ten
categories of construction vehicles and workers. The VCVW-3D dataset is
characterized by multi-scene, multi-category, multi-randomness,
multi-viewpoint, and multi-annotation data with binocular vision. Several
typical 2D and monocular 3D object detection models are then trained and
evaluated on VCVW-3D to provide a benchmark for subsequent research. VCVW-3D is
expected to bring considerable economic benefits and practical significance by
reducing the costs of data construction, prototype development, and exploration
of space-awareness applications, thus promoting the development of CV in
construction, especially 3D applications.
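The abstract does not specify VCVW-3D's annotation schema, but a 3D object-detection annotation of the kind it describes typically records, per instance, a category label plus an oriented 3D bounding box (center, dimensions, and yaw). The sketch below is a minimal illustration of such a record; the `Box3D` class, its field layout, and the excavator example are assumptions for illustration, not the dataset's actual format.

```python
from dataclasses import dataclass
import math

@dataclass
class Box3D:
    """Hypothetical 3D bounding-box annotation: a category label, the box
    center (x, y, z) in camera coordinates, the box dimensions (l, w, h),
    and a yaw rotation about the vertical axis."""
    category: str
    center: tuple      # (x, y, z) in metres
    dimensions: tuple  # (length, width, height) in metres
    yaw: float         # rotation about the vertical (y) axis, in radians

    def volume(self) -> float:
        l, w, h = self.dimensions
        return l * w * h

    def corners(self):
        """Return the 8 corner points of the box in camera coordinates."""
        l, w, h = self.dimensions
        cx, cy, cz = self.center
        c, s = math.cos(self.yaw), math.sin(self.yaw)
        pts = []
        for dx in (-l / 2, l / 2):
            for dy in (-h / 2, h / 2):
                for dz in (-w / 2, w / 2):
                    # rotate the local offset about the vertical axis,
                    # then translate by the box center
                    rx = c * dx + s * dz
                    rz = -s * dx + c * dz
                    pts.append((cx + rx, cy + dy, cz + rz))
        return pts

# Example instance: one annotated construction vehicle
box = Box3D("excavator", center=(2.0, 1.0, 10.0),
            dimensions=(6.0, 3.0, 3.5), yaw=0.3)
print(len(box.corners()), box.volume())  # 8 63.0
```

A monocular 3D detector trained on such annotations would regress exactly these quantities from a single image; the binocular imagery the dataset provides additionally allows depth to be recovered by stereo matching.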
Related papers
- Volumetric Environment Representation for Vision-Language Navigation [66.04379819772764]
Vision-language navigation (VLN) requires an agent to navigate through a 3D environment based on visual observations and natural language instructions.
We introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
VER predicts 3D occupancy, 3D room layout, and 3D bounding boxes jointly.
arXiv Detail & Related papers (2024-03-21T06:14:46Z)
- DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields [73.97131748433212]
This paper introduces a novel approach capable of generating infinite, high-quality 3D-consistent 2D annotations alongside 3D point cloud segmentations.
We leverage the strong semantic prior within a 3D generative model to train a semantic decoder.
Once trained, the decoder efficiently generalizes across the latent space, enabling the generation of infinite data.
arXiv Detail & Related papers (2023-11-18T21:58:28Z)
- An Embodied Generalist Agent in 3D World [67.16935110789528]
We introduce LEO, an embodied multi-modal generalist agent that excels in perceiving, grounding, reasoning, planning, and acting in the 3D world.
We collect large-scale datasets comprising diverse object-level and scene-level tasks, which require considerable understanding of and interaction with the 3D world.
Through extensive experiments, we demonstrate LEO's remarkable proficiency across a wide spectrum of tasks, including 3D captioning, question answering, embodied reasoning, navigation and manipulation.
arXiv Detail & Related papers (2023-11-18T01:21:38Z)
- RenderOcc: Vision-Centric 3D Occupancy Prediction with 2D Rendering Supervision [36.15913507034939]
We present RenderOcc, a novel paradigm for training 3D occupancy models only using 2D labels.
Specifically, we extract a NeRF-style 3D volume representation from multi-view images.
We employ volume rendering techniques to establish 2D renderings, thus enabling direct 3D supervision from 2D semantics and depth labels.
arXiv Detail & Related papers (2023-09-18T06:08:15Z)
- MobileBrick: Building LEGO for 3D Reconstruction on Mobile Devices [78.20154723650333]
High-quality 3D ground-truth shapes are critical for 3D object reconstruction evaluation.
We introduce a novel multi-view RGBD dataset captured using a mobile device.
We obtain precise 3D ground-truth shape without relying on high-end 3D scanners.
arXiv Detail & Related papers (2023-03-03T14:02:50Z)
- OmniObject3D: Large-Vocabulary 3D Object Dataset for Realistic Perception, Reconstruction and Generation [107.71752592196138]
We propose OmniObject3D, a large vocabulary 3D object dataset with massive high-quality real-scanned 3D objects.
It comprises 6,000 scanned objects in 190 daily categories, sharing common classes with popular 2D datasets.
Each 3D object is captured with both 2D and 3D sensors, providing textured meshes, point clouds, multiview rendered images, and multiple real-captured videos.
arXiv Detail & Related papers (2023-01-18T18:14:18Z)
- DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection [8.681912341444901]
DAIR-V2X is the first large-scale, multi-modality, multi-view dataset from real scenarios for Vehicle-Infrastructure Cooperative Autonomous Driving.
DAIR-V2X comprises 71254 LiDAR frames and 71254 Camera frames, and all frames are captured from real scenes with 3D annotations.
arXiv Detail & Related papers (2022-04-12T07:13:33Z)
- PC-DAN: Point Cloud based Deep Affinity Network for 3D Multi-Object Tracking (Accepted as an extended abstract in JRDB-ACT Workshop at CVPR21) [68.12101204123422]
A point cloud is a dense compilation of spatial data in 3D coordinates.
We propose a PointNet-based approach for 3D Multi-Object Tracking (MOT).
arXiv Detail & Related papers (2021-06-03T05:36:39Z)
- A Convolutional Architecture for 3D Model Embedding [1.3858051019755282]
We propose a deep learning architecture to handle 3D models as an input.
We show that the embedding representation conveys semantic information that helps to deal with the similarity assessment of 3D objects.
arXiv Detail & Related papers (2021-03-05T15:46:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.