SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings
- URL: http://arxiv.org/abs/2003.14034v2
- Date: Wed, 2 Sep 2020 14:18:47 GMT
- Title: SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings
- Authors: Wenyu Han, Siyuan Xiang, Chenhui Liu, Ruoyu Wang, Chen Feng
- Abstract summary: We present the SPARE3D dataset. Based on cognitive science and psychometrics, SPARE3D contains three types of 2D-3D reasoning tasks on view consistency, camera pose, and shape generation.
We then design a method to automatically generate a large number of challenging questions with ground truth answers for each task.
Experiments show that although convolutional networks have achieved superhuman performance in many visual learning tasks, their spatial reasoning performance on SPARE3D tasks is either lower than average human performance or even close to random guesses.
- Score: 9.651400924429336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial reasoning is an important component of human intelligence. We can
imagine the shapes of 3D objects and reason about their spatial relations by
merely looking at their three-view line drawings in 2D, with different levels
of competence. Can deep networks be trained to perform spatial reasoning tasks?
How can we measure their "spatial intelligence"? To answer these questions, we
present the SPARE3D dataset. Based on cognitive science and psychometrics,
SPARE3D contains three types of 2D-3D reasoning tasks on view consistency,
camera pose, and shape generation, with increasing difficulty. We then design a
method to automatically generate a large number of challenging questions with
ground truth answers for each task. They are used to provide supervision for
training our baseline models using state-of-the-art architectures like ResNet.
Our experiments show that although convolutional networks have achieved
superhuman performance in many visual learning tasks, their spatial reasoning
performance on SPARE3D tasks is either lower than average human performance or
even close to random guesses. We hope SPARE3D can stimulate new problem
formulations and network designs for spatial reasoning to empower intelligent
robots to operate effectively in the 3D world via 2D sensors. The dataset and
code are available at https://ai4ce.github.io/SPARE3D.
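The claim that baseline accuracy is "close to random guesses" can be made concrete with a minimal, illustrative sketch. The helpers below (our own names, not from the SPARE3D code) compute chance-level accuracy for a k-way multiple-choice task and compare a random-guessing baseline against it; the four-candidate setup is assumed for illustration.

```python
import random

def chance_accuracy(num_choices):
    """Expected accuracy of uniform random guessing on a k-way question."""
    return 1.0 / num_choices

def accuracy(predictions, answers):
    """Fraction of questions answered correctly."""
    assert len(predictions) == len(answers)
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)

# Simulate a random-guessing baseline on 4-way multiple-choice questions.
random.seed(0)
answers = [random.randrange(4) for _ in range(10000)]
guesses = [random.randrange(4) for _ in range(10000)]
baseline = accuracy(guesses, answers)  # close to chance_accuracy(4) = 0.25
```

A trained model whose accuracy is not significantly above `chance_accuracy(4)` has learned little about the task, which is the comparison the paper's experiments draw.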
Related papers
- Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes [65.22070581594426]
"Implicit-Zoo" is a large-scale dataset requiring thousands of GPU training days to facilitate research and development in this field.
We showcase two immediate benefits, as it enables us to: (1) learn token locations for transformer models; (2) directly regress the 3D camera poses of 2D images with respect to NeRF models.
This in turn leads to improved performance on all three tasks of image classification, semantic segmentation, and 3D pose regression, thereby unlocking new avenues for research.
arXiv Detail & Related papers (2024-06-25T10:20:44Z)
- The 3D-PC: a benchmark for visual perspective taking in humans and machines [11.965236208112753]
A growing number of reports have indicated that deep neural networks (DNNs) become capable of analyzing 3D scenes after training on large image datasets.
We investigated whether this emergent ability for 3D analysis in DNNs is sufficient for visual perspective taking (VPT) with the 3D perception challenge (3D-PC).
The 3D-PC is comprised of three 3D-analysis tasks posed within natural scene images.
arXiv Detail & Related papers (2024-06-06T14:59:39Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- On the Efficacy of 3D Point Cloud Reinforcement Learning [20.4424883945357]
We focus on 3D point clouds, one of the most common forms of 3D representations.
We systematically investigate design choices for 3D point cloud RL, leading to the development of a robust algorithm for various robotic manipulation and control tasks.
We find that 3D point cloud RL can significantly outperform the 2D counterpart when agent-object / object-object relationship encoding is a key factor.
arXiv Detail & Related papers (2023-06-11T22:52:08Z)
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn transferable 3D point cloud representations in realistic scenarios.
Specifically, we exploit naturally existing correspondences between 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
- Deep Generative Models on 3D Representations: A Survey [81.73385191402419]
Generative models aim to learn the distribution of observed data by generating new instances.
Recently, researchers have started to shift focus from 2D to 3D space.
However, representing 3D data poses significantly greater challenges.
arXiv Detail & Related papers (2022-10-27T17:59:50Z)
- Decanus to Legatus: Synthetic training for 2D-3D human pose lifting [26.108023246654646]
We propose an algorithm to generate infinite 3D synthetic human poses (Legatus) from a 3D pose distribution based on 10 initial handcrafted 3D poses (Decanus).
Our results show that we can achieve 3D pose estimation performance comparable to methods using real data from specialized datasets but in a zero-shot setup, showing the potential of our framework.
arXiv Detail & Related papers (2022-10-05T13:10:19Z)
- Super Images -- A New 2D Perspective on 3D Medical Imaging Analysis [0.0]
We present a simple yet effective 2D method to handle 3D data while efficiently embedding the 3D knowledge during training.
Our method generates a 2D super image by stitching the slices of the 3D image side by side.
While attaining results equal, if not superior, to those of 3D networks while using only their 2D counterparts, model complexity is reduced by around threefold.
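The slice-stitching idea can be sketched in a few lines. This is an illustrative implementation under our own assumptions (row-major layout on a near-square grid; `to_super_image` is a hypothetical name, not the paper's API): the depth slices of a D x H x W volume are tiled into one 2D image so that plain 2D convolutions can see the whole volume at once.

```python
import math

def to_super_image(volume):
    """Stitch the depth slices of a D x H x W volume into one 2D image.

    Slices are laid out row-major on a near-square grid, so a 2D network
    can process the whole volume in a single forward pass.
    """
    depth = len(volume)
    height, width = len(volume[0]), len(volume[0][0])
    grid = math.ceil(math.sqrt(depth))  # slices per row of the montage
    out_h, out_w = grid * height, grid * width
    super_image = [[0] * out_w for _ in range(out_h)]
    for d, slice_2d in enumerate(volume):
        row_off = (d // grid) * height
        col_off = (d % grid) * width
        for r in range(height):
            for c in range(width):
                super_image[row_off + r][col_off + c] = slice_2d[r][c]
    return super_image

# A 4 x 2 x 2 volume becomes a 4 x 4 image (a 2 x 2 grid of slices).
volume = [[[d, d], [d, d]] for d in range(4)]
img = to_super_image(volume)
```

The montage preserves every voxel; what changes is only the adjacency a 2D convolution sees at the seams between tiles.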
arXiv Detail & Related papers (2022-05-05T09:59:03Z)
- Interactive Annotation of 3D Object Geometry using 2D Scribbles [84.51514043814066]
In this paper, we propose an interactive framework for annotating 3D object geometry from point cloud data and RGB imagery.
Our framework targets naive users without artistic or graphics expertise.
arXiv Detail & Related papers (2020-08-24T21:51:29Z)
- 3D Self-Supervised Methods for Medical Imaging [7.65168530693281]
We propose 3D versions for five different self-supervised methods, in the form of proxy tasks.
Our methods facilitate neural network feature learning from unlabeled 3D images, aiming to reduce the required cost for expert annotation.
The developed algorithms are 3D Contrastive Predictive Coding, 3D Rotation prediction, 3D Jigsaw puzzles, Relative 3D patch location, and 3D Exemplar networks.
arXiv Detail & Related papers (2020-06-06T09:56:58Z)
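One of the proxy tasks named above, the 3D jigsaw puzzle, can be sketched without any labels: split a volume into its eight octants, permute them by a known permutation, and train the network to predict which permutation was applied. The code below is a minimal illustration under our own assumptions (even-sized volumes, the first 100 lexicographic permutations as the label set; `octants` and `make_jigsaw_example` are hypothetical names, not from the paper's code).

```python
import itertools
import random

# The proxy-task label is the index of the permutation applied to the
# volume's eight octants. Real methods sample a subset of the 8!
# permutations; we use the first 100 in lexicographic order.
PERMUTATIONS = list(itertools.permutations(range(8)))[:100]

def octants(volume):
    """Split a D x H x W volume (even dimensions) into its eight octants."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    halves = lambda n: [(0, n // 2), (n // 2, n)]
    blocks = []
    for d0, d1 in halves(d):
        for h0, h1 in halves(h):
            for w0, w1 in halves(w):
                blocks.append([[row[w0:w1] for row in volume[z][h0:h1]]
                               for z in range(d0, d1)])
    return blocks

def make_jigsaw_example(volume, rng):
    """Return (shuffled_octants, label): a self-supervised training pair."""
    label = rng.randrange(len(PERMUTATIONS))
    perm = PERMUTATIONS[label]
    blocks = octants(volume)
    return [blocks[i] for i in perm], label
```

Because both input and label are derived from the raw volume, arbitrarily many training pairs come for free from unlabeled scans, which is exactly what makes such proxy tasks attractive for medical imaging.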
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.