SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings
- URL: http://arxiv.org/abs/2003.14034v2
- Date: Wed, 2 Sep 2020 14:18:47 GMT
- Title: SPARE3D: A Dataset for SPAtial REasoning on Three-View Line Drawings
- Authors: Wenyu Han, Siyuan Xiang, Chenhui Liu, Ruoyu Wang, Chen Feng
- Abstract summary: We present the SPARE3D dataset. Based on cognitive science and psychometrics, SPARE3D contains three types of 2D-3D reasoning tasks on view consistency, camera pose, and shape generation.
We then design a method to automatically generate a large number of challenging questions with ground truth answers for each task.
Experiments show that although convolutional networks have achieved superhuman performance in many visual learning tasks, their spatial reasoning performance on SPARE3D tasks is either lower than average human performance or even close to random guesses.
- Score: 9.651400924429336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Spatial reasoning is an important component of human intelligence. We can
imagine the shapes of 3D objects and reason about their spatial relations by
merely looking at their three-view line drawings in 2D, with different levels
of competence. Can deep networks be trained to perform spatial reasoning tasks?
How can we measure their "spatial intelligence"? To answer these questions, we
present the SPARE3D dataset. Based on cognitive science and psychometrics,
SPARE3D contains three types of 2D-3D reasoning tasks on view consistency,
camera pose, and shape generation, with increasing difficulty. We then design a
method to automatically generate a large number of challenging questions with
ground truth answers for each task. They are used to provide supervision for
training our baseline models using state-of-the-art architectures like ResNet.
Our experiments show that although convolutional networks have achieved
superhuman performance in many visual learning tasks, their spatial reasoning
performance on SPARE3D tasks is either lower than average human performance or
even close to random guesses. We hope SPARE3D can stimulate new problem
formulations and network designs for spatial reasoning to empower intelligent
robots to operate effectively in the 3D world via 2D sensors. The dataset and
code are available at https://ai4ce.github.io/SPARE3D.
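Since SPARE3D poses its view-consistency and camera-pose tasks as multiple-choice questions with auto-generated ground truth, the "close to random guesses" claim has a concrete chance-level baseline. Below is a minimal sketch of that baseline; the field names and the four-candidate format are illustrative assumptions, not the dataset's actual schema.

```python
import random
from dataclasses import dataclass

@dataclass
class MultipleChoiceQuestion:
    """A SPARE3D-style question: pick the candidate consistent with the views."""
    drawing_id: str    # identifier for the three-view line drawing (hypothetical field)
    candidates: list   # e.g. four candidate drawings or camera poses
    answer: int        # index of the correct candidate

def random_guess_accuracy(questions, trials=10_000, seed=0):
    """Empirical accuracy of uniform random guessing over a question set."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        q = rng.choice(questions)
        correct += rng.randrange(len(q.candidates)) == q.answer
    return correct / trials

# With four candidates per question, chance level is 25%.
questions = [
    MultipleChoiceQuestion(f"drawing_{i}", ["A", "B", "C", "D"], i % 4)
    for i in range(100)
]
acc = random_guess_accuracy(questions)
```

Any model whose accuracy sits near this 25% chance level on a four-way task has learned nothing task-relevant, which is the yardstick the abstract's "close to random guesses" refers to.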
Related papers
- 3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark [17.94511890272007]
3D spatial reasoning is the ability to analyze and interpret the positions, orientations, and spatial relationships of objects within the 3D space.
Large multi-modal models (LMMs) have achieved remarkable progress in a wide range of image and video understanding tasks.
We present the first comprehensive 3D spatial reasoning benchmark, 3DSRBench, with 2,772 manually annotated visual question-answer pairs.
arXiv Detail & Related papers (2024-12-10T18:55:23Z)
- LLMI3D: MLLM-based 3D Perception from a Single 2D Image [77.13869413871028]
Multimodal large language models (MLLMs) excel in general capabilities but underperform on 3D tasks.
In this paper, we propose solutions for weak 3D local spatial object perception, poor text-based geometric numerical output, and inability to handle camera focal variations.
We employ parameter-efficient fine-tuning for a pre-trained MLLM and develop LLMI3D, a powerful 3D perception MLLM.
arXiv Detail & Related papers (2024-08-14T10:00:16Z)
- The 3D-PC: a benchmark for visual perspective taking in humans and machines [11.965236208112753]
A growing number of reports have indicated that deep neural networks (DNNs) become capable of analyzing 3D scenes after training on large image datasets.
We investigated whether this emergent ability for 3D analysis in DNNs is sufficient for visual perspective taking (VPT) with the 3D perception challenge (3D-PC).
The 3D-PC is comprised of three 3D-analysis tasks posed within natural scene images.
arXiv Detail & Related papers (2024-06-06T14:59:39Z)
- SUGAR: Pre-training 3D Visual Representations for Robotics [85.55534363501131]
We introduce a novel 3D pre-training framework for robotics named SUGAR.
SUGAR captures semantic, geometric and affordance properties of objects through 3D point clouds.
We show that SUGAR's 3D representation outperforms state-of-the-art 2D and 3D representations.
arXiv Detail & Related papers (2024-04-01T21:23:03Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks for the first time, demonstrating its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- On the Efficacy of 3D Point Cloud Reinforcement Learning [20.4424883945357]
We focus on 3D point clouds, one of the most common forms of 3D representations.
We systematically investigate design choices for 3D point cloud RL, leading to the development of a robust algorithm for various robotic manipulation and control tasks.
We find that 3D point cloud RL can significantly outperform the 2D counterpart when agent-object / object-object relationship encoding is a key factor.
arXiv Detail & Related papers (2023-06-11T22:52:08Z)
- CLIP$^2$: Contrastive Language-Image-Point Pretraining from Real-World Point Cloud Data [80.42480679542697]
We propose Contrastive Language-Image-Point Cloud Pretraining (CLIP$^2$) to learn transferable 3D point cloud representations in realistic scenarios.
Specifically, we exploit naturally occurring correspondences between 2D and 3D scenarios, and build well-aligned, instance-based text-image-point proxies from those complex scenarios.
arXiv Detail & Related papers (2023-03-22T09:32:45Z)
- Deep Generative Models on 3D Representations: A Survey [81.73385191402419]
Generative models aim to learn the distribution of observed data by generating new instances.
Recently, researchers have started to shift focus from 2D to 3D space; however, representing 3D data poses significantly greater challenges.
arXiv Detail & Related papers (2022-10-27T17:59:50Z)
- Super Images -- A New 2D Perspective on 3D Medical Imaging Analysis [0.0]
We present a simple yet effective 2D method to handle 3D data while efficiently embedding the 3D knowledge during training.
Our method generates a "super image" by stitching the slices of the 3D image side by side.
While attaining results equal, if not superior, to those of 3D networks while using only 2D counterparts, it reduces model complexity by around threefold.
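The slice-stitching idea can be sketched in a few lines: tile the depth slices of a D x H x W volume onto a single 2D grid. This is a hedged illustration of the general technique on nested lists, not the paper's implementation.

```python
import math

def stitch_slices(volume):
    """Tile the depth slices of a D x H x W volume (nested lists)
    into one 2D 'super image' laid out on a near-square grid."""
    d = len(volume)
    h, w = len(volume[0]), len(volume[0][0])
    cols = math.ceil(math.sqrt(d))
    rows = math.ceil(d / cols)
    # Blank canvas; any unused grid slots stay zero-filled.
    canvas = [[0] * (cols * w) for _ in range(rows * h)]
    for idx, sl in enumerate(volume):
        r, c = divmod(idx, cols)
        for y in range(h):
            canvas[r * h + y][c * w:c * w + w] = sl[y]
    return canvas

# A 4-slice 2x2 volume becomes a 4x4 image on a 2x2 grid of slices.
vol = [[[k] * 2 for _ in range(2)] for k in range(4)]
img = stitch_slices(vol)
```

Once the volume is flattened this way, any off-the-shelf 2D network can consume it, which is where the roughly threefold reduction in model complexity comes from.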
arXiv Detail & Related papers (2022-05-05T09:59:03Z)
- Interactive Annotation of 3D Object Geometry using 2D Scribbles [84.51514043814066]
In this paper, we propose an interactive framework for annotating 3D object geometry from point cloud data and RGB imagery.
Our framework targets naive users without artistic or graphics expertise.
arXiv Detail & Related papers (2020-08-24T21:51:29Z)
- 3D Self-Supervised Methods for Medical Imaging [7.65168530693281]
We propose 3D versions of five different self-supervised methods, formulated as proxy tasks.
Our methods facilitate neural network feature learning from unlabeled 3D images, aiming to reduce the required cost for expert annotation.
The developed algorithms are 3D Contrastive Predictive Coding, 3D Rotation prediction, 3D Jigsaw puzzles, Relative 3D patch location, and 3D Exemplar networks.
arXiv Detail & Related papers (2020-06-06T09:56:58Z)
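Of the five proxy tasks listed above, relative 3D patch location is the easiest to make concrete: sample a center patch and one of its 26 neighbors in a 3x3x3 grid, and the classification label is the neighbor's offset. The sketch below only generates the supervision signal under assumed conventions; the actual methods train a network on the image patches themselves.

```python
import random

# The 26 possible offsets of a neighbor around the center of a 3x3x3 grid.
OFFSETS = [(dx, dy, dz)
           for dx in (-1, 0, 1) for dy in (-1, 0, 1) for dz in (-1, 0, 1)
           if (dx, dy, dz) != (0, 0, 0)]

def sample_relative_location(rng):
    """Return (neighbor_offset, class_label) for one training example.
    A network would see the center patch and the neighbor patch and
    predict the 26-way label; no expert annotation is needed."""
    label = rng.randrange(len(OFFSETS))
    return OFFSETS[label], label

rng = random.Random(0)
offset, label = sample_relative_location(rng)
```

Because the labels come for free from the volume's own geometry, this is exactly the kind of proxy task that lets features be learned from unlabeled 3D images.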
This list is automatically generated from the titles and abstracts of the papers in this site.