Hierarchical Abstraction Enables Human-Like 3D Object Recognition in Deep Learning Models
- URL: http://arxiv.org/abs/2507.09830v1
- Date: Sun, 13 Jul 2025 23:54:45 GMT
- Title: Hierarchical Abstraction Enables Human-Like 3D Object Recognition in Deep Learning Models
- Authors: Shuhao Fu, Philip J. Kellman, Hongjing Lu
- Abstract summary: Both humans and deep learning models can recognize objects from 3D shapes depicted with sparse visual information. It remains unclear whether these models develop 3D shape representations similar to those used by human vision for object recognition.
- Score: 1.7341654854802664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Both humans and deep learning models can recognize objects from 3D shapes depicted with sparse visual information, such as a set of points randomly sampled from the surfaces of 3D objects (termed a point cloud). Although deep learning models achieve human-like performance in recognizing objects from 3D shapes, it remains unclear whether these models develop 3D shape representations similar to those used by human vision for object recognition. We hypothesize that training with 3D shapes enables models to form representations of local geometric structures, but that their representations of global 3D object shapes may be limited. We conducted two human experiments, systematically manipulating point density and object orientation (Experiment 1) and local geometric structure (Experiment 2). Humans consistently performed well across all experimental conditions. We compared two types of deep learning models with human performance: one based on a convolutional neural network (DGCNN) and the other on a vision transformer (Point Transformer). The Point Transformer provided a better account of human performance than the convolution-based model, an advantage that stems mainly from its mechanism for hierarchical abstraction of 3D shapes.
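To make the notion of hierarchical abstraction concrete, below is a minimal NumPy sketch of the coarsening step used by point-based networks such as Point Transformer: farthest-point sampling picks a sparser set of centers, and each center gathers a local neighborhood that a later stage would summarize into a single feature. This is a generic illustration, not the authors' code; all function names are ours.

```python
import numpy as np

def farthest_point_sample(points, n_samples):
    """Greedily pick n_samples points that spread out over the cloud."""
    selected = np.zeros(n_samples, dtype=int)
    dist = np.full(len(points), np.inf)
    selected[0] = np.random.randint(len(points))
    for i in range(1, n_samples):
        diff = points - points[selected[i - 1]]
        dist = np.minimum(dist, np.einsum('ij,ij->i', diff, diff))
        selected[i] = np.argmax(dist)
    return points[selected]

def group_neighbors(points, centers, k):
    """Gather each center's k nearest points: the local patch to be abstracted."""
    d = np.linalg.norm(points[None, :, :] - centers[:, None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]
    return points[idx]                          # (n_centers, k, 3)

# One abstraction level: halve the cloud, attach a local patch to each survivor.
cloud = np.random.rand(1024, 3)                 # stand-in for points sampled from a surface
centers = farthest_point_sample(cloud, 512)
patches = group_neighbors(cloud, centers, k=16)
print(centers.shape, patches.shape)             # (512, 3) (512, 16, 3)
```

Stacking several such levels gives progressively coarser, larger-scale summaries of the shape, which is the property the paper links to human-like recognition of global 3D structure.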
Related papers
- Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space [58.623106094568776]
3D morphable models (3DMMs) are a powerful tool to represent the possible shapes and appearances of an object category. We introduce a new method, Common3D, that learns 3DMMs of common objects in a fully self-supervised manner from a collection of object-centric videos. Common3D is the first completely self-supervised method that can solve various vision tasks in a zero-shot manner.
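For readers unfamiliar with 3DMMs, the sketch below shows the basic parameterization this entry builds on: an instance shape is the category mean plus a linear combination of learned deformation bases. Common3D's actual model operates in a neural feature space; this toy version only illustrates the idea, and every name in it is invented.

```python
import numpy as np

# Generic 3D morphable model: instance = mean shape + sum_i coeff_i * basis_i.
rng = np.random.default_rng(0)
n_vertices, n_bases = 500, 8
mean_shape = rng.normal(size=(n_vertices, 3))       # category template (toy)
bases = rng.normal(size=(n_bases, n_vertices, 3))   # learned deformation directions
coeffs = 0.1 * rng.normal(size=n_bases)             # per-instance shape code

instance = mean_shape + np.tensordot(coeffs, bases, axes=1)  # (n_vertices, 3)
print(instance.shape)
```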
arXiv Detail & Related papers (2025-04-30T15:42:23Z)
- Learning Internal Representations of 3D Transformations from 2D Projected Inputs [13.029330360766595]
We show how our model infers depth from moving 2D projected points, learns 3D rotational transformations from 2D training stimuli, and compares to human performance on psychophysical structure-from-motion experiments.
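The psychophysical stimuli mentioned here can be mimicked in a few lines: 3D points on a cylinder are rotated frame by frame and orthographically projected, so depth is recoverable only from motion across frames. This is a toy stimulus generator under our own assumptions, not the authors' experimental code.

```python
import numpy as np

def rotation_y(theta):
    """Rotation about the vertical (y) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

# Classic structure-from-motion display: dots on a cylinder, rotated and
# orthographically projected. Any single frame is depth-ambiguous.
rng = np.random.default_rng(1)
angles = rng.uniform(0, 2 * np.pi, size=200)
points = np.stack([np.cos(angles),
                   rng.uniform(-1, 1, size=200),
                   np.sin(angles)], axis=1)        # (200, 3) on a cylinder

frames = [(points @ rotation_y(t).T)[:, :2]        # drop depth: orthographic view
          for t in np.linspace(0, np.pi, 30)]
print(len(frames), frames[0].shape)                # 30 frames of (200, 2)
```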
arXiv Detail & Related papers (2023-03-31T02:43:01Z)
- MvDeCor: Multi-view Dense Correspondence Learning for Fine-grained 3D Segmentation [91.6658845016214]
We propose to utilize self-supervised techniques in the 2D domain for fine-grained 3D shape segmentation tasks.
We render a 3D shape from multiple views, and set up a dense correspondence learning task within the contrastive learning framework.
As a result, the learned 2D representations are view-invariant and geometrically consistent.
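The dense correspondence objective can be sketched as a standard InfoNCE loss: pixel embeddings of the same surface point rendered in two views form positive pairs, and all other pixels serve as negatives. This is a generic contrastive formulation, not MvDeCor's exact implementation; the names and values below are illustrative.

```python
import numpy as np

def info_nce(feat_a, feat_b, temperature=0.07):
    """Contrastive loss pulling corresponding pixel embeddings together.

    feat_a, feat_b: (n, d) L2-normalized features of the same n surface points
    rendered in two views; row i of each is a positive pair, other rows negatives.
    """
    logits = feat_a @ feat_b.T / temperature            # (n, n) similarities
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                  # correct matches on diagonal

# Toy usage: random unit features standing in for CNN pixel embeddings.
rng = np.random.default_rng(0)
f = rng.normal(size=(128, 64))
f /= np.linalg.norm(f, axis=1, keepdims=True)
noisy = f + 0.1 * rng.normal(size=f.shape)
noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
print(info_nce(f, noisy))  # small loss: matching rows are most similar
```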
arXiv Detail & Related papers (2022-08-18T00:48:15Z)
- Learning to Generate 3D Shapes from a Single Example [28.707149807472685]
We present a multi-scale GAN-based model designed to capture the input shape's geometric features across a range of spatial scales.
We train our generative model on a voxel pyramid of the reference shape, without the need for any external supervision or manual annotation.
The resulting shapes present variations across different scales, and at the same time retain the global structure of the reference shape.
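A voxel pyramid is simply the reference shape's occupancy grid repeatedly downsampled, giving the GAN a coarse-to-fine training curriculum. Below is a rough sketch assuming 2x average pooling per level; the paper's exact pooling scheme may differ.

```python
import numpy as np

def voxel_pyramid(vox, levels=3):
    """Build a coarse-to-fine pyramid by 2x average-pooling a cubic voxel grid."""
    pyramid = [vox]
    for _ in range(levels - 1):
        v = pyramid[-1]
        s = v.shape[0] // 2
        v = v.reshape(s, 2, s, 2, s, 2).mean(axis=(1, 3, 5))
        pyramid.append(v)
    return pyramid[::-1]  # coarsest first, matching multi-scale training order

grid = (np.random.rand(64, 64, 64) > 0.5).astype(float)  # toy occupancy grid
for level in voxel_pyramid(grid):
    print(level.shape)  # (16, 16, 16), (32, 32, 32), (64, 64, 64)
```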
arXiv Detail & Related papers (2022-08-05T01:05:32Z)
- Disentangled3D: Learning a 3D Generative Model with Disentangled Geometry and Appearance from Monocular Images [94.49117671450531]
State-of-the-art 3D generative models are GANs which use neural 3D volumetric representations for synthesis.
In this paper, we design a 3D GAN which can learn a disentangled model of objects, just from monocular observations.
arXiv Detail & Related papers (2022-03-29T22:03:18Z)
- AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection [15.244852122106634]
We propose an approach for incorporating the shape-aware 2D/3D constraints into the 3D detection framework.
Specifically, we employ a deep neural network to learn distinctive 2D keypoints in the 2D image domain.
To generate ground truth for the 2D/3D keypoints, we propose an automatic model-fitting approach.
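One common way to realize such a 2D/3D keypoint constraint is a reprojection residual: predicted 3D keypoints, pushed through the camera model, should land on their detected 2D counterparts. The sketch below uses toy camera values and is illustrative, not AutoShape's actual loss.

```python
import numpy as np

def reproject(points_3d, K, R, t):
    """Project 3D keypoints (n, 3) into the image with a pinhole camera K[R|t]."""
    cam = points_3d @ R.T + t                 # world -> camera coordinates
    uv = cam @ K.T                            # camera -> homogeneous pixels
    return uv[:, :2] / uv[:, 2:3]             # perspective divide

# 2D/3D consistency term with toy numbers, not values from the paper.
K = np.array([[700.0, 0.0, 320.0], [0.0, 700.0, 240.0], [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 10.0])
kp3d = np.random.rand(9, 3)                   # e.g. box-corner-like 3D keypoints
kp2d_detected = reproject(kp3d, K, R, t) + np.random.normal(0, 0.5, size=(9, 2))
residual = np.linalg.norm(reproject(kp3d, K, R, t) - kp2d_detected, axis=1)
print(residual.mean())                        # the quantity a detector would minimize
```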
arXiv Detail & Related papers (2021-08-25T08:50:06Z)
- Generative VoxelNet: Learning Energy-Based Models for 3D Shape Synthesis and Analysis [143.22192229456306]
This paper proposes a deep 3D energy-based model to represent volumetric shapes.
The benefits of the proposed model are six-fold.
Experiments demonstrate that the proposed model can generate high-quality 3D shape patterns.
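Energy-based shape models are typically sampled with Langevin dynamics, i.e. noisy gradient descent on the learned energy. Here is a minimal sketch with a toy quadratic energy standing in for the paper's deep network.

```python
import numpy as np

def langevin_sample(energy_grad, x0, steps=200, step_size=0.2):
    """Draw a sample from p(x) proportional to exp(-E(x)) via Langevin dynamics."""
    x = x0.copy()
    for _ in range(steps):
        noise = np.random.normal(size=x.shape)
        x += -0.5 * step_size**2 * energy_grad(x) + step_size * noise
    return x

# Toy energy E(x) = ||x - 2||^2 / 2, so samples should settle around 2.
energy_grad = lambda x: x - 2.0
voxels = langevin_sample(energy_grad, np.zeros((16, 16, 16)))
print(voxels.mean(), voxels.std())  # roughly 2.0 and 1.0
```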
arXiv Detail & Related papers (2020-12-25T06:09:36Z)
- Canonical 3D Deformer Maps: Unifying parametric and non-parametric methods for dense weakly-supervised category reconstruction [79.98689027127855]
We propose a new representation of the 3D shape of common object categories that can be learned from a collection of 2D images of independent objects.
Our method builds in a novel way on concepts from parametric deformation models, non-parametric 3D reconstruction, and canonical embeddings.
It achieves state-of-the-art results in dense 3D reconstruction on public in-the-wild datasets of faces, cars, and birds.
arXiv Detail & Related papers (2020-08-28T15:44:05Z)
- Shape Prior Deformation for Categorical 6D Object Pose and Size Estimation [62.618227434286]
We present a novel learning approach to recover the 6D poses and sizes of unseen object instances from an RGB-D image.
We propose a deep network to reconstruct the 3D object model by explicitly modeling the deformation from a pre-learned categorical shape prior.
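A hedged sketch of the overall recipe: deform a categorical mean shape by predicted offsets, then recover pose and size by similarity alignment of the deformed model to observed points. We use the standard closed-form Umeyama solution for the alignment step; all values below are toy stand-ins for the paper's learned predictions.

```python
import numpy as np

def umeyama(src, dst):
    """Similarity transform (scale s, rotation R, translation t) minimizing
    ||dst - (s * src @ R.T + t)||^2, via Umeyama's closed-form solution."""
    mu_s, mu_d = src.mean(0), dst.mean(0)
    cov = (dst - mu_d).T @ (src - mu_s) / len(src)
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1                          # guard against reflections
    R = U @ S @ Vt
    var_src = ((src - mu_s) ** 2).sum(axis=1).mean()
    s = (D * np.diag(S)).sum() / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t

# Deform a toy categorical prior, then read off pose and size by alignment.
prior = np.random.rand(300, 3) - 0.5              # pre-learned mean shape (toy)
instance = prior + 0.05 * np.random.randn(300, 3)  # prior + predicted deformation
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
observed = 2.0 * instance @ R_true.T + np.array([0.1, -0.2, 1.5])
s, R, t = umeyama(instance, observed)
print(round(s, 3))                                 # close to the true scale 2.0
```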
arXiv Detail & Related papers (2020-07-16T16:45:05Z)