Approaching human 3D shape perception with neurally mappable models
- URL: http://arxiv.org/abs/2308.11300v2
- Date: Thu, 7 Sep 2023 21:18:15 GMT
- Title: Approaching human 3D shape perception with neurally mappable models
- Authors: Thomas P. O'Connell, Tyler Bonnen, Yoni Friedman, Ayush Tewari, Josh
B. Tenenbaum, Vincent Sitzmann, Nancy Kanwisher
- Abstract summary: Humans effortlessly infer the 3D shape of objects.
None of the current computational models captures the human ability to match object shape across viewpoints.
This work provides a foundation for understanding human shape inferences within neurally mappable computational architectures.
- Score: 15.090436065092716
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Humans effortlessly infer the 3D shape of objects. What computations underlie
this ability? Although various computational models have been proposed, none of
them capture the human ability to match object shape across viewpoints. Here,
we ask whether and how this gap might be closed. We begin with a relatively
novel class of computational models, 3D neural fields, which encapsulate the
basic principles of classic analysis-by-synthesis in a deep neural network
(DNN). First, we find that a 3D Light Field Network (3D-LFN) supports 3D
matching judgments well aligned to humans for within-category comparisons,
adversarially-defined comparisons that accentuate the 3D failure cases of
standard DNN models, and adversarially-defined comparisons for algorithmically
generated shapes with no category structure. We then investigate the source of
the 3D-LFN's ability to achieve human-aligned performance through a series of
computational experiments. Exposure to multiple viewpoints of objects during
training and a multi-view learning objective are the primary factors behind
model-human alignment; even conventional DNN architectures come much closer to
human behavior when trained with multi-view objectives. Finally, we find that
while the models trained with multi-view learning objectives are able to
partially generalize to new object categories, they fall short of human
alignment. This work provides a foundation for understanding human shape
inferences within neurally mappable computational architectures.
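To make the paper's central ingredients concrete, below is a minimal sketch of a light field network trained with a multi-view objective. All module names, sizes, and the training loop are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LightFieldNetwork(nn.Module):
    """Maps an oriented ray plus a per-object latent code directly to RGB."""
    def __init__(self, latent_dim=256, hidden=256):
        super().__init__()
        # Input: ray origin (3) + ray direction (3) + object latent code.
        self.mlp = nn.Sequential(
            nn.Linear(6 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # one color per ray, in a single pass
        )

    def forward(self, ray_origins, ray_dirs, z):
        z = z.expand(ray_origins.shape[0], -1)
        return self.mlp(torch.cat([ray_origins, ray_dirs, z], dim=-1))

# Auto-decoder setup: one learned latent per object, shared network weights.
model = LightFieldNetwork()
latents = nn.Embedding(num_embeddings=100, embedding_dim=256)  # 100 objects
opt = torch.optim.Adam(
    list(model.parameters()) + list(latents.parameters()), lr=1e-4)

def training_step(obj_id, ray_origins, ray_dirs, target_rgb):
    # Each step samples rays/colors from a different random viewpoint of
    # obj_id; this multi-view supervision is the factor the paper identifies
    # as driving human-aligned 3D shape matching.
    pred = model(ray_origins, ray_dirs, latents(obj_id))
    loss = ((pred - target_rgb) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

The auto-decoder arrangement (a learned latent per object, weights shared across objects) mirrors the analysis-by-synthesis flavor the abstract describes: an object's shape is whatever latent explains all of its views at once.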
Related papers
- Cross-view and Cross-pose Completion for 3D Human Understanding [22.787947086152315]
We propose a pre-training approach based on self-supervised learning that works on human-centric data using only images.
We pre-train a model for body-centric tasks and one for hand-centric tasks.
With a generic transformer architecture, these models outperform existing self-supervised pre-training methods on a wide set of human-centric downstream tasks.
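As a rough illustration of the cross-view completion objective described above, the hedged sketch below masks patches of one image of a person and reconstructs them with help from a second view. The transformer stand-in, dimensions, and masking scheme are assumptions, and position embeddings are omitted for brevity.

```python
import torch
import torch.nn as nn

patch_dim, width = 768, 512  # illustrative sizes

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=width, nhead=8, batch_first=True),
    num_layers=4,
)
to_tokens = nn.Linear(patch_dim, width)
to_pixels = nn.Linear(width, patch_dim)

def cross_view_completion_loss(patches_a, patches_b, mask_ratio=0.75):
    """patches_a/b: (B, N, patch_dim) patch features of two views of the same
    person. Masked tokens of view A are predicted from the visible tokens of
    A plus the full second view B."""
    B, N, _ = patches_a.shape
    n_masked = int(N * mask_ratio)
    perm = torch.randperm(N)
    masked_idx, visible_idx = perm[:n_masked], perm[n_masked:]

    tokens_a = to_tokens(patches_a)
    tokens_b = to_tokens(patches_b)
    mask_token = torch.zeros(B, n_masked, width)

    # Concatenate visible A tokens, mask placeholders, and all of view B;
    # attention can route information from B into the masked slots.
    seq = torch.cat([tokens_a[:, visible_idx], mask_token, tokens_b], dim=1)
    out = encoder(seq)
    pred = to_pixels(out[:, len(visible_idx):len(visible_idx) + n_masked])
    target = patches_a[:, masked_idx]
    return ((pred - target) ** 2).mean()
```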
arXiv Detail & Related papers (2023-11-15T16:51:18Z)
- PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm [114.47216525866435]
We introduce a novel universal 3D pre-training framework designed to facilitate the acquisition of efficient 3D representation.
For the first time, PonderV2 achieves state-of-the-art performance on 11 indoor and outdoor benchmarks, implying its effectiveness.
arXiv Detail & Related papers (2023-10-12T17:59:57Z)
- Evaluating alignment between humans and neural network representations in image-based learning tasks [5.657101730705275]
We tested how well the representations of 86 pretrained neural network models mapped to human learning trajectories.
We found that while training-set size was a core determinant of alignment with human choices, contrastive training on multi-modal data (text and imagery) was a common feature of the publicly available models that predicted human generalisation.
In conclusion, pretrained neural networks can serve to extract representations for cognitive models, as they appear to capture some fundamental aspects of cognition that are transferable across tasks.
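A minimal sketch of this kind of evaluation, using placeholder data rather than the paper's stimuli or models: extract frozen embeddings, fit a simple readout to human choices, and treat held-out predictive accuracy as the alignment score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 128))      # stimuli x model features (stand-in)
human_choices = rng.integers(0, 2, size=500)  # binary human choices (stand-in)

readout = LogisticRegression(max_iter=1000)
alignment = cross_val_score(readout, embeddings, human_choices, cv=5).mean()
print(f"held-out alignment with human choices: {alignment:.3f}")

# Repeating this for each pretrained model (86 in the paper) yields a ranking
# of how well each representation supports human-like choice behavior.
```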
arXiv Detail & Related papers (2023-06-15T08:18:29Z)
- ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding [110.07170245531464]
Current 3D models are limited by datasets with a small amount of annotated data and a pre-defined set of categories.
Recent advances have shown that similar problems can be significantly alleviated by employing knowledge from other modalities, such as language.
We learn a unified representation of images, texts, and 3D point clouds by pre-training with object triplets from the three modalities.
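A hedged sketch of such triplet pre-training: symmetric InfoNCE-style contrastive losses pull the three modality embeddings of the same object together. The encoders are assumed to already exist and emit L2-normalized vectors; this is not ULIP's exact loss.

```python
import torch
import torch.nn.functional as F

def contrastive(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of L2-normalized embeddings."""
    logits = a @ b.t() / temperature
    targets = torch.arange(a.shape[0])
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def triplet_alignment_loss(img_emb, txt_emb, pc_emb):
    # Align the 3D encoder to both anchor modalities (and image to text),
    # so point-cloud features land in the shared image-text space.
    return (contrastive(pc_emb, img_emb) +
            contrastive(pc_emb, txt_emb) +
            contrastive(img_emb, txt_emb))

# Example with random stand-in embeddings for a batch of 8 object triplets:
B, D = 8, 512
img = F.normalize(torch.randn(B, D), dim=-1)
txt = F.normalize(torch.randn(B, D), dim=-1)
pc  = F.normalize(torch.randn(B, D), dim=-1)
print(triplet_alignment_loss(img, txt, pc))
```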
arXiv Detail & Related papers (2022-12-10T01:34:47Z)
- Multi-NeuS: 3D Head Portraits from Single Image with Neural Implicit Functions [70.04394678730968]
We present an approach for the reconstruction of 3D human heads from one or few views.
The underlying neural architecture is designed to learn individual objects and to generalize across them.
Our model can fit novel heads after training on just a hundred videos or one-shot 3D scans.
arXiv Detail & Related papers (2022-09-07T21:09:24Z)
- LatentHuman: Shape-and-Pose Disentangled Latent Representation for Human Bodies [78.17425779503047]
We propose a novel neural implicit representation for the human body.
It is fully differentiable and optimizable with disentangled shape and pose latent spaces.
Our model can be trained and fine-tuned directly on non-watertight raw data with well-designed losses.
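A minimal sketch of a disentangled implicit body model along these lines: a signed distance function conditioned on separate shape and pose codes, with both codes optimizable by gradient descent on raw surface points. Architecture and dimensions are illustrative assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class DisentangledBodySDF(nn.Module):
    def __init__(self, shape_dim=128, pose_dim=128, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + shape_dim + pose_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # signed distance at the query point
        )

    def forward(self, points, z_shape, z_pose):
        n = points.shape[0]
        cond = torch.cat([z_shape.expand(n, -1), z_pose.expand(n, -1)], dim=-1)
        return self.mlp(torch.cat([points, cond], dim=-1))

# Because the whole mapping is differentiable, both codes can be fit to raw
# (even non-watertight) scans by driving the predicted distance at observed
# surface points toward zero.
sdf = DisentangledBodySDF()
z_shape = torch.zeros(1, 128, requires_grad=True)
z_pose = torch.zeros(1, 128, requires_grad=True)
surface_pts = torch.randn(1024, 3)  # stand-in scan points
loss = sdf(surface_pts, z_shape, z_pose).abs().mean()
loss.backward()  # gradients flow to both latent codes independently
```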
arXiv Detail & Related papers (2021-11-30T04:10:57Z)
- Learning Temporal 3D Human Pose Estimation with Pseudo-Labels [3.0954251281114513]
We present a simple yet effective approach for self-supervised 3D human pose estimation.
We rely on triangulating 2D body pose estimates from a multi-view camera system.
Our method achieves state-of-the-art performance in the Human3.6M and MPI-INF-3DHP benchmarks.
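The triangulation step that produces the pseudo-labels can be sketched with the standard Direct Linear Transform (DLT); the camera matrices and detections below are placeholders, and the paper's pipeline adds filtering and the actual training loop.

```python
import numpy as np

def triangulate_dlt(proj_mats, points_2d):
    """proj_mats: list of 3x4 camera projection matrices.
    points_2d: list of matching (x, y) joint detections, one per camera.
    Returns the least-squares 3D point in world coordinates."""
    rows = []
    for P, (x, y) in zip(proj_mats, points_2d):
        rows.append(x * P[2] - P[0])   # each view contributes two
        rows.append(y * P[2] - P[1])   # linear constraints on X
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                # dehomogenize

# Triangulated joints from the multi-view rig become 3D pseudo-labels for
# training a monocular pose estimator, with no manual 3D annotation.
```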
arXiv Detail & Related papers (2021-10-14T17:40:45Z)
- Scene Synthesis via Uncertainty-Driven Attribute Synchronization [52.31834816911887]
This paper introduces a novel neural scene synthesis approach that can capture diverse feature patterns of 3D scenes.
Our method combines the strength of both neural network-based and conventional scene synthesis approaches.
arXiv Detail & Related papers (2021-08-30T19:45:07Z)
- Self-Supervised 3D Human Pose Estimation with Multiple-View Geometry [2.7541825072548805]
We present a self-supervised learning algorithm for 3D human pose estimation of a single person based on a multi-view camera system.
We propose a learning algorithm with four loss functions, which does not require any 2D or 3D body pose ground truth.
arXiv Detail & Related papers (2021-08-17T17:31:24Z)
- Learning Transferable Kinematic Dictionary for 3D Human Pose and Shape Reconstruction [15.586347115568973]
We propose a kinematic dictionary, which explicitly regularizes the solution space of relative 3D rotations of human joints.
Our method achieves end-to-end 3D reconstruction without the need for any shape annotations during the training of the neural networks.
The proposed method achieves competitive results on large-scale datasets including Human3.6M, MPI-INF-3DHP, and LSP.
arXiv Detail & Related papers (2021-04-02T09:24:29Z)
- Neural Descent for Visual 3D Human Pose and Shape [67.01050349629053]
We present a deep neural network methodology to reconstruct the 3D pose and shape of people, given an input RGB image.
We rely on GHUM, a recently introduced, expressive full-body statistical 3D human model, trained end-to-end.
Central to our methodology is a learning-to-learn-and-optimize approach, referred to as HUman Neural Descent (HUND), which avoids second-order differentiation.
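A speculative sketch of the learning-to-optimize pattern this describes: a small network proposes the parameter update at each refinement step, the unrolled loop is supervised at every iterate, and no gradient step is differentiated through, so only first-order derivatives are needed. Shapes and the fitting loss are stand-ins, not the paper's formulation.

```python
import torch
import torch.nn as nn

param_dim, feat_dim = 85, 256  # e.g. pose+shape vector, image feature size

update_net = nn.Sequential(
    nn.Linear(param_dim + feat_dim, 256), nn.ReLU(),
    nn.Linear(256, param_dim),  # predicted additive parameter update
)

def fit_loss(params, target):
    # Stand-in for a differentiable reprojection / model-fitting loss.
    return ((params - target) ** 2).mean()

def unrolled_descent(img_feat, target, n_steps=5):
    params = torch.zeros(img_feat.shape[0], param_dim)
    total = 0.0
    for _ in range(n_steps):
        params = params + update_net(torch.cat([params, img_feat], dim=-1))
        total = total + fit_loss(params, target)  # supervise every iterate
    return total

feats = torch.randn(4, feat_dim)
target = torch.randn(4, param_dim)  # stand-in ground-truth parameters
unrolled_descent(feats, target).backward()  # first-order training only
```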
arXiv Detail & Related papers (2020-08-16T13:38:41Z)