A Deep Learning Model of Mental Rotation Informed by Interactive VR Experiments
- URL: http://arxiv.org/abs/2512.13517v1
- Date: Mon, 15 Dec 2025 16:43:50 GMT
- Title: A Deep Learning Model of Mental Rotation Informed by Interactive VR Experiments
- Authors: Raymond Khazoum, Daniela Fernandes, Aleksandr Krylov, Qin Li, Stephane Deny,
- Abstract summary: Mental rotation is a fundamental example of mental simulation and spatial world modelling in humans.<n>We propose a mechanistic model of human mental rotation, leveraging advances in deep, equivariant, and neuro-symbolic learning.<n>Our model captures well the performance, response times and behavior of participants in our and others' experiments.
- Score: 36.264146096690375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mental rotation -- the ability to compare objects seen from different viewpoints -- is a fundamental example of mental simulation and spatial world modelling in humans. Here we propose a mechanistic model of human mental rotation, leveraging advances in deep, equivariant, and neuro-symbolic learning. Our model consists of three stacked components: (1) an equivariant neural encoder, taking images as input and producing 3D spatial representations of objects, (2) a neuro-symbolic object encoder, deriving symbolic descriptions of objects from these spatial representations, and (3) a neural decision agent, comparing these symbolic descriptions to prescribe rotation simulations in 3D latent space via a recurrent pathway. Our model design is guided by the abundant experimental literature on mental rotation, which we complemented with experiments in VR where participants could at times manipulate the objects to compare, providing us with additional insights into the cognitive process of mental rotation. Our model captures well the performance, response times and behavior of participants in our and others' experiments. The necessity of each model component is shown through systematic ablations. Our work adds to a recent collection of deep neural models of human spatial reasoning, further demonstrating the potency of integrating deep, equivariant, and symbolic representations to model the human mind.
Related papers
- Human-level 3D shape perception emerges from multi-view learning [63.048728487674815]
We develop a modeling framework that predicts human 3D shape inferences for arbitrary objects.<n>We achieve this with a novel class of neural networks trained using a visual-spatial objective over naturalistic sensory data.<n>We find that human-level 3D perception can emerge from a simple, scalable learning objective over naturalistic visual-spatial data.
arXiv Detail & Related papers (2026-02-19T18:56:05Z) - Brain3D: Generating 3D Objects from fMRI [78.46936519561298]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject.<n>We show that our model captures the distinct functionalities of each region of human vision system.<n>Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z) - Learning Internal Representations of 3D Transformations from 2D
Projected Inputs [13.029330360766595]
We show how our model infers depth from moving 2D projected points, learns 3D rotational transformations from 2D training stimuli, and compares to human performance on psychophysical structure-from-motion experiments.
arXiv Detail & Related papers (2023-03-31T02:43:01Z) - Neural Novel Actor: Learning a Generalized Animatable Neural
Representation for Human Actors [98.24047528960406]
We propose a new method for learning a generalized animatable neural representation from a sparse set of multi-view imagery of multiple persons.
The learned representation can be used to synthesize novel view images of an arbitrary person from a sparse set of cameras, and further animate them with the user's pose control.
arXiv Detail & Related papers (2022-08-25T07:36:46Z) - Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
arXiv Detail & Related papers (2021-12-02T12:45:46Z) - 3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations.
A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z) - Neural Actor: Neural Free-view Synthesis of Human Actors with Pose
Control [80.79820002330457]
We propose a new method for high-quality synthesis of humans from arbitrary viewpoints and under arbitrary controllable poses.
Our method achieves better quality than the state-of-the-arts on playback as well as novel pose synthesis, and can even generalize well to new poses that starkly differ from the training poses.
arXiv Detail & Related papers (2021-06-03T17:40:48Z) - Exploring The Spatial Reasoning Ability of Neural Models in Human IQ
Tests [19.338539583910023]
We focus on spatial reasoning and explore the spatial understanding of neural models.
We constructed datasets that consist of various complexity levels.
We provide an analysis of the results and factors that affect the generalization abilities of models.
arXiv Detail & Related papers (2020-04-11T09:41:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.