Neural Foundations of Mental Simulation: Future Prediction of Latent
Representations on Dynamic Scenes
- URL: http://arxiv.org/abs/2305.11772v2
- Date: Wed, 25 Oct 2023 15:34:16 GMT
- Title: Neural Foundations of Mental Simulation: Future Prediction of Latent
Representations on Dynamic Scenes
- Authors: Aran Nayebi, Rishi Rajalingham, Mehrdad Jazayeri, Guangyu Robert Yang
- Abstract summary: We combine a goal-driven modeling approach with dense neurophysiological data and human behavioral readouts to directly address this question.
Specifically, we construct and evaluate several classes of sensory-cognitive networks to predict the future state of rich, ethologically-relevant environments.
We find strong differentiation across these model classes in their ability to predict neural and behavioral data both within and across diverse environments.
- Score: 3.2744507958793143
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Humans and animals have a rich and flexible understanding of the physical
world, which enables them to infer the underlying dynamical trajectories of
objects and events, predict plausible future states, and use those predictions
to plan and anticipate the consequences of actions. However, the neural
mechanisms underlying these computations are unclear. We combine a goal-driven
modeling approach with dense neurophysiological data and high-throughput human
behavioral readouts to directly address this question. Specifically, we
construct and evaluate several classes of sensory-cognitive networks to predict
the future state of rich, ethologically-relevant environments, ranging from
self-supervised end-to-end models with pixel-wise or object-centric objectives,
to models that future predict in the latent space of purely static image-based
or dynamic video-based pretrained foundation models. We find strong
differentiation across these model classes in their ability to predict neural
and behavioral data both within and across diverse environments. In particular,
we find that neural responses are currently best predicted by models trained to
predict the future state of their environment in the latent space of pretrained
foundation models optimized for dynamic scenes in a self-supervised manner.
Notably, models that future predict in the latent space of video foundation
models that are optimized to support a diverse range of sensorimotor tasks,
reasonably match both human behavioral error patterns and neural dynamics
across all environmental scenarios that we were able to test. Overall, these
findings suggest that the neural mechanisms and behaviors of primate mental
simulation are thus far most consistent with being optimized to future predict
on dynamic, reusable visual representations that are useful for Embodied AI
more generally.
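To make the best-performing model class concrete, here is a minimal sketch (not the authors' code) of future prediction in the latent space of a frozen, pretrained video foundation model: a small dynamics module is trained to predict the encoder's next latent state. The `encoder` interface and module sizes are assumptions for illustration.

```python
# Minimal sketch (not the authors' code): a frozen pretrained video
# encoder maps frames to latents, and a small dynamics module is
# trained to predict the next latent state. `encoder` stands in for
# any pretrained video foundation model (assumed interface:
# frames -> (batch, time, latent_dim)).
import torch
import torch.nn as nn

class LatentFuturePredictor(nn.Module):
    """Predicts the latent state at t+1 from a history of latents."""
    def __init__(self, latent_dim: int, hidden_dim: int = 512):
        super().__init__()
        self.rnn = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)

    def forward(self, latents: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(latents)        # (batch, time, hidden_dim)
        return self.head(h[:, -1])      # predicted latent at the next step

def training_step(encoder, predictor, frames, optimizer):
    with torch.no_grad():               # the foundation model stays frozen
        z = encoder(frames)             # (batch, time, latent_dim)
    pred = predictor(z[:, :-1])         # predict the last latent from history
    loss = nn.functional.mse_loss(pred, z[:, -1])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the goal-driven modeling setup described in the abstract, the latents and predictor states of such a model would then be compared against neural recordings and human behavioral error patterns to score each model class.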
Related papers
- Neural Dynamics Model of Visual Decision-Making: Learning from Human Experts [28.340344705437758]
We implement a comprehensive visual decision-making model that spans from visual input to behavioral output.
Our model aligns closely with human behavior and reflects neural activities in primates.
We introduce a neuroimaging-informed fine-tuning approach and apply it to the model, leading to performance improvements.
arXiv Detail & Related papers (2024-09-04T02:38:52Z) - GaussianPrediction: Dynamic 3D Gaussian Prediction for Motion Extrapolation and Free View Synthesis [71.24791230358065]
We introduce a novel framework that empowers 3D Gaussian representations with dynamic scene modeling and future scenario synthesis.
GaussianPrediction can forecast future states from any viewpoint, using video observations of dynamic scenes.
Our framework shows outstanding performance on both synthetic and real-world datasets, demonstrating its efficacy in predicting and rendering future environments.
arXiv Detail & Related papers (2024-05-30T06:47:55Z) - Learning Physical Dynamics for Object-centric Visual Prediction [7.395357888610685]
The ability to model the underlying dynamics of visual scenes and reason about the future is central to human intelligence.
This paper proposes an unsupervised object-centric prediction model that makes future predictions by learning visual dynamics between objects.
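As an illustration of the object-centric approach, the sketch below rolls per-object slot latents forward with a pairwise interaction network. The specific architecture is an assumption in the spirit of the summary, not the paper's implementation.

```python
# Hedged sketch of object-centric future prediction: per-object slot
# latents are rolled forward one step via summed pairwise interaction
# effects (an assumed architecture, not the paper's code).
import torch
import torch.nn as nn

class SlotDynamics(nn.Module):
    def __init__(self, slot_dim: int, hidden: int = 128):
        super().__init__()
        self.pairwise = nn.Sequential(
            nn.Linear(2 * slot_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, slot_dim))
        self.update = nn.GRUCell(slot_dim, slot_dim)

    def forward(self, slots: torch.Tensor) -> torch.Tensor:
        # slots: (batch, n_slots, slot_dim)
        b, n, d = slots.shape
        # all ordered slot pairs -> interaction effects, summed per slot
        src = slots.unsqueeze(2).expand(b, n, n, d)
        dst = slots.unsqueeze(1).expand(b, n, n, d)
        effects = self.pairwise(torch.cat([src, dst], dim=-1)).sum(dim=2)
        # recurrent per-slot update from the aggregated effects
        return self.update(effects.reshape(b * n, d),
                           slots.reshape(b * n, d)).reshape(b, n, d)

# Usage: next_slots = SlotDynamics(64)(torch.randn(8, 5, 64))
```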
arXiv Detail & Related papers (2024-03-15T07:45:25Z) - ContPhy: Continuum Physical Concept Learning and Reasoning from Videos [86.63174804149216]
ContPhy is a novel benchmark for assessing machine physical commonsense.
We evaluated a range of AI models and found that they still struggle to achieve satisfactory performance on ContPhy.
We also introduce an oracle model (ContPRO) that marries the particle-based physical dynamic models with the recent large language models.
arXiv Detail & Related papers (2024-02-09T01:09:21Z) - A Neuro-Symbolic Approach for Enhanced Human Motion Prediction [5.742409080817885]
We propose a neuro-symbolic approach for human motion prediction (NeuroSyM).
NeuroSyM weights the interactions in the neighbourhood differently by leveraging an intuitive technique for spatial representation called the Qualitative Trajectory Calculus (QTC).
Experimental results show that the NeuroSyM approach outperforms the baseline architectures in terms of prediction accuracy in most cases.
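For context, the basic QTC relation (QTC_B) reduces each agent's motion relative to another to a qualitative symbol: moving towards (-), away (+), or neither (0). A minimal sketch, with illustrative function names:

```python
# Hedged sketch of the QTC_B relation between two moving agents.
# Each symbol is the sign of the change in distance to the other
# agent's current position. Function names are illustrative.
import math

def _sign(x: float, eps: float = 1e-6) -> str:
    return "0" if abs(x) < eps else ("-" if x < 0 else "+")

def qtc_b(k_t, k_next, l_t, l_next) -> str:
    """QTC_B relation for agents k and l from consecutive 2D positions.
    First symbol: is k moving towards (-) / away from (+) l?
    Second symbol: the same question for l relative to k."""
    d = math.dist
    k_char = _sign(d(k_next, l_t) - d(k_t, l_t))
    l_char = _sign(d(l_next, k_t) - d(l_t, k_t))
    return k_char + l_char

# Two agents approaching each other head-on -> "--"
print(qtc_b((0, 0), (1, 0), (4, 0), (3, 0)))
```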
arXiv Detail & Related papers (2023-04-23T20:11:40Z) - Predictive Experience Replay for Continual Visual Control and
Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose a mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
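As a loose illustration of a mixture-of-Gaussians dynamics prior, the sketch below is a mixture density network over next states, where different components are free to specialize to different tasks. It is an assumed reading of the summary, not the paper's model.

```python
# Hedged sketch: a transition model that outputs K Gaussian components,
# so distinct components can specialize to distinct tasks' dynamics.
import torch
import torch.nn as nn

class MixtureDynamics(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, k: int = 4, hidden: int = 256):
        super().__init__()
        self.k, self.state_dim = k, state_dim
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, k * (2 * state_dim + 1)))

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        logits, params = out[..., :self.k], out[..., self.k:]
        mu, log_std = params.reshape(
            *state.shape[:-1], self.k, 2, self.state_dim).unbind(dim=-2)
        return logits, mu, log_std  # mixture weights, component parameters

    def nll(self, state, action, next_state):
        """Negative log-likelihood of the observed next state."""
        logits, mu, log_std = self(state, action)
        comp = torch.distributions.Normal(mu, log_std.exp())
        log_p = comp.log_prob(next_state.unsqueeze(-2)).sum(-1)  # per component
        return -torch.logsumexp(torch.log_softmax(logits, -1) + log_p, -1).mean()
```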
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - Learn to Predict How Humans Manipulate Large-sized Objects from
Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show that the proposed network, which consumes dynamic descriptors, achieves state-of-the-art prediction results and generalizes better to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z) - Physion: Evaluating Physical Prediction from Vision in Humans and
Machines [46.19008633309041]
We present a visual and physical prediction benchmark that precisely measures this capability.
We compare an array of algorithms on their ability to make diverse physical predictions.
We find that graph neural networks with access to the physical state best capture human behavior.
arXiv Detail & Related papers (2021-06-15T16:13:39Z) - The Neural Coding Framework for Learning Generative Models [91.0357317238509]
We propose a novel neural generative model inspired by the theory of predictive processing in the brain.
In a similar way, artificial neurons in our generative model predict what neighboring neurons will do and adjust their parameters based on how well those predictions match reality.
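A minimal predictive-coding sketch of this idea, assuming a standard hierarchical formulation (each layer predicts the layer below; states and weights follow local prediction-error updates) rather than the paper's exact algorithm:

```python
# Hedged sketch of predictive coding: layers predict the layer below,
# and both states and weights descend the total prediction error via
# local updates. Textbook-style formulation, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
dims = [16, 32, 64]                        # top latent -> ... -> observed layer
W = [rng.normal(0, 0.1, (dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
z = [rng.normal(0, 1.0, d) for d in dims]  # layer states; z[-1] plays the data

def step(z, W, lr_z=0.1, lr_w=0.01):
    # top-down predictions and the per-layer errors they leave behind
    errs = [z[i + 1] - W[i] @ np.tanh(z[i]) for i in range(len(W))]
    for i in range(len(W)):                # z[-1] (the data) is never updated
        grad = (1 - np.tanh(z[i]) ** 2) * (W[i].T @ errs[i])  # error at the layer z[i] predicts
        if i > 0:
            grad -= errs[i - 1]                               # error at z[i]'s own layer
        z[i] += lr_z * grad                                   # states descend total error
        W[i] += lr_w * np.outer(errs[i], np.tanh(z[i]))       # local weight update
    return sum(float(e @ e) for e in errs)

for t in range(100):
    total_error = step(z, W)
print(f"prediction error after settling: {total_error:.3f}")
```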
arXiv Detail & Related papers (2020-12-07T01:20:38Z) - Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.