MIMo: A Multi-Modal Infant Model for Studying Cognitive Development
- URL: http://arxiv.org/abs/2312.04318v1
- Date: Thu, 7 Dec 2023 14:21:31 GMT
- Title: MIMo: A Multi-Modal Infant Model for Studying Cognitive Development
- Authors: Dominik Mattern, Pierre Schumacher, Francisco M. López, Marcel C. Raabe, Markus R. Ernst, Arthur Aubret, Jochen Triesch
- Abstract summary: We present MIMo, an open-source infant model for studying early cognitive development through computer simulations.
MIMo perceives its surroundings via binocular vision, a vestibular system, proprioception, and touch perception through a full-body virtual skin.
- Score: 3.5009119465343033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human intelligence and human consciousness emerge gradually during the
process of cognitive development. Understanding this development is an
essential aspect of understanding the human mind and may facilitate the
construction of artificial minds with similar properties. Importantly, human
cognitive development relies on embodied interactions with the physical and
social environment, which is perceived via complementary sensory modalities.
These interactions allow the developing mind to probe the causal structure of
the world. This is in stark contrast to common machine learning approaches,
e.g., for large language models, which merely passively "digest" large
amounts of training data and are not in control of their sensory inputs.
However, computational modeling of the kind of self-determined embodied
interactions that lead to human intelligence and consciousness is a formidable
challenge. Here we present MIMo, an open-source multi-modal infant model for
studying early cognitive development through computer simulations. MIMo's body
is modeled after an 18-month-old child with detailed five-fingered hands. MIMo
perceives its surroundings via binocular vision, a vestibular system,
proprioception, and touch perception through a full-body virtual skin, while
two different actuation models allow control of its body. We describe the
design and interfaces of MIMo and provide examples illustrating its use. All
code is available at https://github.com/trieschlab/MIMo .
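As a quick orientation, here is a minimal interaction sketch in Python, assuming MIMo exposes Gymnasium-style environments as the repository suggests. The environment id "MIMoReach-v0", the package name mimoEnv, and the dict-structured observations are assumptions drawn from the repository layout, not a guaranteed API; the actual ids and observation keys are documented in the repository linked above.

import gymnasium as gym
import mimoEnv  # assumed to register the MIMo environments on import

# "MIMoReach-v0" is an illustrative id for the reaching task.
env = gym.make("MIMoReach-v0")
obs, info = env.reset(seed=0)

# Observations are assumed to be a dict with one entry per sensory
# modality (vision, vestibular, proprioception, touch).
if isinstance(obs, dict):
    for key, value in obs.items():
        print(key, getattr(value, "shape", value))

# Drive the body with random motor commands for a short episode.
for _ in range(100):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()

env.close()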
Related papers
- Towards Interpretable Visuo-Tactile Predictive Models for Soft Robot Interactions [2.4100803794273]
Successful integration of robotic agents into real-world situations hinges on their perception capabilities.
We build on the fusion of multiple sensory modalities to probe the surroundings.
Deep learning applied to raw sensory modalities offers a viable option.
We examine the outlook for the perception model and its implications for control.
arXiv Detail & Related papers (2024-07-16T21:46:04Z)
- Visual cognition in multimodal large language models [12.603212933816206]
Recent advancements have rekindled interest in the potential to emulate human-like cognitive abilities.
This paper evaluates the current state of vision-based large language models in the domains of intuitive physics, causal reasoning, and intuitive psychology.
arXiv Detail & Related papers (2023-11-27T18:58:34Z)
- World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges [51.92834011423463]
We focus on the two concepts of world models and predictive coding.
In neuroscience, predictive coding proposes that the brain continuously predicts its inputs and adapts to model its own dynamics and to control behavior in its environment; a toy sketch of this error-driven update appears after this list.
arXiv Detail & Related papers (2023-01-14T06:38:14Z)
- Learning body models: from humans to humanoids [2.855485723554975]
Humans and animals excel at combining information from multiple sensory modalities, controlling their complex bodies, adapting to growth and failures, and using tools.
A key foundation is an internal representation of the body that the agent, whether human, animal, or robot, has developed.
The mechanisms by which body models operate in the brain are largely unknown, and even less is known about how they are constructed from experience after birth.
arXiv Detail & Related papers (2022-11-06T07:30:01Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
We develop a novel foundation model pre-trained on large-scale multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
arXiv Detail & Related papers (2021-10-27T12:25:21Z)
- From internal models toward metacognitive AI [0.0]
In the prefrontal cortex, a distributed executive network called the "cognitive reality monitoring network" orchestrates conscious involvement of generative-inverse model pairs.
A high responsibility signal is given to the pairs that best capture the external world.
Consciousness is determined by the entropy of responsibility signals across all pairs; a toy entropy computation appears after this list.
arXiv Detail & Related papers (2021-09-27T05:00:56Z)
- Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
arXiv Detail & Related papers (2021-03-16T13:50:24Z)
- AGENT: A Benchmark for Core Psychological Reasoning [60.35621718321559]
Intuitive psychology is the ability to reason about hidden mental variables that drive observable actions.
Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning.
We present AGENT, a benchmark of procedurally generated 3D animations structured around four scenarios.
arXiv Detail & Related papers (2021-02-24T14:58:23Z)
- Crossmodal Language Grounding in an Embodied Neurocognitive Model [28.461246169379685]
Human infants acquire natural language with seeming ease at an early age.
From a neuroscientific perspective, natural language is embodied, grounded in most, if not all, sensory and sensorimotor modalities.
We present a neurocognitive model for language grounding which reflects bio-inspired mechanisms.
arXiv Detail & Related papers (2020-06-24T08:12:09Z)
- Machine Common Sense [77.34726150561087]
Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article addresses the modeling of commonsense reasoning, focusing on the domain of interpersonal interactions.
arXiv Detail & Related papers (2020-06-15T13:59:47Z)
- A Developmental Neuro-Robotics Approach for Boosting the Recognition of Handwritten Digits [91.3755431537592]
Recent evidence shows that simulating children's embodied strategies can also improve machine intelligence.
This article explores the application of embodied strategies to convolutional neural network models in the context of developmental neuro-robotics.
arXiv Detail & Related papers (2020-03-23T14:55:00Z)
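Toy sketch referenced from the world-models and predictive-coding entry above. This is a minimal, illustrative error-driven update in Python, not the formulation from the cited survey: an internal estimate predicts the incoming signal and is nudged along the prediction error after every observation.

import numpy as np

rng = np.random.default_rng(0)
true_cause = 1.5     # hidden cause generating the sensory inputs
mu = 0.0             # internal estimate that "predicts" the input
learning_rate = 0.1  # illustrative step size

for step in range(50):
    x = true_cause + 0.1 * rng.standard_normal()  # noisy sensory input
    error = x - mu                                # prediction error
    mu += learning_rate * error                   # reduce future error

print(f"estimate after 50 steps: {mu:.3f} (true cause: {true_cause})")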
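Toy sketch referenced from the metacognitive-AI entry above. Purely illustrative, assuming the responsibility signals form a probability distribution over generative-inverse model pairs; the variable names and the normalization are assumptions, not the cited paper's formulation.

import numpy as np

def entropy(p):
    # Shannon entropy of a responsibility distribution, in nats.
    p = np.asarray(p, dtype=float)
    p = p / p.sum()  # normalize, assuming non-negative responsibilities
    return float(-np.sum(p * np.log(p + 1e-12)))

# Low entropy: one generative-inverse pair dominates (confident state).
# High entropy: responsibility is spread across pairs (uncertain state).
print(f"dominant pair:  H = {entropy([0.94, 0.02, 0.02, 0.02]):.3f} nats")
print(f"uniform spread: H = {entropy([0.25, 0.25, 0.25, 0.25]):.3f} nats")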