MIMo: A Multi-Modal Infant Model for Studying Cognitive Development
- URL: http://arxiv.org/abs/2312.04318v1
- Date: Thu, 7 Dec 2023 14:21:31 GMT
- Title: MIMo: A Multi-Modal Infant Model for Studying Cognitive Development
- Authors: Dominik Mattern, Pierre Schumacher, Francisco M. López, Marcel C. Raabe, Markus R. Ernst, Arthur Aubret, Jochen Triesch
- Abstract summary: We present MIMo, an open-source infant model for studying early cognitive development through computer simulations.
MIMo perceives its surroundings via binocular vision, a vestibular system, proprioception, and touch perception through a full-body virtual skin.
- Score: 3.5009119465343033
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   Human intelligence and human consciousness emerge gradually during the
process of cognitive development. Understanding this development is an
essential aspect of understanding the human mind and may facilitate the
construction of artificial minds with similar properties. Importantly, human
cognitive development relies on embodied interactions with the physical and
social environment, which is perceived via complementary sensory modalities.
These interactions allow the developing mind to probe the causal structure of
the world. This is in stark contrast to common machine learning approaches,
e.g., for large language models, which are merely passively "digesting" large
amounts of training data, but are not in control of their sensory inputs.
However, computational modeling of the kind of self-determined embodied
interactions that lead to human intelligence and consciousness is a formidable
challenge. Here we present MIMo, an open-source multi-modal infant model for
studying early cognitive development through computer simulations. MIMo's body
is modeled after an 18-month-old child with detailed five-fingered hands. MIMo
perceives its surroundings via binocular vision, a vestibular system,
proprioception, and touch perception through a full-body virtual skin, while
two different actuation models allow control of its body. We describe the
design and interfaces of MIMo and provide examples illustrating its use. All
code is available at https://github.com/trieschlab/MIMo .
 
      
Related papers
        - Embodied AI Agents: Modeling the World [188.85697524284834]
 This paper describes our research on AI agents embodied in visual, virtual or physical forms.
We propose that the development of world models is central to reasoning and planning of embodied AI agents.
We also propose to learn the mental world model of users to enable better human-agent collaboration.
 arXiv  Detail & Related papers  (2025-06-27T16:05:34Z)
- Emergent Active Perception and Dexterity of Simulated Humanoids from Visual Reinforcement Learning [69.71072181304066]
 We introduce Perceptive Dexterous Control (PDC), a framework for vision-driven whole-body control with simulated humanoids.
PDC operates solely on egocentric vision for task specification, enabling object search, target placement, and skill selection through visual cues.
We show that training from scratch with reinforcement learning can produce emergent behaviors such as active search.
 arXiv  Detail & Related papers  (2025-05-18T07:33:31Z)
- Neural Brain: A Neuroscience-inspired Framework for Embodied Agents [58.58177409853298]
 Current AI systems, such as large language models, remain disembodied, unable to physically engage with the world.
At the core of this challenge lies the concept of Neural Brain, a central intelligence system designed to drive embodied agents with human-like adaptability.
This paper introduces a unified framework for the Neural Brain of embodied agents, addressing two fundamental challenges.
 arXiv  Detail & Related papers  (2025-05-12T15:05:34Z)
- Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning [18.43931715859825]
 As computer vision seeks to replicate the human vision system, understanding infant visual development may offer valuable insights.
In this paper, we present an interdisciplinary study exploring this question.
Can a computational model that imitates the infant learning process develop broader visual concepts similar to how infants naturally learn?
Our work bridges cognitive science and computer vision by analyzing the internal representations of a computational model trained on an infant's visual and linguistic inputs.
 arXiv  Detail & Related papers  (2025-01-09T12:55:55Z)
- Towards Interpretable Visuo-Tactile Predictive Models for Soft Robot Interactions [2.4100803794273]
 Successful integration of robotic agents into real-world situations hinges on their perception capabilities.
We build upon the fusion of various sensory modalities to probe the surroundings.
Deep learning applied to raw sensory modalities offers a viable option.
We will delve into the outlooks of the perception model and its implications for control purposes.
 arXiv  Detail & Related papers  (2024-07-16T21:46:04Z)
- Visual cognition in multimodal large language models [12.603212933816206]
 Recent advancements have rekindled interest in the potential to emulate human-like cognitive abilities.
This paper evaluates the current state of vision-based large language models in the domains of intuitive physics, causal reasoning, and intuitive psychology.
 arXiv  Detail & Related papers  (2023-11-27T18:58:34Z)
- World Models and Predictive Coding for Cognitive and Developmental Robotics: Frontiers and Challenges [51.92834011423463]
 We focus on the two concepts of world models and predictive coding.
In neuroscience, predictive coding proposes that the brain continuously predicts its inputs and adapts to model its own dynamics and control behavior in its environment.
 arXiv  Detail & Related papers  (2023-01-14T06:38:14Z)
- Learning body models: from humans to humanoids [2.855485723554975]
 Humans and animals excel in combining information from multiple sensory modalities, controlling their complex bodies, adapting to growth, failures, or using tools.
A key foundation is an internal representation of the body that the agent - human, animal, or robot - has developed.
The mechanisms of operation of body models in the brain are largely unknown, and even less is known about how they are constructed from experience after birth.
 arXiv  Detail & Related papers  (2022-11-06T07:30:01Z)
- WenLan 2.0: Make AI Imagine via a Multimodal Foundation Model [74.4875156387271]
 We develop a novel foundation model pre-trained with huge multimodal (visual and textual) data.
We show that state-of-the-art results can be obtained on a wide range of downstream tasks.
 arXiv  Detail & Related papers  (2021-10-27T12:25:21Z)
- From internal models toward metacognitive AI [0.0]
 In the prefrontal cortex, a distributed executive network called the "cognitive reality monitoring network" orchestrates conscious involvement of generative-inverse model pairs.
A high responsibility signal is given to the pairs that best capture the external world.
Consciousness is determined by the entropy of responsibility signals across all pairs.
 arXiv  Detail & Related papers  (2021-09-27T05:00:56Z)
- Cognitive architecture aided by working-memory for self-supervised multi-modal humans recognition [54.749127627191655]
 The ability to recognize human partners is an important social skill to build personalized and long-term human-robot interactions.
Deep learning networks have achieved state-of-the-art results and have proven to be suitable tools for this task.
One solution is to make robots learn from their first-hand sensory data with self-supervision.
 arXiv  Detail & Related papers  (2021-03-16T13:50:24Z)
- AGENT: A Benchmark for Core Psychological Reasoning [60.35621718321559]
 Intuitive psychology is the ability to reason about hidden mental variables that drive observable actions.
Despite recent interest in machine agents that reason about other agents, it is not clear if such agents learn or hold the core psychology principles that drive human reasoning.
We present a benchmark consisting of procedurally generated 3D animations, AGENT, structured around four scenarios.
 arXiv  Detail & Related papers  (2021-02-24T14:58:23Z)
- Crossmodal Language Grounding in an Embodied Neurocognitive Model [28.461246169379685]
 Human infants are able to acquire natural language seemingly easily at an early age.
From a neuroscientific perspective, natural language is embodied, grounded in most, if not all, sensory and sensorimotor modalities.
We present a neurocognitive model for language grounding which reflects bio-inspired mechanisms.
 arXiv  Detail & Related papers  (2020-06-24T08:12:09Z)
- Machine Common Sense [77.34726150561087]
 Machine common sense remains a broad, potentially unbounded problem in artificial intelligence (AI).
This article deals with aspects of modeling commonsense reasoning, focusing on the domain of interpersonal interactions.
 arXiv  Detail & Related papers  (2020-06-15T13:59:47Z)
- A Developmental Neuro-Robotics Approach for Boosting the Recognition of Handwritten Digits [91.3755431537592]
 Recent evidence shows that simulating children's embodied strategies can also improve machine intelligence.
This article explores the application of embodied strategies to convolutional neural network models in the context of developmental neuro-robotics.
 arXiv  Detail & Related papers  (2020-03-23T14:55:00Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
       
     