A Number Sense as an Emergent Property of the Manipulating Brain
- URL: http://arxiv.org/abs/2012.04132v4
- Date: Sun, 24 Mar 2024 01:23:11 GMT
- Title: A Number Sense as an Emergent Property of the Manipulating Brain
- Authors: Neehar Kondapaneni, Pietro Perona
- Abstract summary: We study the mechanism through which humans acquire and develop the ability to manipulate numbers and quantities.
Our model acquires the ability to estimate numerosity, i.e. the number of objects in the scene.
We conclude that important aspects of a facility with numbers and quantities may be learned with supervision from a simple pre-training task.
- Score: 16.186932790845937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to understand and manipulate numbers and quantities emerges during childhood, but the mechanism through which humans acquire and develop this ability is still poorly understood. We explore this question through a model, assuming that the learner is able to pick up and place small objects from, and to, locations of its choosing, and will spontaneously engage in such undirected manipulation. We further assume that the learner's visual system will monitor the changing arrangements of objects in the scene and will learn to predict the effects of each action by comparing perception with a supervisory signal from the motor system. We model perception using standard deep networks for feature extraction and classification, and gradient descent learning. Our main finding is that, from learning the task of action prediction, an unexpected image representation emerges exhibiting regularities that foreshadow the perception and representation of numbers and quantity. These include distinct categories for zero and the first few natural numbers, a strict ordering of the numbers, and a one-dimensional signal that correlates with numerical quantity. As a result, our model acquires the ability to estimate numerosity, i.e. the number of objects in the scene, as well as subitization, i.e. the ability to recognize at a glance the exact number of objects in small scenes. Remarkably, subitization and numerosity estimation extrapolate to scenes containing many objects, far beyond the three objects used during training. We conclude that important aspects of a facility with numbers and quantities may be learned with supervision from a simple pre-training task. Our observations suggest that cross-modal learning is a powerful learning mechanism that may be harnessed in artificial intelligence.
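The action-prediction setup described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the grid size, the three-action set (take, shake, put), the tiny tanh encoder, and every function name are assumptions made for illustration, and the "after" scene is redrawn from scratch rather than produced by moving a single object. It only shows the general shape of the idea: a shared encoder embeds the scene before and after a manipulation, and a classifier is trained by gradient descent to predict which action occurred.

```python
import numpy as np

rng = np.random.default_rng(0)
GRID = 6          # hypothetical scene size: GRID x GRID cells
EMBED = 8         # hypothetical embedding width
ACTIONS = ("take", "shake", "put")  # assumed action set: -1, 0, +1 objects

def render(n_objects):
    """Flattened binary image with n_objects occupied cells at random places."""
    img = np.zeros(GRID * GRID)
    img[rng.permutation(GRID * GRID)[:n_objects]] = 1.0
    return img

def sample_pair(max_objects=3):
    """One random manipulation: (before image, after image, action index)."""
    n = int(rng.integers(0, max_objects + 1))
    legal = [d for d in (-1, 0, 1) if 0 <= n + d <= max_objects]
    delta = int(rng.choice(legal))
    return render(n), render(n + delta), delta + 1

def make_batch(size):
    """Stack `size` sampled manipulations into arrays."""
    before, after, y = map(np.array, zip(*[sample_pair() for _ in range(size)]))
    return before, after, y

def init_params():
    """Small random weights for the shared encoder W and the classifier V."""
    W = rng.normal(0.0, 0.1, (GRID * GRID, EMBED))
    V = rng.normal(0.0, 0.1, (2 * EMBED, len(ACTIONS)))
    return W, V

def forward(params, before, after):
    """Shared encoder on both images, softmax classifier on the embedding pair."""
    W, V = params
    z_b, z_a = np.tanh(before @ W), np.tanh(after @ W)
    logits = np.concatenate([z_b, z_a], axis=-1) @ V
    logits = logits - logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    return z_b, z_a, p / p.sum(axis=-1, keepdims=True)

def train_step(params, before, after, y, lr=0.1):
    """One gradient-descent step on the cross-entropy action-prediction loss."""
    W, V = params
    z_b, z_a, p = forward(params, before, after)
    rows = np.arange(len(y))
    loss = -np.log(p[rows, y]).mean()
    d_logits = p.copy()
    d_logits[rows, y] -= 1.0
    d_logits /= len(y)
    d_V = np.concatenate([z_b, z_a], axis=-1).T @ d_logits
    d_h = d_logits @ V.T
    d_W = (before.T @ (d_h[:, :EMBED] * (1 - z_b ** 2))
           + after.T @ (d_h[:, EMBED:] * (1 - z_a ** 2)))
    return (W - lr * d_W, V - lr * d_V), loss

def train(steps=200, batch=64):
    """Train on freshly sampled manipulations; returns params and loss history."""
    params, losses = init_params(), []
    for _ in range(steps):
        params, loss = train_step(params, *make_batch(batch))
        losses.append(loss)
    return params, losses
```

After training, the first output of `forward` (the embedding of a single scene) is the kind of image representation one would inspect for number-like structure; whether such structure emerges in a toy model this small is not something the sketch establishes.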
Related papers
- Estimating the distribution of numerosity and non-numerical visual magnitudes in natural scenes using computer vision [0.08192907805418582]
We show that in natural visual scenes the frequency of appearance of different numerosities follows a power law distribution.
We show that the correlational structure for numerosity and continuous magnitudes is stable across datasets and scene types.
arXiv Detail & Related papers (2024-09-17T09:49:29Z)
- Visual Enumeration is Challenging for Large-scale Generative AI [0.08192907805418582]
Humans can readily judge the number of objects in a visual scene, even without counting.
We investigate whether large-scale generative Artificial Intelligence (AI) systems have a human-like number sense.
arXiv Detail & Related papers (2024-01-09T18:18:32Z)
- Neural feels with neural fields: Visuo-tactile perception for in-hand manipulation [57.60490773016364]
We combine vision and touch sensing on a multi-fingered hand to estimate an object's pose and shape during in-hand manipulation.
Our method, NeuralFeels, encodes object geometry by learning a neural field online and jointly tracks it by optimizing a pose graph problem.
Our results demonstrate that touch, at the very least, refines and, at the very best, disambiguates visual estimates during in-hand manipulation.
arXiv Detail & Related papers (2023-12-20T22:36:37Z)
- What Makes Pre-Trained Visual Representations Successful for Robust Manipulation? [57.92924256181857]
We find that visual representations designed for manipulation and control tasks do not necessarily generalize under subtle changes in lighting and scene texture.
We find that emergent segmentation ability is a strong predictor of out-of-distribution generalization among ViT models.
arXiv Detail & Related papers (2023-11-03T18:09:08Z)
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- Understanding top-down attention using task-oriented ablation design [0.22940141855172028]
Top-down attention allows neural networks, both artificial and biological, to focus on the information most relevant for a given task.
We aim to answer this with a computational experiment based on a general framework called task-oriented ablation design.
We compare the performance of two neural networks, one with top-down attention and one without.
arXiv Detail & Related papers (2021-06-08T21:01:47Z)
- Object Based Attention Through Internal Gating [4.941630596191806]
We propose an artificial neural network model of object-based attention.
Our model captures the way in which attention is both top-down and recurrent.
We find that our model replicates a range of findings from neuroscience.
arXiv Detail & Related papers (2021-06-08T17:20:50Z)
- Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms MoCo, a visual-only state-of-the-art method.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- A robot that counts like a child: a developmental model of counting and pointing [69.26619423111092]
A novel neuro-robotics model capable of counting real items is introduced.
The model allows us to investigate the interaction between embodiment and numerical cognition.
The trained model is able to count a set of items and, at the same time, point to them.
arXiv Detail & Related papers (2020-08-05T21:06:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.