Estimating the distribution of numerosity and non-numerical visual magnitudes in natural scenes using computer vision
- URL: http://arxiv.org/abs/2409.11028v2
- Date: Tue, 15 Oct 2024 15:21:14 GMT
- Title: Estimating the distribution of numerosity and non-numerical visual magnitudes in natural scenes using computer vision
- Authors: Kuinan Hou, Marco Zorzi, Alberto Testolin
- Abstract summary: We show that in natural visual scenes the frequency of appearance of different numerosities follows a power law distribution.
We show that the correlational structure for numerosity and continuous magnitudes is stable across datasets and scene types.
- Score: 0.08192907805418582
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Humans share with many animal species the ability to perceive and approximately represent the number of objects in visual scenes. This ability improves throughout childhood, suggesting that learning and development play a key role in shaping our number sense. This hypothesis is further supported by computational investigations based on deep learning, which have shown that numerosity perception can spontaneously emerge in neural networks that learn the statistical structure of images with a varying number of items. However, neural network models are usually trained using synthetic datasets that might not faithfully reflect the statistical structure of natural environments, and there is also growing interest in using more ecological visual stimuli to investigate numerosity perception in humans. In this work, we exploit recent advances in computer vision algorithms to design and implement an original pipeline that can be used to estimate the distribution of numerosity and non-numerical magnitudes in large-scale datasets containing thousands of real images depicting objects in daily life situations. We show that in natural visual scenes the frequency of appearance of different numerosities follows a power law distribution. Moreover, we show that the correlational structure for numerosity and continuous magnitudes is stable across datasets and scene types (homogeneous vs. heterogeneous object sets). We suggest that considering such "ecological" pattern of covariance is important to understand the influence of non-numerical visual cues on numerosity judgements.
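A minimal sketch (not the authors' released code) of the kind of pipeline the abstract describes: it assumes an off-the-shelf instance-segmentation model has already produced one binary mask per detected object in each image, computes numerosity together with simple non-numerical magnitudes (cumulative surface area, convex-hull field area, density), and estimates the exponent of a power-law fit to the frequency of numerosities by linear regression in log-log coordinates. The function names (`image_statistics`, `fit_power_law_exponent`) and the particular choice of magnitudes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from collections import Counter
from scipy.spatial import ConvexHull


def image_statistics(masks):
    """Numerosity and non-numerical magnitudes for one image.

    masks: list of boolean 2D numpy arrays, one per segmented object
           (produced by any instance-segmentation model, not shown here).
    Returns (numerosity, cumulative_area, field_area, density).
    """
    numerosity = len(masks)
    if numerosity == 0:
        return 0, 0.0, 0.0, float("nan")
    # Cumulative surface area: total number of object pixels.
    cumulative_area = float(sum(m.sum() for m in masks))
    # Convex hull of all object pixels approximates the occupied field area.
    points = np.argwhere(np.any(np.stack(masks), axis=0))
    field_area = ConvexHull(points).volume if len(points) >= 3 else 0.0
    density = numerosity / field_area if field_area > 0 else float("nan")
    return numerosity, cumulative_area, field_area, density


def fit_power_law_exponent(numerosities):
    """Estimate alpha in P(n) ~ n**(-alpha) from observed numerosities.

    A power law appears as a straight line with slope -alpha in
    log-log coordinates, so a least-squares line gives a quick estimate.
    """
    counts = Counter(n for n in numerosities if n > 0)
    n_vals = np.array(sorted(counts), dtype=float)
    freqs = np.array([counts[n] for n in sorted(counts)], dtype=float)
    freqs /= freqs.sum()
    slope, _ = np.polyfit(np.log(n_vals), np.log(freqs), 1)
    return -slope
```

The log-log least-squares fit is only a first-pass check of the power-law claim; a maximum-likelihood estimator would give a more principled exponent, but the sketch keeps to plain NumPy/SciPy.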
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Connecting metrics for shape-texture knowledge in computer vision [1.7785095623975342]
Deep neural networks remain brittle, being susceptible to many image changes that do not cause humans to misclassify.
Part of this different behavior may be explained by the type of features humans and deep neural networks use in vision tasks.
arXiv Detail & Related papers (2023-01-25T14:37:42Z)
- EllSeg-Gen, towards Domain Generalization for head-mounted eyetracking [19.913297057204357]
We show that convolutional networks excel at extracting gaze features despite the presence of such artifacts.
We compare the performance of a single model trained with multiple datasets against a pool of models trained on individual datasets.
Results indicate that models tested on datasets in which eye images exhibit higher appearance variability benefit from multiset training.
arXiv Detail & Related papers (2022-05-04T08:35:52Z)
- Stochastic Coherence Over Attention Trajectory For Continuous Learning In Video Streams [64.82800502603138]
This paper proposes a novel neural-network-based approach to progressively and autonomously develop pixel-wise representations in a video stream.
The proposed method is based on a human-like attention mechanism that allows the agent to learn by observing what is moving in the attended locations.
Our experiments leverage 3D virtual environments and they show that the proposed agents can learn to distinguish objects just by observing the video stream.
arXiv Detail & Related papers (2022-04-26T09:52:31Z)
- Data-driven emergence of convolutional structure in neural networks [83.4920717252233]
We show how fully-connected neural networks solving a discrimination task can learn a convolutional structure directly from their inputs.
By carefully designing data models, we show that the emergence of this pattern is triggered by the non-Gaussian, higher-order local structure of the inputs.
arXiv Detail & Related papers (2022-02-01T17:11:13Z)
- Probabilistic Tracking with Deep Factors [8.030212474745879]
We show how to use a deep feature encoding in conjunction with generative densities over the features in a factor-graph based, probabilistic tracking framework.
We present a likelihood model that combines a learned feature encoder with generative densities over them, both trained in a supervised manner.
arXiv Detail & Related papers (2021-12-02T21:31:51Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- A Number Sense as an Emergent Property of the Manipulating Brain [16.186932790845937]
We study the mechanism through which humans acquire and develop the ability to manipulate numbers and quantities.
Our model acquires the ability to estimate numerosity, i.e. the number of objects in the scene.
We conclude that important aspects of a facility with numbers and quantities may be learned with supervision from a simple pre-training task.
arXiv Detail & Related papers (2020-12-08T00:37:35Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations than visual-only ones.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method, MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation [0.0]
This study aims towards a behavioral comparison of visual core object recognition performance between humans and feedforward neural networks.
Analyses of accuracy revealed that humans not only outperform DCNNs across all conditions, but also display significantly greater robustness to shape and, most notably, color alterations.
arXiv Detail & Related papers (2020-07-13T10:26:30Z)
- Causal Discovery in Physical Systems from Videos [123.79211190669821]
Causal discovery is at the core of human cognition.
We consider the task of causal discovery from videos in an end-to-end fashion without supervision on the ground-truth graph structure.
arXiv Detail & Related papers (2020-07-01T17:29:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.