Human alignment of neural network representations
- URL: http://arxiv.org/abs/2211.01201v4
- Date: Mon, 3 Apr 2023 09:02:13 GMT
- Title: Human alignment of neural network representations
- Authors: Lukas Muttenthaler, Jonas Dippel, Lorenz Linhardt, Robert A. Vandermeulen, Simon Kornblith
- Abstract summary: We investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses.
We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses.
We find that some human concepts such as food and animals are well-represented by neural networks whereas others such as royal or sports-related objects are not.
- Score: 22.671101285994013
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Today's computer vision models achieve human or near-human level performance
across a wide variety of vision tasks. However, their architectures, data, and
learning algorithms differ in numerous ways from those that give rise to human
vision. In this paper, we investigate the factors that affect the alignment
between the representations learned by neural networks and human mental
representations inferred from behavioral responses. We find that model scale
and architecture have essentially no effect on the alignment with human
behavioral responses, whereas the training dataset and objective function both
have a much larger impact. These findings are consistent across three datasets
of human similarity judgments collected using two different tasks. Linear
transformations of neural network representations learned from behavioral
responses from one dataset substantially improve alignment with human
similarity judgments on the other two datasets. In addition, we find that some
human concepts such as food and animals are well-represented by neural networks
whereas others such as royal or sports-related objects are not. Overall,
although models trained on larger, more diverse datasets achieve better
alignment with humans than models trained on ImageNet alone, our results
indicate that scaling alone is unlikely to be sufficient to train neural
networks with conceptual representations that match those used by humans.
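As a concrete reference for how such alignment can be measured, here is a minimal sketch assuming a THINGS-style odd-one-out task (three objects shown, humans pick the one that does not belong): a model agrees with a human when the item left out of its most-similar pair matches the human choice. Function names, shapes, and hyperparameters are illustrative assumptions, not the authors' code; the probe loosely mirrors the learned linear transformations mentioned in the abstract.

```python
import numpy as np

def odd_one_out_accuracy(Z: np.ndarray, triplets: np.ndarray) -> float:
    """Zero-shot alignment: Z is (n_objects, d) model representations;
    triplets is (n_trials, 3) object indices, with column 2 holding the
    human odd-one-out choice."""
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    correct = 0
    for a, b, odd in triplets:
        # The model's odd-one-out is the item excluded from its most-similar pair.
        left_out_sim = {odd: Z[a] @ Z[b], b: Z[a] @ Z[odd], a: Z[b] @ Z[odd]}
        correct += int(max(left_out_sim, key=left_out_sim.get) == odd)
    return correct / len(triplets)

def fit_linear_map(Z: np.ndarray, triplets: np.ndarray,
                   epochs: int = 50, lr: float = 0.01) -> np.ndarray:
    """Learn W so that similarities of transformed representations W z better
    predict human choices: softmax over the three pair similarities,
    cross-entropy toward the human-preferred pair (a, b)."""
    W = np.eye(Z.shape[1])
    for _ in range(epochs):
        G = np.zeros_like(W)
        for a, b, odd in triplets:
            ta, tb, to = W @ Z[a], W @ Z[b], W @ Z[odd]
            logits = np.array([ta @ tb, ta @ to, tb @ to])   # target: pair (a, b)
            p = np.exp(logits - logits.max()); p /= p.sum()
            coeff = p - np.array([1.0, 0.0, 0.0])            # dLoss/dlogits
            for c, (u, x), (v, y) in zip(coeff,
                                         [(ta, Z[a]), (ta, Z[a]), (tb, Z[b])],
                                         [(tb, Z[b]), (to, Z[odd]), (to, Z[odd])]):
                G += c * (np.outer(u, y) + np.outer(v, x))   # d((Wx)·(Wy))/dW
        W -= lr * G / len(triplets)
    return W
```

Evaluating odd_one_out_accuracy(Z @ W.T, held_out_triplets) with a W fitted on one dataset corresponds to the cross-dataset transfer result described above.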
Related papers
- Aligning Machine and Human Visual Representations across Abstraction Levels [42.86478924838503]
Deep neural networks have achieved success across a wide range of applications, including as models of human behavior in vision tasks.
However, neural network training and human learning differ in fundamental ways, and neural networks often fail to generalize as robustly as humans do.
We highlight a key misalignment between vision models and humans: whereas human conceptual knowledge is hierarchically organized from fine- to coarse-scale distinctions, model representations do not accurately capture all these levels of abstraction.
To address this misalignment, we first train a teacher model to imitate human judgments, then transfer human-like structure from its representations into pretrained state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T13:41:08Z)
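As a hedged illustration of the teacher-to-student transfer described in this entry, one common way to distill similarity structure is to match row-wise similarity distributions over a batch; the loss form, temperature, and names below are assumptions, not the paper's actual objective.

```python
import numpy as np

def similarity_distillation_loss(student_feats: np.ndarray,
                                 teacher_feats: np.ndarray,
                                 tau: float = 0.1) -> float:
    """Cross-entropy between the teacher's and student's pairwise-similarity
    distributions on a batch, so the student inherits the teacher's
    (human-aligned) relational structure."""
    def row_probs(feats):
        z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
        logits = (z @ z.T) / tau
        np.fill_diagonal(logits, -1e9)               # mask self-similarity
        logits = logits - logits.max(axis=1, keepdims=True)
        e = np.exp(logits)
        return e / e.sum(axis=1, keepdims=True)
    p = row_probs(teacher_feats)                     # target distribution
    q = row_probs(student_feats)
    return float(-(p * np.log(q + 1e-12)).sum(axis=1).mean())
```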
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- Graph Neural Networks for Learning Equivariant Representations of Neural Networks [55.04145324152541]
We propose to represent neural networks as computational graphs of parameters.
Our approach enables a single model to encode neural computational graphs with diverse architectures.
We showcase the effectiveness of our method on a wide range of tasks, including classification and editing of implicit neural representations.
arXiv Detail & Related papers (2024-03-18T18:01:01Z)
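A toy version of the parameter-graph idea in the entry above: treat each neuron as a node and each weight as an edge feature, so a single graph encoder can consume MLPs of different widths and depths. The featurization below is a hedged guess at the general construction, not the paper's actual encoding.

```python
import numpy as np

def mlp_to_graph(weights: list, biases: list):
    """weights[k] has shape (out_k, in_k); returns (edges, edge_feats, node_feats)."""
    sizes = [weights[0].shape[1]] + [w.shape[0] for w in weights]
    offsets = np.cumsum([0] + sizes[:-1])            # first node id of each layer
    node_feats = []
    for layer, size in enumerate(sizes):
        for n in range(size):
            bias = biases[layer - 1][n] if layer > 0 else 0.0
            node_feats.append([bias, layer])         # bias + layer index per neuron
    edges, edge_feats = [], []
    for k, w in enumerate(weights):
        src0, dst0 = offsets[k], offsets[k + 1]
        for i in range(w.shape[0]):                  # one directed edge per weight
            for j in range(w.shape[1]):
                edges.append((src0 + j, dst0 + i))
                edge_feats.append([w[i, j]])
    return np.array(edges), np.array(edge_feats), np.array(node_feats)

# Example: a 4 -> 8 -> 3 MLP becomes a graph with 15 nodes and 56 edges.
ws = [np.random.randn(8, 4), np.random.randn(3, 8)]
bs = [np.zeros(8), np.zeros(3)]
edges, e_feat, n_feat = mlp_to_graph(ws, bs)
```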
- Evaluating alignment between humans and neural network representations in image-based learning tasks [5.657101730705275]
We tested how well the representations of 86 pretrained neural network models mapped to human learning trajectories.
We found that while training dataset size was a core determinant of alignment with human choices, contrastive training with multimodal data (text and imagery) was a common feature of the publicly available models that predicted human generalisation.
In conclusion, pretrained neural networks can serve to extract representations for cognitive models, as they appear to capture some fundamental aspects of cognition that are transferable across tasks.
arXiv Detail & Related papers (2023-06-15T08:18:29Z)
- Connecting metrics for shape-texture knowledge in computer vision [1.7785095623975342]
Deep neural networks remain brittle, misclassifying images under many changes that do not affect human recognition.
Part of this different behavior may be explained by the type of features humans and deep neural networks use in vision tasks.
arXiv Detail & Related papers (2023-01-25T14:37:42Z)
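One widely used metric for the shape-texture question this entry studies is shape bias on cue-conflict images (the shape of one class combined with the texture of another), in the spirit of Geirhos et al.; this sketch assumes per-image class predictions plus shape and texture labels are already available.

```python
import numpy as np

def shape_bias(pred, shape_label, texture_label) -> float:
    """Fraction of cue-conflict decisions that follow shape rather than texture,
    counting only images classified as either the shape or the texture class."""
    pred, shape_label, texture_label = map(np.asarray, (pred, shape_label, texture_label))
    shape_hits = pred == shape_label
    texture_hits = pred == texture_label
    decided = shape_hits | texture_hits              # ignore off-target predictions
    return float(shape_hits[decided].mean()) if decided.any() else float("nan")
```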
- Learn to Predict How Humans Manipulate Large-sized Objects from Interactive Motions [82.90906153293585]
We propose a graph neural network, HO-GCN, to fuse motion data and dynamic descriptors for the prediction task.
We show the proposed network that consumes dynamic descriptors can achieve state-of-the-art prediction results and help the network better generalize to unseen objects.
arXiv Detail & Related papers (2022-06-25T09:55:39Z)
- Guiding Visual Attention in Deep Convolutional Neural Networks Based on Human Eye Movements [0.0]
Deep Convolutional Neural Networks (DCNNs) were originally inspired by principles of biological vision.
Recent advances in deep learning seem to decrease this similarity.
We investigate a purely data-driven approach to obtain useful models.
arXiv Detail & Related papers (2022-06-21T17:59:23Z)
- The world seems different in a social context: a neural network analysis of human experimental data [57.729312306803955]
We show that it is possible to replicate human behavioral data in both individual and social task settings by modifying the precision of prior and sensory signals.
An analysis of the neural activation traces of the trained networks provides evidence that information is coded in fundamentally different ways in the network in the individual and in the social conditions.
arXiv Detail & Related papers (2022-03-03T17:19:12Z)
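The "precision of prior and sensory signals" manipulation in this entry can be read against the standard Bayesian cue-combination account; as a generic reference point (assumed here, not necessarily the paper's exact model), the posterior mean under Gaussian assumptions is:

```latex
% Precision-weighted combination of a prior belief \mu_p and a sensory input x:
% raising the sensory precision \pi_s makes behavior more stimulus-driven,
% raising the prior precision \pi_p makes it more prior-driven.
\hat{\mu} = \frac{\pi_p \, \mu_p + \pi_s \, x}{\pi_p + \pi_s},
\qquad \pi_p = \sigma_p^{-2}, \quad \pi_s = \sigma_s^{-2}
```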
- Overcoming the Domain Gap in Neural Action Representations [60.47807856873544]
3D pose data can now be reliably extracted from multi-view video sequences without manual intervention.
We propose to use it to guide the encoding of neural action representations together with a set of neural and behavioral augmentations.
To reduce the domain gap, during training, we swap neural and behavioral data across animals that seem to be performing similar actions.
arXiv Detail & Related papers (2021-12-02T12:45:46Z)
- Seeing eye-to-eye? A comparison of object recognition performance in humans and deep convolutional neural networks under image manipulation [0.0]
This study presents a behavioral comparison of core visual object recognition performance between humans and feedforward deep convolutional neural networks (DCNNs).
Analyses of accuracy revealed that humans not only outperform DCNNs in all conditions, but also display significantly greater robustness to shape and, most notably, color alterations.
arXiv Detail & Related papers (2020-07-13T10:26:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.