Transforming Neural Network Visual Representations to Predict Human
Judgments of Similarity
- URL: http://arxiv.org/abs/2010.06512v2
- Date: Mon, 11 Jan 2021 20:40:33 GMT
- Title: Transforming Neural Network Visual Representations to Predict Human
Judgments of Similarity
- Authors: Maria Attarian, Brett D. Roads, Michael C. Mozer
- Abstract summary: We investigate how to bring machine visual representations into better alignment with human representations.
We find that with appropriate linear transformations of deep embeddings, we can improve prediction of human binary choice.
- Score: 12.5719993304358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning vision models have shown intriguing similarities and
differences with respect to human vision. We investigate how to bring machine
visual representations into better alignment with human representations. Human
representations are often inferred from behavioral evidence such as the
selection of an image most similar to a query image. We find that with
appropriate linear transformations of deep embeddings, we can improve
prediction of human binary choice on a data set of bird images from 72% at
baseline to 89%. We hypothesized that deep embeddings have redundant, high
(4096) dimensional representations; however, reducing the rank of these
representations results in a loss of explanatory power. We hypothesized that
the dilation transformation of representations explored in past research is too
restrictive, and indeed we found that model explanatory power can be
significantly improved with a more expressive linear transform. Most surprising
and exciting, we found that, consistent with classic psychological literature,
human similarity judgments are asymmetric: the similarity of X to Y is not
necessarily equal to the similarity of Y to X, and allowing models to express
this asymmetry improves explanatory power.
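To make the abstract's approach concrete, here is a minimal sketch of fitting a linear transformation of deep embeddings to human binary choices, with an optional second transform that makes similarity asymmetric. The dimensions, names, and training setup are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: learn a linear transform of deep embeddings so that
# transformed-space similarity predicts which of two candidates a human
# judged more similar to a query. Illustrative only, not the paper's code.
import torch
import torch.nn as nn

class SimilarityModel(nn.Module):
    def __init__(self, dim=4096, asymmetric=False):
        super().__init__()
        self.W_q = nn.Linear(dim, dim, bias=False)  # query transform
        # A separate reference transform makes s(x, y) != s(y, x),
        # the asymmetry the abstract reports.
        self.W_r = nn.Linear(dim, dim, bias=False) if asymmetric else self.W_q

    def similarity(self, x, y):
        # Dot product between transformed query and reference embeddings.
        return (self.W_q(x) * self.W_r(y)).sum(dim=-1)

    def forward(self, query, cand_a, cand_b):
        # Logit for "the human picks candidate a over candidate b".
        return self.similarity(query, cand_a) - self.similarity(query, cand_b)

# Toy training loop on random stand-ins for 4096-D embeddings.
model = SimilarityModel(dim=4096, asymmetric=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()
q, a, b = (torch.randn(32, 4096) for _ in range(3))
choices = torch.randint(0, 2, (32,)).float()  # 1 if the human chose a
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(model(q, a, b), choices)
    loss.backward()
    opt.step()
```

Factoring each transform through a k-dimensional bottleneck (two stacked linear layers) would correspond to the rank-reduction hypothesis the abstract tests; note that with a single shared full-rank transform, the dot product above is necessarily symmetric.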
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Unsupervised Learning of Invariance Transformations [105.54048699217668]
We develop an algorithmic framework for finding approximate graph automorphisms.
We discuss how this framework can be used to find approximate automorphisms in general weighted graphs.
arXiv Detail & Related papers (2023-07-24T17:03:28Z)
- Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks [61.60177890353585]
Deep convolutional neural networks (CNNs) have been shown to provide excellent models of their functional analogue in the brain, the ventral stream of visual cortex.
Here we consider some prominent statistical patterns that are known to exist in the internal representations of either CNNs or the visual cortex.
We show that CNNs and visual cortex share a similarly tight relationship between dimensionality expansion/reduction of object representations and reformatting of image information.
arXiv Detail & Related papers (2022-05-27T08:06:40Z)
- Predicting Human Similarity Judgments Using Large Language Models [13.33450619901885]
We propose an efficient procedure for predicting similarity judgments based on text descriptions.
The number of descriptions required grows only linearly with the number of stimuli, drastically reducing the amount of data required.
We test this procedure on six datasets of naturalistic images and show that our models outperform previous approaches based on visual information.
arXiv Detail & Related papers (2022-02-09T21:09:25Z)
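As a rough illustration of the linear-cost idea in the entry above, the sketch below embeds one text description per stimulus and derives all pairwise similarities from those embeddings; TF-IDF stands in for an LLM representation purely to keep the example self-contained and runnable.

```python
# Sketch: n descriptions (one per stimulus) yield n*(n-1)/2 pairwise
# similarity predictions, so annotation cost grows linearly with n.
# TF-IDF is an assumed stand-in for an LLM embedding.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = [
    "a small red bird with a short beak perched on a branch",
    "a large gray bird with long legs wading in shallow water",
    "a small brown bird with a short beak hopping on the ground",
]
embeddings = TfidfVectorizer().fit_transform(descriptions)  # one per stimulus
sim = cosine_similarity(embeddings)  # n x n predicted similarity matrix
print(sim.round(2))
```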
- Finding Biological Plausibility for Adversarially Robust Features via Metameric Tasks [3.3504365823045044]
We show that adversarially robust representations capture peripheral computation better than non-robust representations.
Our findings support the idea that localized texture summary-statistic representations may drive human robustness to adversarial perturbations.
arXiv Detail & Related papers (2022-02-02T01:19:40Z)
- On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation [9.848635287149355]
Most self-supervised methods encourage the system to learn an invariant representation of different transformations of the same image.
In this paper, we attempt to reverse-engineer these augmentations to be more biologically or perceptually plausible.
We find that random cropping can be substituted by cortical magnification, and that saccade-like sampling of the image can also assist representation learning.
arXiv Detail & Related papers (2021-12-14T05:38:26Z)
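One plausible reading of the cortical-magnification substitution in the entry above is a log-polar warp that oversamples the image center and compresses the periphery; the sketch below is a generic version of that idea, not the authors' exact transform.

```python
# Sketch: log-polar resampling as a cortical-magnification-style stand-in
# for random cropping. Output rows index log-radius, columns index angle.
import numpy as np

def log_polar_warp(img, out_h=128, out_w=128):
    h, w = img.shape[:2]
    cy, cx = h / 2.0, w / 2.0                 # fixation point at the center
    max_r = np.hypot(cy, cx)
    rows = np.arange(out_h)[:, None].astype(float)
    cols = np.arange(out_w)[None, :].astype(float)
    r = np.exp(rows / out_h * np.log(max_r))  # radius grows exponentially
    theta = cols / out_w * 2.0 * np.pi
    ys = np.clip((cy + r * np.sin(theta)).astype(int), 0, h - 1)
    xs = np.clip((cx + r * np.cos(theta)).astype(int), 0, w - 1)
    return img[ys, xs]  # dense sampling near the fovea, sparse in periphery

img = np.random.rand(224, 224, 3)    # stand-in for a natural image
foveated = log_polar_warp(img)       # use in place of a random crop
```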
- Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models [86.79402670904338]
We evaluate the performance of four state-of-the-art deep face recognition models in the presence of image distortions.
We observe that image distortions are systematically related to the model's performance gap across different subgroups.
arXiv Detail & Related papers (2021-08-14T16:49:05Z)
- Visual stream connectivity predicts assessments of image quality [0.0]
We derive a novel formalization of the psychophysics of similarity, showing that differential geometry provides accurate and explanatory accounts of perceptual similarity judgments.
Predictions are further improved via simple regression on human behavioral reports, which in turn are used to construct more elaborate hypothesized neural connectivity patterns.
arXiv Detail & Related papers (2020-08-16T15:38:17Z)
- Adversarial Semantic Data Augmentation for Human Pose Estimation [96.75411357541438]
We propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts at various semantic granularities.
We also propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamically predict tailored pasting configurations.
State-of-the-art results are achieved on challenging benchmarks.
arXiv Detail & Related papers (2020-08-03T07:56:04Z)
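A minimal sketch of the pasting step behind SDA in the entry above: alpha-compositing a segmented part onto a training image at a random location. ASDA's adversarially predicted pasting configurations are not modeled here; the position is simply random, an assumption made for brevity.

```python
# Sketch: paste a segmented part (RGB patch + soft mask) onto an image.
import numpy as np

rng = np.random.default_rng(0)

def paste_part(image, part_rgb, part_mask):
    """image: HxWx3 floats; part_rgb: hxwx3 floats; part_mask: hxw in [0, 1]."""
    H, W, _ = image.shape
    h, w, _ = part_rgb.shape
    top = rng.integers(0, H - h + 1)          # random paste location
    left = rng.integers(0, W - w + 1)
    region = image[top:top + h, left:left + w]
    alpha = part_mask[..., None]              # broadcast mask over channels
    image[top:top + h, left:left + w] = alpha * part_rgb + (1 - alpha) * region
    return image

img = np.random.rand(256, 256, 3)             # stand-in training image
arm = np.random.rand(40, 16, 3)               # stand-in segmented body part
mask = np.ones((40, 16))
augmented = paste_part(img, arm, mask)
```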
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
- Extracting low-dimensional psychological representations from convolutional neural networks [10.269997499911666]
We present a method for reducing neural network representations to a low-dimensional space which is still predictive of similarity judgments.
We show that these low-dimensional representations also provide insightful explanations of factors underlying human similarity judgments.
arXiv Detail & Related papers (2020-05-29T01:29:39Z)
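As a rough analogue of the last entry's dimensionality reduction, the sketch below projects stand-in embeddings into a low-dimensional space with PCA (an assumption; the paper's own method differs) and checks how much of the pairwise similarity structure survives.

```python
# Sketch: reduce 4096-D embeddings to a few dimensions and compare the
# resulting similarity structure with the full-dimensional one.
import numpy as np
from sklearn.decomposition import PCA

emb = np.random.randn(200, 4096)              # stand-in CNN embeddings
low = PCA(n_components=8).fit_transform(emb)  # assumed 8-D "psychological" space

def cosine_matrix(x):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

idx = np.triu_indices(200, k=1)               # unique stimulus pairs
full_sim = cosine_matrix(emb)[idx]
low_sim = cosine_matrix(low)[idx]
# High correlation would mean the few dimensions preserve the similarity signal.
print(np.corrcoef(full_sim, low_sim)[0, 1])
```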
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.