Transforming Neural Network Visual Representations to Predict Human
Judgments of Similarity
- URL: http://arxiv.org/abs/2010.06512v2
- Date: Mon, 11 Jan 2021 20:40:33 GMT
- Title: Transforming Neural Network Visual Representations to Predict Human
Judgments of Similarity
- Authors: Maria Attarian, Brett D. Roads, Michael C. Mozer
- Abstract summary: We investigate how to bring machine visual representations into better alignment with human representations.
We find that with appropriate linear transformations of deep embeddings, we can improve prediction of human binary choice.
- Score: 12.5719993304358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep-learning vision models have shown intriguing similarities and
differences with respect to human vision. We investigate how to bring machine
visual representations into better alignment with human representations. Human
representations are often inferred from behavioral evidence such as the
selection of an image most similar to a query image. We find that with
appropriate linear transformations of deep embeddings, we can improve
prediction of human binary choice on a data set of bird images from 72% at
baseline to 89%. We hypothesized that deep embeddings have redundant, high
(4096) dimensional representations; however, reducing the rank of these
representations results in a loss of explanatory power. We hypothesized that
the dilation transformation of representations explored in past research is too
restrictive, and indeed we found that model explanatory power can be
significantly improved with a more expressive linear transform. Most surprising
and exciting, we found that, consistent with classic psychological literature,
human similarity judgments are asymmetric: the similarity of X to Y is not
necessarily equal to the similarity of Y to X, and allowing models to express
this asymmetry improves explanatory power.
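To make the abstract's approach concrete, here is a minimal sketch of fitting a linear transformation of deep embeddings to human binary choices, with an optional second transform that makes similarity asymmetric. The dimensions, names, and training setup are illustrative assumptions, not the authors' code.

```python
# Minimal sketch: learn a linear transform of deep embeddings so that
# transformed-space similarity predicts which of two candidates a human
# judged more similar to a query. Illustrative only, not the paper's code.
import torch
import torch.nn as nn

class SimilarityModel(nn.Module):
    def __init__(self, dim=4096, asymmetric=False):
        super().__init__()
        self.W_q = nn.Linear(dim, dim, bias=False)  # query transform
        # A separate reference transform makes s(x, y) != s(y, x),
        # the asymmetry the abstract reports.
        self.W_r = nn.Linear(dim, dim, bias=False) if asymmetric else self.W_q

    def similarity(self, x, y):
        # Dot product between transformed query and reference embeddings.
        return (self.W_q(x) * self.W_r(y)).sum(dim=-1)

    def forward(self, query, cand_a, cand_b):
        # Logit for "the human picks candidate a over candidate b".
        return self.similarity(query, cand_a) - self.similarity(query, cand_b)

# Toy training loop on random stand-ins for 4096-D embeddings.
model = SimilarityModel(dim=4096, asymmetric=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()
q, a, b = (torch.randn(32, 4096) for _ in range(3))
choices = torch.randint(0, 2, (32,)).float()  # 1 if the human chose a
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(model(q, a, b), choices)
    loss.backward()
    opt.step()
```

Factoring each transform through a k-dimensional bottleneck (two stacked linear layers) would correspond to the rank-reduction hypothesis the abstract tests; note that with a single shared full-rank transform, the dot product above is necessarily symmetric.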
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Unsupervised Learning of Invariance Transformations [105.54048699217668]
We develop an algorithmic framework for finding approximate graph automorphisms.
We discuss how this framework can be used to find approximate automorphisms in general weighted graphs.
arXiv Detail & Related papers (2023-07-24T17:03:28Z)
- Prune and distill: similar reformatting of image information along rat visual cortex and deep neural networks [61.60177890353585]
Deep convolutional neural networks (CNNs) have been shown to provide excellent models of their functional analogue in the brain, the ventral stream of visual cortex.
Here we consider some prominent statistical patterns that are known to exist in the internal representations of either CNNs or the visual cortex.
We show that CNNs and visual cortex share a similarly tight relationship between dimensionality expansion/reduction of object representations and reformatting of image information.
arXiv Detail & Related papers (2022-05-27T08:06:40Z)
- Predicting Human Similarity Judgments Using Large Language Models [13.33450619901885]
We propose an efficient procedure for predicting similarity judgments based on text descriptions.
The number of descriptions required grows only linearly with the number of stimuli, drastically reducing the amount of data required.
We test this procedure on six datasets of naturalistic images and show that our models outperform previous approaches based on visual information.
arXiv Detail & Related papers (2022-02-09T21:09:25Z)
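As a rough illustration of the linear-cost idea in the entry above, the sketch below embeds one text description per stimulus and derives all pairwise similarities from those embeddings; TF-IDF stands in for an LLM representation purely to keep the example self-contained and runnable.

```python
# Sketch: n descriptions (one per stimulus) yield n*(n-1)/2 pairwise
# similarity predictions, so annotation cost grows linearly with n.
# TF-IDF is an assumed stand-in for an LLM embedding.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = [
    "a small red bird with a short beak perched on a branch",
    "a large gray bird with long legs wading in shallow water",
    "a small brown bird with a short beak hopping on the ground",
]
embeddings = TfidfVectorizer().fit_transform(descriptions)  # one per stimulus
sim = cosine_similarity(embeddings)  # n x n predicted similarity matrix
print(sim.round(2))
```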
- Finding Biological Plausibility for Adversarially Robust Features via Metameric Tasks [3.3504365823045044]
We show that adversarially robust representations capture peripheral computation better than non-robust representations.
Our findings support the idea that localized texture summary-statistic representations may drive human robustness to adversarial perturbations.
arXiv Detail & Related papers (2022-02-02T01:19:40Z)
- On the use of Cortical Magnification and Saccades as Biological Proxies for Data Augmentation [9.848635287149355]
Most self-supervised methods encourage the system to learn an invariant representation of different transformations of the same image.
In this paper, we attempt to reverse-engineer these augmentations to be more biologically or perceptually plausible.
We find that random cropping can be substituted by cortical magnification, and that saccade-like sampling of the image can also assist representation learning.
arXiv Detail & Related papers (2021-12-14T05:38:26Z)
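One plausible reading of the cortical-magnification substitution in the entry above is a log-polar warp that oversamples the image center and compresses the periphery; the sketch below is a generic version of that idea, not the authors' exact transform.

```python
# Sketch: log-polar resampling as a cortical-magnification-style stand-in
# for random cropping. Output rows index log-radius, columns index angle.
import numpy as np

def log_polar_warp(img, out_h=128, out_w=128):
    h, w = img.shape[:2]
    cy, cx = h / 2.0, w / 2.0                 # fixation point at the center
    max_r = np.hypot(cy, cx)
    rows = np.arange(out_h)[:, None].astype(float)
    cols = np.arange(out_w)[None, :].astype(float)
    r = np.exp(rows / out_h * np.log(max_r))  # radius grows exponentially
    theta = cols / out_w * 2.0 * np.pi
    ys = np.clip((cy + r * np.sin(theta)).astype(int), 0, h - 1)
    xs = np.clip((cx + r * np.cos(theta)).astype(int), 0, w - 1)
    return img[ys, xs]  # dense sampling near the fovea, sparse in periphery

img = np.random.rand(224, 224, 3)    # stand-in for a natural image
foveated = log_polar_warp(img)       # use in place of a random crop
```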
- Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models [86.79402670904338]
We evaluate the performance of four state-of-the-art deep face recognition models in the presence of image distortions.
We observe that image distortions are systematically related to the model's performance gap across different subgroups.
arXiv Detail & Related papers (2021-08-14T16:49:05Z)
- Visual stream connectivity predicts assessments of image quality [0.0]
We derive a novel formalization of the psychophysics of similarity, showing that differential geometry provides accurate and explanatory accounts of perceptual similarity judgments.
Predictions are further improved via simple regression on human behavioral reports, which in turn are used to construct more elaborate hypothesized neural connectivity patterns.
arXiv Detail & Related papers (2020-08-16T15:38:17Z)
- Adversarial Semantic Data Augmentation for Human Pose Estimation [96.75411357541438]
We propose Semantic Data Augmentation (SDA), a method that augments images by pasting segmented body parts at various semantic granularities.
We also propose Adversarial Semantic Data Augmentation (ASDA), which exploits a generative network to dynamically predict tailored pasting configurations.
State-of-the-art results are achieved on challenging benchmarks.
arXiv Detail & Related papers (2020-08-03T07:56:04Z)
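A minimal sketch of the pasting step behind SDA in the entry above: alpha-compositing a segmented part onto a training image at a random location. ASDA's adversarially predicted pasting configurations are not modeled here; the position is simply random, an assumption made for brevity.

```python
# Sketch: paste a segmented part (RGB patch + soft mask) onto an image.
import numpy as np

rng = np.random.default_rng(0)

def paste_part(image, part_rgb, part_mask):
    """image: HxWx3 floats; part_rgb: hxwx3 floats; part_mask: hxw in [0, 1]."""
    H, W, _ = image.shape
    h, w, _ = part_rgb.shape
    top = rng.integers(0, H - h + 1)          # random paste location
    left = rng.integers(0, W - w + 1)
    region = image[top:top + h, left:left + w]
    alpha = part_mask[..., None]              # broadcast mask over channels
    image[top:top + h, left:left + w] = alpha * part_rgb + (1 - alpha) * region
    return image

img = np.random.rand(256, 256, 3)             # stand-in training image
arm = np.random.rand(40, 16, 3)               # stand-in segmented body part
mask = np.ones((40, 16))
augmented = paste_part(img, arm, mask)
```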
- Learning Disentangled Representations with Latent Variation Predictability [102.4163768995288]
This paper defines the variation predictability of latent disentangled representations.
Within an adversarial generation process, we encourage variation predictability by maximizing the mutual information between latent variations and corresponding image pairs.
We develop an evaluation metric that does not rely on the ground-truth generative factors to measure the disentanglement of latent representations.
arXiv Detail & Related papers (2020-07-25T08:54:26Z)
- Extracting low-dimensional psychological representations from convolutional neural networks [10.269997499911666]
We present a method for reducing neural network representations to a low-dimensional space which is still predictive of similarity judgments.
We show that these low-dimensional representations also provide insightful explanations of factors underlying human similarity judgments.
arXiv Detail & Related papers (2020-05-29T01:29:39Z)
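As a rough analogue of the last entry's dimensionality reduction, the sketch below projects stand-in embeddings into a low-dimensional space with PCA (an assumption; the paper's own method differs) and checks how much of the pairwise similarity structure survives.

```python
# Sketch: reduce 4096-D embeddings to a few dimensions and compare the
# resulting similarity structure with the full-dimensional one.
import numpy as np
from sklearn.decomposition import PCA

emb = np.random.randn(200, 4096)              # stand-in CNN embeddings
low = PCA(n_components=8).fit_transform(emb)  # assumed 8-D "psychological" space

def cosine_matrix(x):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

idx = np.triu_indices(200, k=1)               # unique stimulus pairs
full_sim = cosine_matrix(emb)[idx]
low_sim = cosine_matrix(low)[idx]
# High correlation would mean the few dimensions preserve the similarity signal.
print(np.corrcoef(full_sim, low_sim)[0, 1])
```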
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.