Exploring Alignment of Representations with Human Perception
- URL: http://arxiv.org/abs/2111.14726v1
- Date: Mon, 29 Nov 2021 17:26:50 GMT
- Title: Exploring Alignment of Representations with Human Perception
- Authors: Vedant Nanda and Ayan Majumdar and Camila Kolling and John P.
Dickerson and Krishna P. Gummadi and Bradley C. Love and Adrian Weller
- Abstract summary: We argue that inputs mapped to similar representations by a model should be perceived similarly by humans.
Our approach yields a measure of the extent to which a model is aligned with human perception.
We find that various properties of a model like its architecture, training paradigm, training loss, and data augmentation play a significant role in learning representations that are aligned with human perception.
- Score: 47.53970721813083
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We argue that a valuable perspective on when a model learns "good"
representations is that inputs that are mapped to similar representations by
the model should be perceived similarly by humans. We use representation
inversion to generate multiple inputs that map to the same model
representation, then quantify the perceptual similarity of these inputs via
human surveys. Our approach yields a measure of the extent to which a model is
aligned with human perception. Using this measure of alignment, we evaluate
models trained with various learning paradigms (e.g. supervised and
self-supervised learning) and different training losses (standard and robust
training). Our results suggest that the alignment of representations with
human perception provides useful additional insights into the qualities of a
model. For example, we find that alignment with human perception can be used
as a measure of trust in a model's prediction on inputs where different models
have conflicting outputs. We also find that various properties of a model like
its architecture, training paradigm, training loss, and data augmentation play
a significant role in learning representations that are aligned with human
perception.
Related papers
- Dual Thinking and Perceptual Analysis of Deep Learning Models using Human Adversarial Examples [5.022336433202968]
Studying dual thinking in vision requires images where inferences from intuitive and logical processing differ.
We introduce an adversarial dataset to provide evidence for the dual thinking framework in human vision.
Our study also addresses a major criticism of using classification models as computational models of human vision.
arXiv Detail & Related papers (2024-06-11T05:50:34Z)
- Causal Estimation of Memorisation Profiles [58.20086589761273]
Understanding memorisation in language models has practical and societal implications.
Memorisation is the causal effect of training with an instance on the model's ability to predict that instance.
This paper proposes a new, principled, and efficient method to estimate memorisation based on the difference-in-differences design from econometrics.
arXiv Detail & Related papers (2024-06-06T17:59:09Z)
- Using Contrastive Learning with Generative Similarity to Learn Spaces that Capture Human Inductive Biases [9.63129238638334]
Humans rely on strong inductive biases to learn from few examples and abstract useful information from sensory data.
We introduce a notion of generative similarity whereby two datapoints are considered similar if they are likely to have been sampled from the same distribution.
We show that generative similarity can be used to define a contrastive learning objective even when its exact form is intractable.
arXiv Detail & Related papers (2024-05-29T18:01:58Z)
- Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective [68.20531518525273]
We take a closer look into existing self-supervised methods of speech from an information-theoretic perspective.
We use linear probes to estimate the mutual information between the target information and learned representations.
We explore the potential of evaluating representations in a self-supervised fashion, where we estimate the mutual information between different parts of the data without using any labels.
arXiv Detail & Related papers (2024-01-16T21:13:22Z)
- Longer Fixations, More Computation: Gaze-Guided Recurrent Neural Networks [12.57650361978445]
Humans read texts at a varying pace, while machine learning models treat each token in the same way.
In this paper, we convert this intuition into a set of novel models with fixation-guided parallel RNNs or layers.
We find that, interestingly, the fixation duration predicted by neural networks bears some resemblance to humans' fixation.
arXiv Detail & Related papers (2023-10-31T21:32:11Z)
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
- Alignment with human representations supports robust few-shot learning [14.918671859247429]
We show there should be a U-shaped relationship between the degree of representational alignment with humans and performance on few-shot learning tasks.
We also show that highly-aligned models are more robust to both natural adversarial attacks and domain shifts.
Our results suggest that human-alignment is often a sufficient, but not necessary, condition for models to make effective use of limited data, be robust, and generalize well.
arXiv Detail & Related papers (2023-01-27T21:03:19Z)
- Explain, Edit, and Understand: Rethinking User Study Design for Evaluating Model Explanations [97.91630330328815]
We conduct a crowdsourcing study, where participants interact with deception detection models that have been trained to distinguish between genuine and fake hotel reviews.
We observe that for a linear bag-of-words model, participants with access to the feature coefficients during training are able to cause a larger reduction in model confidence in the testing phase when compared to the no-explanation control.
arXiv Detail & Related papers (2021-12-17T18:29:56Z)
- Human-Understandable Decision Making for Visual Recognition [30.30163407674527]
We propose a new framework to train a deep neural network by incorporating the prior of human perception into the model learning process.
The effectiveness of our proposed model is evaluated on two classical visual recognition tasks.
arXiv Detail & Related papers (2021-03-05T02:07:33Z)
- Are Visual Explanations Useful? A Case Study in Model-in-the-Loop Prediction [49.254162397086006]
We study explanations based on visual saliency in an image-based age prediction task.
We find that presenting model predictions improves human accuracy.
However, explanations of various kinds fail to significantly alter human accuracy or trust in the model.
arXiv Detail & Related papers (2020-07-23T20:39:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.