VisAlign: Dataset for Measuring the Degree of Alignment between AI and
Humans in Visual Perception
- URL: http://arxiv.org/abs/2308.01525v3
- Date: Fri, 20 Oct 2023 04:24:25 GMT
- Title: VisAlign: Dataset for Measuring the Degree of Alignment between AI and
Humans in Visual Perception
- Authors: Jiyoung Lee, Seungho Kim, Seunghyun Won, Joonseok Lee, Marzyeh
Ghassemi, James Thorne, Jaeseok Choi, O-Kil Kwon, Edward Choi
- Abstract summary: We propose a new dataset for measuring AI-human visual alignment in terms of image classification.
Our dataset consists of three groups of samples, namely Must-Act (i.e., Must-Classify), Must-Abstain, and Uncertain.
We analyze the visual alignment and reliability of five popular visual perception models and seven abstention methods.
- Score: 32.376529738717736
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI alignment refers to models acting towards human-intended goals,
preferences, or ethical principles. Given that most large-scale deep learning
models act as black boxes and cannot be manually controlled, analyzing the
similarity between models and humans can be a proxy measure for ensuring AI
safety. In this paper, we focus on the models' visual perception alignment with
humans, further referred to as AI-human visual alignment. Specifically, we
propose a new dataset for measuring AI-human visual alignment in terms of image
classification, a fundamental task in machine perception. In order to evaluate
AI-human visual alignment, a dataset should encompass samples with various
scenarios that may arise in the real world and have gold human perception
labels. Our dataset consists of three groups of samples, namely Must-Act (i.e.,
Must-Classify), Must-Abstain, and Uncertain, based on the quantity and clarity
of visual information in an image and further divided into eight categories.
All samples have a gold human perception label; even Uncertain (e.g., severely
blurry) sample labels were obtained via crowd-sourcing.
dataset is verified by sampling theory, statistical theories related to survey
design, and experts in the related fields. Using our dataset, we analyze the
visual alignment and reliability of five popular visual perception models and
seven abstention methods. Our code and data are available at
https://github.com/jiyounglee-0523/VisAlign.
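As a rough illustration of how such an evaluation can be scored, the sketch below applies a simple softmax-threshold abstention rule and checks, per group, whether the model behaves as a human would expect: classify correctly on Must-Act, abstain on Must-Abstain. The threshold rule, the `alignment_accuracy` helper, and the credit given to Uncertain samples are illustrative assumptions, not the paper's actual metric (see the linked repository for the real evaluation).

```python
import torch
import torch.nn.functional as F

ABSTAIN = -1  # sentinel label meaning "the model declines to classify"

def predict_with_abstention(logits: torch.Tensor, threshold: float = 0.9) -> torch.Tensor:
    """Softmax-threshold abstention: answer only when max probability >= threshold."""
    probs = F.softmax(logits, dim=-1)
    conf, preds = probs.max(dim=-1)
    preds[conf < threshold] = ABSTAIN
    return preds

def alignment_accuracy(preds: torch.Tensor, gold_labels, groups) -> float:
    """Fraction of samples where the model acts as humans expect:
    correct class on Must-Act, abstention on Must-Abstain; for Uncertain,
    either abstaining or matching the crowd-sourced label counts
    (an assumption made for this sketch)."""
    correct = 0
    for pred, gold, group in zip(preds.tolist(), gold_labels, groups):
        if group == "must_act":
            correct += int(pred == gold)
        elif group == "must_abstain":
            correct += int(pred == ABSTAIN)
        else:  # "uncertain"
            correct += int(pred in (ABSTAIN, gold))
    return correct / len(groups)
```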
Related papers
- When Does Perceptual Alignment Benefit Vision Representations? [76.32336818860965]
We investigate how aligning vision model representations to human perceptual judgments impacts their usability.
We find that aligning models to perceptual judgments yields representations that improve upon the original backbones across many downstream tasks.
Our results suggest that injecting an inductive bias about human perceptual knowledge into vision models can contribute to better representations.
arXiv Detail & Related papers (2024-10-14T17:59:58Z)
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences that requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
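One plausible way to compare models against such behavioral data is an odd-one-out readout over model embeddings; the trial format, embedding interface, and agreement measure below are assumptions for illustration, not the paper's exact protocol.

```python
import numpy as np

def odd_one_out(embeddings: np.ndarray) -> int:
    """Given embeddings for three images (two showing the same object),
    pick the image least similar to the other two under cosine similarity."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = e @ e.T
    scores = sim.sum(axis=1) - np.diag(sim)  # similarity to the others only
    return int(np.argmin(scores))

def human_model_agreement(model_choices, human_modal_choices) -> float:
    """Share of trials where the model's pick matches the most common human pick."""
    return float(np.mean([m == h for m, h in zip(model_choices, human_modal_choices)]))
```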
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
- SeeBel: Seeing is Believing [0.9790236766474201]
We propose three visualizations that let users compare dataset statistics and AI segmentation performance across all images.
Our project aims to further improve the interpretability of the trained segmentation model by visualizing its image attention weights.
We propose to conduct surveys with real users to study the efficacy of our visualization tool in the computer vision and AI domains.
arXiv Detail & Related papers (2023-12-18T05:11:00Z)
- Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning [0.0]
We propose a novel pre-training framework by adopting both self-supervised and supervised visual pre-text tasks in a multi-task manner.
Results show that our pre-trained models can deliver results on par with or better than state-of-the-art (SOTA) results on multiple visual tasks.
arXiv Detail & Related papers (2023-10-11T14:06:04Z)
- ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations [26.4215586218117]
This work investigates how people use text-to-image models to generate desired target images.
We created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that creates a similar-looking image as the target.
We recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image.
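A record of one such interaction might look like the sketch below; the field names are hypothetical and are not the dataset's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Interaction:
    """One human-AI interaction: a user's prompt and the image it produced.
    Field names are illustrative assumptions, not the released schema."""
    user_id: str
    target_image_id: str   # the image the user is trying to reproduce
    prompt: str            # text prompt written by the user this round
    generated_image: str   # path or URL of the model's output
    round_index: int       # position within the user's iterative attempts
```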
arXiv Detail & Related papers (2023-06-13T21:10:45Z)
- HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining [75.1086193340286]
It is desirable to have a general pretrained model for versatile human-centric downstream tasks.
We propose HumanBench, built on existing datasets, to evaluate the generalization abilities of different pretraining methods on common ground.
Our PATH achieves new state-of-the-art results on 17 downstream datasets and on-par results on the other 2 datasets.
arXiv Detail & Related papers (2023-03-10T02:57:07Z)
- Exploring Alignment of Representations with Human Perception [47.53970721813083]
We argue that inputs mapped to similar representations by the model should be perceived similarly by humans.
Our approach yields a measure of the extent to which a model is aligned with human perception.
We find that various properties of a model like its architecture, training paradigm, training loss, and data augmentation play a significant role in learning representations that are aligned with human perception.
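A common way to turn this idea into a number is to correlate model-space distances with human dissimilarity judgments over the same image pairs; the sketch below uses Spearman rank correlation as one such measure, an illustrative stand-in rather than the paper's specific alignment score.

```python
import numpy as np
from scipy.stats import spearmanr

def pairwise_distances(embeddings: np.ndarray, pairs) -> np.ndarray:
    """Euclidean distance in representation space for each (i, j) pair."""
    return np.array([np.linalg.norm(embeddings[i] - embeddings[j]) for i, j in pairs])

def perceptual_alignment(model_dists: np.ndarray, human_dissimilarity: np.ndarray) -> float:
    """Rank correlation between model distances and human judgments over the
    same pairs; higher means pairs the model maps close together are also
    judged similar by humans."""
    rho, _ = spearmanr(model_dists, human_dissimilarity)
    return float(rho)
```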
arXiv Detail & Related papers (2021-11-29T17:26:50Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms the visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)