Gaze Perception in Humans and CNN-Based Model
- URL: http://arxiv.org/abs/2104.08447v1
- Date: Sat, 17 Apr 2021 04:52:46 GMT
- Title: Gaze Perception in Humans and CNN-Based Model
- Authors: Nicole X. Han, William Yang Wang, Miguel P. Eckstein
- Abstract summary: We compare how a CNN (convolutional neural network) based model of gaze and humans infer the locus of attention in images of real-world scenes.
We show that compared to the model, humans' estimates of the locus of attention are more influenced by the context of the scene.
- Score: 66.89451296340809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Making accurate inferences about other individuals' locus of attention is
essential for human social interactions and will be important for AI to
effectively interact with humans. In this study, we compare how a CNN
(convolutional neural network) based model of gaze and humans infer the locus
of attention in images of real-world scenes with a number of individuals
looking at a common location. We show that compared to the model, humans'
estimates of the locus of attention are more influenced by the context of the
scene, such as the presence of the attended target and the number of
individuals in the image.
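A minimal sketch of this kind of human-versus-model comparison, assuming each observer and the model produce a single (x, y) estimate that is scored as Euclidean distance to a known attended location; the data and scoring below are illustrative placeholders, not the paper's actual stimuli or pipeline:

```python
# Minimal sketch (not the paper's code): score human vs. model estimates of the
# locus of attention as Euclidean distance to the ground-truth attended location.
import numpy as np

def localization_error(estimates: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Euclidean distance (pixels) from each (x, y) estimate to the target."""
    return np.linalg.norm(estimates - target, axis=-1)

# Hypothetical data: one image with a known attended location (in pixels),
# ten human observers' point estimates, and a single model prediction.
target = np.array([412.0, 305.0])
human_estimates = np.array([[400, 310], [430, 295], [405, 312], [418, 300],
                            [390, 320], [425, 308], [410, 298], [402, 315],
                            [428, 302], [415, 306]], dtype=float)
model_estimate = np.array([[460.0, 280.0]])

human_err = localization_error(human_estimates, target)
model_err = localization_error(model_estimate, target)
print(f"human error: {human_err.mean():.1f} +/- {human_err.std():.1f} px")
print(f"model error: {model_err[0]:.1f} px")
```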
Related papers
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses [11.545286742778977]
We first report a comprehensive analysis of eye-body coordination in various human-object and human-human interaction activities.
We then present Pose2Gaze, an eye-body coordination model that uses a convolutional neural network to extract features from head direction and full-body poses.
arXiv Detail & Related papers (2023-12-19T10:55:46Z)
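A toy sketch in the spirit of the Pose2Gaze entry above: a small 1D CNN over a temporal window of head direction and body-joint coordinates that outputs a unit gaze-direction vector. The joint count, window length, and layer sizes are assumptions for illustration, not the authors' architecture:

```python
# Illustrative sketch, not the Pose2Gaze implementation: a 1D CNN that maps a
# window of head directions and body-joint positions to a 3D gaze direction.
import torch
import torch.nn as nn

class TinyPose2Gaze(nn.Module):
    def __init__(self, n_joints: int = 23, window: int = 15):
        super().__init__()
        in_ch = 3 + n_joints * 3  # head direction (3) + xyz per joint (assumed layout)
        self.encoder = nn.Sequential(
            nn.Conv1d(in_ch, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # pool over the temporal axis
        )
        self.head = nn.Linear(64, 3)  # predicted gaze direction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        g = self.head(self.encoder(x).squeeze(-1))
        return nn.functional.normalize(g, dim=-1)  # unit gaze vector

model = TinyPose2Gaze()
batch = torch.randn(8, 3 + 23 * 3, 15)  # 8 windows of 15 frames
print(model(batch).shape)  # torch.Size([8, 3])
```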
- Seeing Eye to AI: Comparing Human Gaze and Model Attention in Video Memorability [21.44002657362493]
We adopt a simple CNN+Transformer architecture that enables analysis of spatio-temporal attention while matching state-of-the-art (SoTA) performance on video memorability prediction.
We compare model attention against human fixations through a small-scale eye-tracking study in which humans perform a memory task.
arXiv Detail & Related papers (2023-11-26T05:14:06Z)
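A hedged sketch of the attention-to-fixation comparison described in the Seeing Eye to AI entry above, using two standard saliency metrics, Pearson correlation (CC) and normalized scanpath saliency (NSS); treating these as the paper's exact metrics is an assumption, and the maps below are random placeholders:

```python
# Illustrative comparison of a model attention map with human fixation data,
# using two standard saliency metrics (assumed, not necessarily the paper's).
import numpy as np

def cc(attn: np.ndarray, fix_map: np.ndarray) -> float:
    """Pearson correlation between an attention map and a fixation density map."""
    a = (attn - attn.mean()) / attn.std()
    f = (fix_map - fix_map.mean()) / fix_map.std()
    return float((a * f).mean())

def nss(attn: np.ndarray, fix_points: np.ndarray) -> float:
    """Mean normalized attention value at discrete fixation locations (row, col)."""
    z = (attn - attn.mean()) / attn.std()
    return float(z[fix_points[:, 0], fix_points[:, 1]].mean())

rng = np.random.default_rng(0)
attn = rng.random((90, 120))        # hypothetical model attention map
fix_map = rng.random((90, 120))     # hypothetical smoothed fixation map
fix_points = rng.integers(0, [90, 120], size=(40, 2))  # 40 fixations
print(f"CC = {cc(attn, fix_map):.3f}, NSS = {nss(attn, fix_points):.3f}")
```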
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Do humans and Convolutional Neural Networks attend to similar areas during scene classification: Effects of task and image type [0.0]
We investigated how the tasks used to elicit human attention maps interact with image characteristics in modulating the similarity between humans and CNNs.
We varied the type of image to be categorized, using either singular, salient objects, indoor scenes consisting of object arrangements, or landscapes without distinct objects defining the category.
The influence of human tasks strongly depended on image type: for objects, human manual selection produced maps that were most similar to the CNN maps, while the specific eye-movement task had little impact.
arXiv Detail & Related papers (2023-07-25T09:02:29Z)
- Evaluating alignment between humans and neural network representations in image-based learning tasks [5.657101730705275]
We tested how well the representations of 86 pretrained neural network models mapped to human learning trajectories.
We found that while training dataset size was a core determinant of alignment with human choices, contrastive training with multi-modal data (text and imagery) was a common feature of currently publicly available models that predicted human generalisation.
In conclusion, pretrained neural networks can serve to extract representations for cognitive models, as they appear to capture some fundamental aspects of cognition that are transferable across tasks.
arXiv Detail & Related papers (2023-06-15T08:18:29Z)
- Human Eyes Inspired Recurrent Neural Networks are More Robust Against Adversarial Noises [7.689542442882423]
We designed a dual-stream vision model inspired by the human brain.
This model features retina-like input layers and includes two streams: one determines the next point of focus (the fixation), while the other interprets the visuals surrounding the fixation.
We evaluated this model against various benchmarks in terms of object recognition, gaze behavior and adversarial robustness.
arXiv Detail & Related papers (2022-06-15T03:44:42Z)
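A schematic sketch of the dual-stream idea in the entry above: one stream selects the next fixation from an interest map with inhibition of return, while a second stream would interpret a foveal crop around that fixation. Everything here (the map, radius, and loop) is a toy assumption, not the authors' recurrent model:

```python
# Toy illustration of a two-stream fixation loop (not the authors' model):
# one stream picks the next fixation, the other receives a foveal crop.
import numpy as np

def next_fixation(interest: np.ndarray) -> tuple:
    """Pick the (y, x) location with the highest interest value."""
    return np.unravel_index(np.argmax(interest), interest.shape)

def foveal_crop(img: np.ndarray, yx: tuple, radius: int = 8) -> np.ndarray:
    """Extract a square crop centered on the fixation."""
    y, x = yx
    return img[max(0, y - radius): y + radius, max(0, x - radius): x + radius]

rng = np.random.default_rng(1)
image = rng.random((64, 64))
interest = image.copy()  # stand-in for a learned fixation-selection stream
for step in range(3):
    y, x = next_fixation(interest)
    crop = foveal_crop(image, (y, x))  # would feed the recognition stream
    print(f"fixation {step}: ({y}, {x}), crop shape {crop.shape}")
    # Inhibition of return: suppress the visited region before the next pick.
    interest[max(0, y - 8): y + 8, max(0, x - 8): x + 8] = -np.inf
```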
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method, MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
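A rough sketch of the interaction-point idea in the last entry: decode peak locations from an interaction heatmap, then pair each peak with the human and object boxes whose center midpoint lies nearest to it. The decoding and pairing rules here are simplified assumptions, not the paper's exact method:

```python
# Simplified sketch of interaction-point decoding and pairing (illustrative).
import numpy as np

def decode_peaks(heatmap: np.ndarray, k: int = 2) -> np.ndarray:
    """Return the (y, x) coordinates of the k highest heatmap values."""
    flat = np.argsort(heatmap, axis=None)[::-1][:k]
    return np.stack(np.unravel_index(flat, heatmap.shape), axis=1)

def pair_boxes(point, human_boxes, object_boxes):
    """Return (human_idx, object_idx) whose box-center midpoint is nearest the point."""
    best, best_d = None, np.inf
    for hi, hb in enumerate(human_boxes):
        hc = np.array([(hb[1] + hb[3]) / 2, (hb[0] + hb[2]) / 2])  # (y, x) center
        for oi, ob in enumerate(object_boxes):
            oc = np.array([(ob[1] + ob[3]) / 2, (ob[0] + ob[2]) / 2])
            d = np.linalg.norm((hc + oc) / 2 - point)
            if d < best_d:
                best, best_d = (hi, oi), d
    return best

heat = np.zeros((32, 32))  # toy interaction heatmap with two peaks
heat[10, 12] = 1.0
heat[25, 5] = 0.8
humans = [(8, 6, 12, 14)]                      # (x1, y1, x2, y2), hypothetical
objects = [(14, 6, 18, 14), (2, 22, 8, 28)]
for pt in decode_peaks(heat):
    print(pt, "->", pair_boxes(pt.astype(float), humans, objects))
```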
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.