Gaze Perception in Humans and CNN-Based Model
- URL: http://arxiv.org/abs/2104.08447v1
- Date: Sat, 17 Apr 2021 04:52:46 GMT
- Title: Gaze Perception in Humans and CNN-Based Model
- Authors: Nicole X. Han, William Yang Wang, Miguel P. Eckstein
- Abstract summary: We compare how a CNN (convolutional neural network) based model of gaze and humans infer the locus of attention in images of real-world scenes.
We show that compared to the model, humans' estimates of the locus of attention are more influenced by the context of the scene.
- Score: 66.89451296340809
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Making accurate inferences about other individuals' locus of attention is
essential for human social interactions and will be important for AI to
effectively interact with humans. In this study, we compare how a CNN
(convolutional neural network) based model of gaze and humans infer the locus
of attention in images of real-world scenes with a number of individuals
looking at a common location. We show that compared to the model, humans'
estimates of the locus of attention are more influenced by the context of the
scene, such as the presence of the attended target and the number of
individuals in the image.
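The paper's central comparison, how closely a model's estimate of the locus of attention matches human estimates, can be illustrated with a standard saliency-evaluation metric. The sketch below is generic (it is not the authors' code, and the function name is our own): it computes the Pearson correlation coefficient (CC) between a model-produced attention map and a human fixation-density map.

```python
import numpy as np

def correlation_coefficient(model_map: np.ndarray, human_map: np.ndarray) -> float:
    """Pearson correlation (CC) between two attention maps of equal shape.

    A common saliency-evaluation metric: both maps are normalized to zero
    mean and unit variance, then compared element-wise. Returns a value in
    [-1, 1]; 1 means the maps agree perfectly up to an affine rescaling.
    """
    # Small epsilon guards against division by zero for constant maps.
    m = (model_map - model_map.mean()) / (model_map.std() + 1e-8)
    h = (human_map - human_map.mean()) / (human_map.std() + 1e-8)
    return float((m * h).mean())
```

In practice one would aggregate many observers' gaze estimates into a single density map per image before computing CC against the model's output.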
Related papers
- Pose2Gaze: Eye-body Coordination during Daily Activities for Gaze Prediction from Full-body Poses [11.545286742778977]
We first report a comprehensive analysis of eye-body coordination in various human-object and human-human interaction activities.
We then present Pose2Gaze, an eye-body coordination model that uses a convolutional neural network to extract features from head direction and full-body poses.
arXiv Detail & Related papers (2023-12-19T10:55:46Z)
- Generating Human-Centric Visual Cues for Human-Object Interaction Detection via Large Vision-Language Models [59.611697856666304]
Human-object interaction (HOI) detection aims at detecting human-object pairs and predicting their interactions.
We propose three prompts with VLM to generate human-centric visual cues within an image from multiple perspectives of humans.
We develop a transformer-based multimodal fusion module with multitower architecture to integrate visual cue features into the instance and interaction decoders.
arXiv Detail & Related papers (2023-11-26T09:11:32Z)
- Eye vs. AI: Human Gaze and Model Attention in Video Memorability [22.718191366938278]
We propose a Transformer-based model with naturalistic-temporal attention that matches SoTA performance on video memorability prediction.
We compare model attention against human gaze fixation density maps collected through a small-scale eye-tracking experiment.
We observe that the model assigns greater importance to the initial frames, mimicking temporal attention patterns found in humans.
arXiv Detail & Related papers (2023-11-26T05:14:06Z)
- Real-time Addressee Estimation: Deployment of a Deep-Learning Model on the iCub Robot [52.277579221741746]
Addressee Estimation is a skill essential for social robots to interact smoothly with humans.
Inspired by human perceptual skills, a deep-learning model for Addressee Estimation is designed, trained, and deployed on an iCub robot.
The study presents the procedure of such implementation and the performance of the model deployed in real-time human-robot interaction.
arXiv Detail & Related papers (2023-11-09T13:01:21Z)
- Do humans and Convolutional Neural Networks attend to similar areas during scene classification: Effects of task and image type [0.0]
We investigated how the tasks used to elicit human attention maps interact with image characteristics in modulating the similarity between humans and CNNs.
We varied the type of image to be categorized, using either singular, salient objects, indoor scenes consisting of object arrangements, or landscapes without distinct objects defining the category.
The influence of human tasks strongly depended on image type: for objects, human manual selection produced maps that were most similar to the CNN's, while the specific eye-movement task had little impact.
arXiv Detail & Related papers (2023-07-25T09:02:29Z)
- Human alignment of neural network representations [22.671101285994013]
We investigate the factors that affect the alignment between the representations learned by neural networks and human mental representations inferred from behavioral responses.
We find that model scale and architecture have essentially no effect on the alignment with human behavioral responses.
We find that some human concepts such as food and animals are well-represented by neural networks whereas others such as royal or sports-related objects are not.
arXiv Detail & Related papers (2022-11-02T15:23:16Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which further boosts the motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z)
- Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the interaction.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
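The interaction-points approach above predicts a dense score map from which discrete points are decoded. As a generic illustration (not the authors' implementation; the function name and 3x3 neighborhood rule are our own assumptions), peak extraction from such a heatmap can be sketched as:

```python
import numpy as np

def extract_peaks(heatmap: np.ndarray, threshold: float = 0.5) -> list[tuple[int, int]]:
    """Return (row, col) coordinates of local maxima above a score threshold.

    A point counts as a peak if it is the maximum of its 3x3 neighborhood
    (clipped at the image boundary) and its score exceeds the threshold.
    """
    peaks = []
    H, W = heatmap.shape
    for i in range(H):
        for j in range(W):
            v = heatmap[i, j]
            if v < threshold:
                continue
            # Clip the 3x3 window at the heatmap boundary.
            i0, i1 = max(0, i - 1), min(H, i + 2)
            j0, j1 = max(0, j - 1), min(W, j + 2)
            if v >= heatmap[i0:i1, j0:j1].max():
                peaks.append((i, j))
    return peaks
```

Real detectors typically implement this with a max-pooling comparison for speed and then pair each peak with nearby human and object detections.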
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.