Target-absent Human Attention
- URL: http://arxiv.org/abs/2207.01166v1
- Date: Mon, 4 Jul 2022 02:32:04 GMT
- Title: Target-absent Human Attention
- Authors: Zhibo Yang, Sounak Mondal, Seoyoung Ahn, Gregory Zelinsky, Minh Hoai,
Dimitris Samaras
- Abstract summary: We propose the first data-driven computational model that addresses the search-termination problem.
We represent the internal knowledge that the viewer acquires through fixations using a novel state representation.
We improve the state of the art in predicting human target-absent search behavior on the COCO-Search18 dataset.
- Score: 44.10971508325032
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The prediction of human gaze behavior is important for building
human-computer interactive systems that can anticipate a user's attention.
Computer vision models have been developed to predict the fixations made by
people as they search for target objects. But what about when the image has no
target? Equally important is to know how people search when they cannot find a
target, and when they would stop searching. In this paper, we propose the first
data-driven computational model that addresses the search-termination problem
and predicts the scanpath of search fixations made by people searching for
targets that do not appear in images. We model visual search as an imitation
learning problem and represent the internal knowledge that the viewer acquires
through fixations using a novel state representation that we call Foveated
Feature Maps (FFMs). FFMs integrate a simulated foveated retina into a
pretrained ConvNet that produces an in-network feature pyramid, all with
minimal computational overhead. Our method integrates FFMs as the state
representation in inverse reinforcement learning. Experimentally, we improve
the state of the art in predicting human target-absent search behavior on the
COCO-Search18 dataset
Related papers
- Human-Robot Collaborative Minimum Time Search through Sub-priors in Ant Colony Optimization [3.04478108783992]
This paper presents an extension of the Ant Colony Optimization (ACO) meta-heuristic to solve the Minimum Time Search (MTS) task.
The proposed model consists of two main blocks. The first one is a convolutional neural network (CNN) that provides the prior probabilities about where an object may be from a segmented image.
The second one is the Sub-prior MTS-ACO algorithm (SP-MTS-ACO), which takes as inputs the prior probabilities and the particular search preferences of the agents in different sub-priors to generate search plans for all agents.
arXiv Detail & Related papers (2024-10-01T08:57:28Z) - OAT: Object-Level Attention Transformer for Gaze Scanpath Prediction [0.2796197251957245]
This paper introduces the Object-level Attention Transformer (OAT)
OAT predicts human scanpaths as they search for a target object within a cluttered scene of distractors.
We evaluate OAT on the Amazon book cover dataset and a new dataset for visual search that we collected.
arXiv Detail & Related papers (2024-07-18T09:33:17Z) - Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z) - Predicting Visual Attention and Distraction During Visual Search Using
Convolutional Neural Networks [2.7920304852537527]
We present two approaches to model visual attention and distraction of observers during visual search.
Our first approach adapts a light-weight free-viewing saliency model to predict eye fixation density maps of human observers over pixels of search images.
Our second approach is object-based and predicts the distractor and target objects during visual search.
arXiv Detail & Related papers (2022-10-27T00:39:43Z) - MECCANO: A Multimodal Egocentric Dataset for Humans Behavior
Understanding in the Industrial-like Domain [23.598727613908853]
We present MECCANO, a dataset of egocentric videos to study humans behavior understanding in industrial-like settings.
The multimodality is characterized by the presence of gaze signals, depth maps and RGB videos acquired simultaneously with a custom headset.
The dataset has been explicitly labeled for fundamental tasks in the context of human behavior understanding from a first person view.
arXiv Detail & Related papers (2022-09-19T00:52:42Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - What Can You Learn from Your Muscles? Learning Visual Representation
from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z) - Predicting Goal-directed Human Attention Using Inverse Reinforcement
Learning [44.774961463015245]
We propose the first inverse reinforcement learning model to learn the internal reward function and policy used by humans during visual search.
To train and evaluate our IRL model we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence.
arXiv Detail & Related papers (2020-05-28T21:46:27Z) - Learning Human-Object Interaction Detection using Interaction Points [140.0200950601552]
We propose a novel fully-convolutional approach that directly detects the interactions between human-object pairs.
Our network predicts interaction points, which directly localize and classify the inter-action.
Experiments are performed on two popular benchmarks: V-COCO and HICO-DET.
arXiv Detail & Related papers (2020-03-31T08:42:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.