Predicting Goal-directed Human Attention Using Inverse Reinforcement
Learning
- URL: http://arxiv.org/abs/2005.14310v2
- Date: Thu, 25 Jun 2020 10:56:15 GMT
- Title: Predicting Goal-directed Human Attention Using Inverse Reinforcement
Learning
- Authors: Zhibo Yang, Lihan Huang, Yupei Chen, Zijun Wei, Seoyoung Ahn, Gregory
Zelinsky, Dimitris Samaras, Minh Hoai
- Abstract summary: We propose the first inverse reinforcement learning model to learn the internal reward function and policy used by humans during visual search.
To train and evaluate our IRL model we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence.
- Score: 44.774961463015245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Being able to predict human gaze behavior has obvious importance for
behavioral vision and for computer vision applications. Most models have mainly
focused on predicting free-viewing behavior using saliency maps, but these
predictions do not generalize to goal-directed behavior, such as when a person
searches for a visual target object. We propose the first inverse reinforcement
learning (IRL) model to learn the internal reward function and policy used by
humans during visual search. The viewer's internal belief states were modeled
as dynamic contextual belief maps of object locations. These maps were learned
by IRL and then used to predict behavioral scanpaths for multiple target
categories. To train and evaluate our IRL model we created COCO-Search18, which
is now the largest dataset of high-quality search fixations in existence.
COCO-Search18 has 10 participants searching for each of 18 target-object
categories in 6202 images, making about 300,000 goal-directed fixations. When
trained and evaluated on COCO-Search18, the IRL model outperformed baseline
models in predicting search fixation scanpaths, both in terms of similarity to
human search behavior and search efficiency. Finally, reward maps recovered by
the IRL model reveal distinctive target-dependent patterns of object
prioritization, which we interpret as a learned object context.
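To make the pieces above concrete — a belief-map state, a policy over fixations, and a reward recovered by IRL — here is a minimal sketch, not the authors' implementation. It assumes an adversarial (GAIL-style) IRL formulation and a discretized fixation grid; the grid size, the 18 belief channels, and the `Policy`/`RewardNet`/`irl_step` names are illustrative assumptions.

```python
# Minimal sketch of the ingredients named in the abstract: a policy that
# picks the next fixation on a coarse grid given a belief-map state, and a
# reward network trained adversarially (GAIL-style) so that policy rollouts
# resemble human fixations. Shapes and sizes here are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

GRID = 20        # fixations discretized to a 20x20 grid (assumption)
N_BELIEF = 18    # one belief channel per target category (assumption)

class Policy(nn.Module):
    """Maps a belief-map state to a distribution over next fixations."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(N_BELIEF, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),  # one logit per grid cell
        )

    def forward(self, belief):  # belief: (B, N_BELIEF, GRID, GRID)
        logits = self.net(belief).flatten(1)  # (B, GRID*GRID)
        return torch.distributions.Categorical(logits=logits)

class RewardNet(nn.Module):
    """Discriminator-style reward: how human-like is a fixation in this state?"""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(N_BELIEF, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, belief, fix_idx):  # fix_idx: (B,) flat grid indices
        per_cell = self.net(belief).flatten(1)  # (B, GRID*GRID)
        return per_cell.gather(1, fix_idx.unsqueeze(1)).squeeze(1)

policy, reward = Policy(), RewardNet()
opt_p = torch.optim.Adam(policy.parameters(), lr=1e-4)
opt_r = torch.optim.Adam(reward.parameters(), lr=1e-4)

def irl_step(belief, human_fix):
    """One adversarial update on a batch of belief states and human fixations."""
    # 1) Reward net: score human fixations high, policy fixations low.
    with torch.no_grad():
        fake_fix = policy(belief).sample()
    ones = torch.ones(belief.size(0))
    r_loss = (F.binary_cross_entropy_with_logits(reward(belief, human_fix), ones)
              + F.binary_cross_entropy_with_logits(reward(belief, fake_fix), 1 - ones))
    opt_r.zero_grad(); r_loss.backward(); opt_r.step()

    # 2) Policy: REINFORCE against the learned reward (PPO in practice).
    dist = policy(belief)
    fix = dist.sample()
    with torch.no_grad():
        r = reward(belief, fix)
    p_loss = -(dist.log_prob(fix) * r).mean()
    opt_p.zero_grad(); p_loss.backward(); opt_p.step()
```

In a full model, the belief maps would be updated after every fixation (the dynamic contextual belief maps described above) and the policy would typically be optimized with PPO rather than plain REINFORCE.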
Related papers
- Evaluating Multiview Object Consistency in Humans and Image Models [68.36073530804296]
We leverage an experimental design from the cognitive sciences which requires zero-shot visual inferences about object shape.
We collect 35K trials of behavioral data from over 500 participants.
We then evaluate the performance of common vision models.
arXiv Detail & Related papers (2024-09-09T17:59:13Z)
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Unified Dynamic Scanpath Predictors Outperform Individually Trained Neural Models [18.327960366321655]
We develop a deep learning-based social cue integration model for saliency prediction to predict scanpaths in videos.
We evaluate our approach on gaze data from dynamic social scenes, observed under the free-viewing condition.
Results indicate that a single unified model, trained on all the observers' scanpaths, performs on par with or better than individually trained models.
arXiv Detail & Related papers (2024-05-05T13:15:11Z)
- Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans.
This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations.
In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
- Predicting Visual Attention and Distraction During Visual Search Using Convolutional Neural Networks [2.7920304852537527]
We present two approaches to model visual attention and distraction of observers during visual search.
Our first approach adapts a light-weight free-viewing saliency model to predict eye fixation density maps of human observers over pixels of search images.
Our second approach is object-based and predicts the distractor and target objects during visual search.
arXiv Detail & Related papers (2022-10-27T00:39:43Z)
- Target-absent Human Attention [44.10971508325032]
We propose the first data-driven computational model that addresses the search-termination problem.
We represent the internal knowledge that the viewer acquires through fixations using a novel state representation.
We improve the state of the art in predicting human target-absent search behavior on the COCO-Search18 dataset.
arXiv Detail & Related papers (2022-07-04T02:32:04Z)
- SEAL: Self-supervised Embodied Active Learning using Exploration and 3D Consistency [122.18108118190334]
We present a framework called Self-supervised Embodied Active Learning (SEAL).
It utilizes perception models trained on internet images to learn an active exploration policy.
We build and utilize 3D semantic maps to learn both action and perception in a completely self-supervised manner.
arXiv Detail & Related papers (2021-12-02T06:26:38Z)
- Modeling human visual search: A combined Bayesian searcher and saliency map approach for eye movement guidance in natural scenes [0.0]
We propose a unified Bayesian model for visual search guided by saliency maps as prior information.
We show that state-of-the-art saliency models perform well in predicting the first two fixations in a visual search task, but their performance degrades to chance afterward.
This suggests that saliency maps alone model bottom-up first impressions well, but are not enough to explain scanpaths when top-down task information is critical (a toy sketch of this saliency-as-prior idea follows this list).
arXiv Detail & Related papers (2020-09-17T15:38:23Z)
- Predicting Goal-directed Attention Control Using Inverse-Reinforcement Learning [25.721096184051724]
Using machine learning and the psychologically-meaningful principle of reward, it is possible to learn the visual features used in goal-directed attention control.
We collected 16,184 fixations from people searching for either microwaves or clocks in a dataset of 4,366 images (MS-COCO).
We used this behaviorally-annotated dataset and the machine learning method of Inverse-Reinforcement Learning (IRL) to learn target-specific reward functions and policies for these two target goals.
arXiv Detail & Related papers (2020-01-31T15:53:52Z)
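As promised in the "Modeling human visual search" entry above, here is a toy sketch of the saliency-as-prior idea: a Bayesian searcher whose prior over target location is a saliency map. The foveated binary detector, the greedy MAP fixation rule, and the `bayesian_search` name are my simplifying assumptions, not the paper's model.

```python
# Toy Bayesian searcher: a saliency map is the prior over target location;
# after each fixation, noisy per-location observations update the posterior.
# The foveated detector and greedy MAP fixation rule are simplifications.
import numpy as np

def bayesian_search(saliency, target_pos, fovea_sigma=2.0, max_fix=20, seed=0):
    """Fixate the posterior peak, observe, Bayes-update, repeat.

    saliency   : (H, W) non-negative array used as the prior P(target at x).
    target_pos : (row, col) true target location.
    Detection reliability decays with distance from the current fixation,
    standing in for foveated vision.
    """
    rng = np.random.default_rng(seed)
    saliency = np.asarray(saliency, dtype=float)
    H, W = saliency.shape
    posterior = saliency / saliency.sum()
    rows, cols = np.mgrid[0:H, 0:W]
    scanpath = []

    for _ in range(max_fix):
        fix = np.unravel_index(posterior.argmax(), posterior.shape)
        scanpath.append(fix)
        if fix == tuple(target_pos):
            break
        # True-positive rate falls from 0.99 at the fovea to 0.5 (chance)
        # in the far periphery; the false-positive rate mirrors it.
        dist = np.hypot(rows - fix[0], cols - fix[1])
        tpr = 0.5 + 0.49 * np.exp(-dist**2 / (2 * fovea_sigma**2))
        # Simulate a binary "target here?" observation at every location.
        is_target = (rows == target_pos[0]) & (cols == target_pos[1])
        obs = rng.random((H, W)) < np.where(is_target, tpr, 1.0 - tpr)
        # Bayes update with the per-location likelihood ratio.
        lr = np.where(obs, tpr / (1.0 - tpr), (1.0 - tpr) / tpr)
        posterior = posterior * lr
        posterior /= posterior.sum()
    return scanpath
```

For example, `bayesian_search(np.random.rand(32, 32), target_pos=(5, 20))` returns a simulated scanpath as a list of (row, col) fixations. An inhibition-of-return-like effect falls out of the update: once a fixated location yields a confident foveal "no target" observation, its posterior mass collapses and the searcher moves on.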