Modeling human visual search: A combined Bayesian searcher and saliency map approach for eye movement guidance in natural scenes
- URL: http://arxiv.org/abs/2009.08373v2
- Date: Tue, 8 Dec 2020 04:02:44 GMT
- Title: Modeling human visual search: A combined Bayesian searcher and saliency map approach for eye movement guidance in natural scenes
- Authors: M. Sclar, G. Bujia, S. Vita, G. Solovey, J. E. Kamienkowski
- Abstract summary: We propose a unified Bayesian model for visual search guided by saliency maps as prior information.
We show that state-of-the-art saliency models perform well in predicting the first two fixations in a visual search task, but their performance degrades to chance afterward.
This suggests that saliency maps alone are good for modeling bottom-up first impressions, but are not enough to explain scanpaths when top-down task information is critical.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Finding objects is essential for almost any daily-life visual task. Saliency
models have been useful to predict fixation locations in natural images, but
are static, i.e., they provide no information about the time-sequence of
fixations. Nowadays, one of the biggest challenges in the field is to go beyond
saliency maps to predict a sequence of fixations related to a visual task, such
as searching for a given target. Bayesian observer models have been proposed
for this task, as they represent visual search as an active sampling process.
Nevertheless, they were mostly evaluated on artificial images, and how they
adapt to natural images remains largely unexplored.
Here, we propose a unified Bayesian model for visual search guided by
saliency maps as prior information. We validated our model with a visual
search experiment in natural scenes in which eye movements were recorded. We
show that, although
state-of-the-art saliency models perform well in predicting the first two
fixations in a visual search task, their performance degrades to chance
afterward. This suggests that saliency maps alone are good for modeling
bottom-up first impressions, but are not enough to explain scanpaths when
top-down task information is critical. Thus, we propose to use them as priors
for Bayesian searchers. This approach leads to behavior very similar to that
of humans over the whole scanpath, both in the percentage of targets found as
a function of fixation rank and in scanpath similarity, reproducing the entire
sequence of eye movements.
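The following is a minimal, illustrative Python sketch of the kind of Bayesian searcher the abstract describes, with a normalized saliency map as the prior over target location. The Gaussian visibility model, the maximum-a-posteriori fixation rule, and all names are simplifying assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a Bayesian searcher that uses a saliency map as its prior
# over target location. The visibility model and fixation rule are assumptions.
import numpy as np

def gaussian_visibility(shape, fixation, sigma=3.0):
    """Detection reliability at every cell, decaying with distance from the fixation."""
    rows, cols = np.indices(shape)
    d2 = (rows - fixation[0]) ** 2 + (cols - fixation[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def bayesian_search(saliency, target, max_fixations=12, sigma=3.0, seed=0):
    """Saliency-prior Bayesian searcher; returns the scanpath and whether the target was found."""
    rng = np.random.default_rng(seed)
    posterior = saliency / saliency.sum()              # prior = normalized saliency map
    fixation = np.unravel_index(posterior.argmax(), posterior.shape)
    scanpath = [fixation]
    for _ in range(max_fixations):
        vis = gaussian_visibility(posterior.shape, fixation, sigma)
        if rng.random() < vis[target]:                 # noisy detection near the fovea
            return scanpath, True
        posterior *= 1.0 - vis                         # Bayesian update after a miss
        posterior /= posterior.sum()
        fixation = np.unravel_index(posterior.argmax(), posterior.shape)  # MAP next fixation
        scanpath.append(fixation)
    return scanpath, False

# Toy usage: random saliency map, hidden target at cell (5, 14).
scanpath, found = bayesian_search(np.random.rand(20, 20), target=(5, 14))
```

In this toy version the searcher visits high-saliency regions first and, because the posterior is suppressed wherever it has already looked, progressively spreads to unvisited locations, which is the qualitative behavior the abstract describes.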
Related papers
- SHIC: Shape-Image Correspondences with no Keypoint Supervision [106.99157362200867]
Canonical surface mapping generalizes keypoint detection by assigning each pixel of an object to a corresponding point in a 3D template.
Popularised by DensePose for the analysis of humans, the concept has since been applied by other authors to further categories.
We introduce SHIC, a method to learn canonical maps without manual supervision that achieves better results than supervised methods for most categories.
arXiv Detail & Related papers (2024-07-26T17:58:59Z)
- Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans.
This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations.
In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
- TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction [64.63645677568384]
We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals.
Our approach locally modulates the saliency predictions by combining the learned temporal maps.
Our code will be publicly available on GitHub.
arXiv Detail & Related papers (2023-01-05T22:10:16Z)
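As a rough illustration of the idea of locally modulating saliency by combining per-interval temporal maps (as summarized in the entry above), the toy sketch below mixes T temporal saliency maps with a per-pixel softmax weighting. The weighting scheme and all names are assumptions, not TempSAL's architecture.

```python
# Toy combination of per-interval saliency maps into one image saliency map.
import numpy as np

def combine_temporal_maps(temporal_maps, logits):
    """temporal_maps, logits: arrays of shape (T, H, W); returns an (H, W) saliency map."""
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)                # per-pixel softmax over intervals
    combined = (w * temporal_maps).sum(axis=0)       # locally modulated combination
    return combined / (combined.max() + 1e-12)       # normalize to [0, 1]

# Toy usage: three per-interval maps combined into one image saliency map.
maps = np.random.rand(3, 32, 32)
image_saliency = combine_temporal_maps(maps, logits=np.random.randn(3, 32, 32))
```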
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- Glimpse-Attend-and-Explore: Self-Attention for Active Visual Exploration [47.01485765231528]
Active visual exploration aims to assist an agent with a limited field of view to understand its environment based on partial observations.
We propose the Glimpse-Attend-and-Explore model which employs self-attention to guide the visual exploration instead of task-specific uncertainty maps.
Our model provides encouraging results while being less dependent on dataset bias in driving the exploration.
arXiv Detail & Related papers (2021-08-26T11:41:03Z)
- SALYPATH: A Deep-Based Architecture for visual attention prediction [5.068678962285629]
Visual attention is useful for many computer vision applications such as image compression, recognition, and captioning.
We propose an end-to-end deep-based method, called SALYPATH, that efficiently predicts the scanpath of an image through features of a saliency model.
The idea is to predict the scanpath by exploiting the capacity of a deep model to predict saliency.
arXiv Detail & Related papers (2021-06-29T08:53:51Z)
- Bayesian Eye Tracking [63.21413628808946]
Model-based eye tracking is susceptible to eye feature detection errors.
We propose a Bayesian framework for model-based eye tracking.
Compared to state-of-the-art model-based and learning-based methods, the proposed framework demonstrates significant improvement in generalization capability.
arXiv Detail & Related papers (2021-06-25T02:08:03Z)
- Classifying Eye-Tracking Data Using Saliency Maps [8.524684315458245]
This paper proposes a novel visual-saliency-based feature extraction method for automatic and quantitative classification of eye-tracking data.
Comparing saliency amplitudes and the similarity and dissimilarity of saliency maps with the corresponding eye-fixation maps provides an extra dimension of information, which is used to generate discriminative features for classifying the eye-tracking data.
arXiv Detail & Related papers (2020-10-24T15:18:07Z)
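A rough sketch of the kind of feature extraction described in the entry above: comparing a saliency map with a fixation map through standard similarity and dissimilarity measures plus saliency amplitude at fixations. The specific metrics and names are assumptions, not the paper's exact feature set.

```python
# Illustrative saliency-vs-fixation features for classifying eye-tracking data.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density(fixations, shape, sigma=10.0):
    """Gaussian-smoothed fixation map from (row, col) fixation coordinates."""
    fmap = np.zeros(shape)
    for r, c in fixations:
        fmap[int(r), int(c)] += 1.0
    fmap = gaussian_filter(fmap, sigma)
    return fmap / (fmap.sum() + 1e-12)

def saliency_features(saliency, fixations):
    """Features: correlation (similarity), KL divergence (dissimilarity), saliency amplitude."""
    p = saliency / (saliency.sum() + 1e-12)
    q = fixation_density(fixations, saliency.shape)
    cc = np.corrcoef(p.ravel(), q.ravel())[0, 1]
    kl = float(np.sum(q * np.log((q + 1e-12) / (p + 1e-12))))
    amp = float(np.mean([saliency[int(r), int(c)] for r, c in fixations]))
    return np.array([cc, kl, amp])
```

The resulting feature vector could then be fed to any off-the-shelf classifier.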
- Latent World Models For Intrinsically Motivated Exploration [140.21871701134626]
We present a self-supervised representation learning method for image-based observations.
We consider episodic and life-long uncertainties to guide the exploration of partially observable environments.
arXiv Detail & Related papers (2020-10-05T19:47:04Z)
- Predicting Goal-directed Human Attention Using Inverse Reinforcement Learning [44.774961463015245]
We propose the first inverse reinforcement learning model to learn the internal reward function and policy used by humans during visual search.
To train and evaluate our IRL model we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence.
arXiv Detail & Related papers (2020-05-28T21:46:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.