Toward Improving the Evaluation of Visual Attention Models: a
Crowdsourcing Approach
- URL: http://arxiv.org/abs/2002.04407v2
- Date: Thu, 7 May 2020 13:34:36 GMT
- Title: Toward Improving the Evaluation of Visual Attention Models: a
Crowdsourcing Approach
- Authors: Dario Zanca, Stefano Melacci, Marco Gori
- Abstract summary: State-of-the-art models focus on learning saliency maps from human data.
We highlight the limits of the current metrics for saliency prediction and scanpath similarity.
We present a study aimed at evaluating how strongly the scanpaths generated with the unsupervised gravitational models appear plausible to naive and expert human observers.
- Score: 21.81407627962409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human visual attention is a complex phenomenon. A computational modeling of
this phenomenon must take into account where people look, in order to determine
which locations are salient (spatial distribution of the fixations), when
they look at those locations, to understand the temporal development of the
exploration (temporal order of the fixations), and how they move from one
location to another with respect to the dynamics of the scene and the mechanics
of the eyes (dynamics). State-of-the-art models focus on learning saliency maps
from human data, a process that only takes into account the spatial component
of the phenomenon and ignores its temporal and dynamical counterparts. In this
work we focus on the evaluation methodology of models of human visual
attention. We underline the limits of the current metrics for saliency
prediction and scanpath similarity, and we introduce a statistical measure for
the evaluation of the dynamics of the simulated eye movements. While deep
learning models achieve astonishing performance in saliency prediction, our
analysis shows their limitations in capturing the dynamics of the process. We
find that unsupervised gravitational models, despite their simplicity,
outperform all competitors. Finally, exploiting a crowdsourcing platform, we
present a study aimed at evaluating how strongly the scanpaths generated with
the unsupervised gravitational models appear plausible to naive and expert
human observers.
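The abstract points to the limits of the standard metrics for saliency prediction and scanpath similarity. As a concrete reference point, here is a minimal sketch of two such widely used metrics: Normalized Scanpath Saliency (NSS) for saliency maps, and a grid-quantized string-edit distance for scanpaths. Both compare only spatial locations and their order, which illustrates why they cannot capture the eye-movement dynamics the paper emphasizes. The function names, grid size, and image shape below are illustrative assumptions, not definitions taken from the paper.

```python
import numpy as np

def nss(saliency_map, fixation_points):
    """Normalized Scanpath Saliency: mean of the standardized saliency map
    sampled at human fixation locations (a standard saliency-prediction metric)."""
    s = (saliency_map - saliency_map.mean()) / (saliency_map.std() + 1e-8)
    rows, cols = zip(*fixation_points)          # fixations as (row, col) pixel coordinates
    return float(s[list(rows), list(cols)].mean())

def scanpath_edit_distance(path_a, path_b, shape=(480, 640), grid=(5, 5)):
    """String-edit (Levenshtein) distance between two scanpaths after quantizing
    fixations onto a coarse grid -- a common scanpath-similarity measure that
    compares the order of visited regions but ignores timing and eye dynamics."""
    def quantize(path):
        h, w = shape
        gh, gw = grid
        return [int(y * gh / h) * gw + int(x * gw / w) for y, x in path]
    a, b = quantize(path_a), quantize(path_b)
    dp = list(range(len(b) + 1))                # dp[j] = edit distance between prefixes
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

# Toy usage: a random saliency map scored against a handful of fixations.
rng = np.random.default_rng(0)
sal = rng.random((480, 640))
fixations = [(100, 200), (240, 320), (400, 500)]
print("NSS:", nss(sal, fixations))
print("edit distance:", scanpath_edit_distance(fixations, [(90, 210), (250, 310)]))
```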
Related papers
- Computing a human-like reaction time metric from stable recurrent vision
models [11.87006916768365]
We sketch a general-purpose methodology to construct computational accounts of reaction times from a stimulus-computable, task-optimized model.
We demonstrate that our metric aligns with patterns of human reaction times for stimulus manipulations across four disparate visual decision-making tasks.
This work paves the way for exploring the temporal alignment of model and human visual strategies in the context of various other cognitive tasks.
arXiv Detail & Related papers (2023-06-20T14:56:02Z) - Predictive Experience Replay for Continual Visual Control and
Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z) - TempSAL -- Uncovering Temporal Information for Deep Saliency Prediction [64.63645677568384]
We introduce a novel saliency prediction model that learns to output saliency maps in sequential time intervals.
Our approach locally modulates the saliency predictions by combining the learned temporal maps.
Our code will be publicly available on GitHub.
arXiv Detail & Related papers (2023-01-05T22:10:16Z) - Neural Superstatistics for Bayesian Estimation of Dynamic Cognitive
Models [2.7391842773173334]
We develop a simulation-based deep learning method for Bayesian inference, which can recover both time-varying and time-invariant parameters.
Our results show that the deep learning approach is very efficient in capturing the temporal dynamics of the model.
arXiv Detail & Related papers (2022-11-23T17:42:53Z) - Learning Dynamics via Graph Neural Networks for Human Pose Estimation
and Tracking [98.91894395941766]
We propose a novel online approach to learning the pose dynamics, which are independent of pose detections in the current frame.
Specifically, we derive this prediction of dynamics through a graph neural network (GNN) that explicitly accounts for both spatial-temporal and visual information.
Experiments on PoseTrack 2017 and PoseTrack 2018 datasets demonstrate that the proposed method achieves results superior to the state of the art on both human pose estimation and tracking tasks.
arXiv Detail & Related papers (2021-06-07T16:36:50Z) - TRiPOD: Human Trajectory and Pose Dynamics Forecasting in the Wild [77.59069361196404]
TRiPOD is a novel method for predicting body dynamics based on graph attentional networks.
To incorporate a real-world challenge, we learn an indicator representing whether an estimated body joint is visible/invisible at each frame.
Our evaluation shows that TRiPOD outperforms all prior work and state-of-the-art specifically designed for each of the trajectory and pose forecasting tasks.
arXiv Detail & Related papers (2021-04-08T20:01:00Z) - Heteroscedastic Uncertainty for Robust Generative Latent Dynamics [7.107159120605662]
We present a method to jointly learn a latent state representation and the associated dynamics.
As our main contribution, we describe how our representation is able to capture a notion of heteroscedastic or input-specific uncertainty.
We present results from prediction and control experiments on two image-based tasks.
arXiv Detail & Related papers (2020-08-18T21:04:33Z) - Wave Propagation of Visual Stimuli in Focus of Attention [77.4747032928547]
Fast reactions to changes in the surrounding visual environment require efficient attention mechanisms to reallocate computational resources to the most relevant locations in the visual field.
We present a biologically-plausible model of focus of attention that exhibits the effectiveness and efficiency of foveated animals.
arXiv Detail & Related papers (2020-06-19T09:33:21Z) - A Meta-Bayesian Model of Intentional Visual Search [0.0]
We propose a computational model of visual search that incorporates Bayesian interpretations of the neural mechanisms that underlie categorical perception and saccade planning.
To enable meaningful comparisons between simulated and human behaviours, we employ a gaze-contingent paradigm that requires participants to classify occluded MNIST digits through a window that follows their gaze.
Our model is able to recapitulate human behavioural metrics such as classification accuracy while retaining a high degree of interpretability, which we demonstrate by recovering subject-specific parameters from observed human behaviour.
arXiv Detail & Related papers (2020-06-05T16:10:35Z) - Visual Grounding of Learned Physical Models [66.04898704928517]
Humans intuitively recognize objects' physical properties and predict their motion, even when the objects are engaged in complicated interactions.
We present a neural model that simultaneously reasons about physics and makes future predictions based on visual and dynamics priors.
Experiments show that our model can infer the physical properties within a few observations, which allows the model to quickly adapt to unseen scenarios and make accurate predictions into the future.
arXiv Detail & Related papers (2020-04-28T17:06:38Z) - Unsupervised Gaze Prediction in Egocentric Videos by Energy-based
Surprise Modeling [6.294759639481189]
Egocentric perception has grown rapidly with the advent of immersive computing devices.
Human gaze prediction is an important problem in analyzing egocentric videos.
We quantitatively analyze the generalization capabilities of supervised, deep learning models on the egocentric gaze prediction task.
arXiv Detail & Related papers (2020-01-30T21:52:38Z)