Integrating Human Gaze into Attention for Egocentric Activity
Recognition
- URL: http://arxiv.org/abs/2011.03920v1
- Date: Sun, 8 Nov 2020 08:02:30 GMT
- Title: Integrating Human Gaze into Attention for Egocentric Activity
Recognition
- Authors: Kyle Min, Jason J. Corso
- Abstract summary: We introduce an effective probabilistic approach to integrate human gaze into spatiotemporal attention for egocentric activity recognition.
We represent the locations of gaze fixation points as structured discrete latent variables to model their uncertainties.
The predicted gaze locations are used to provide informative attentional cues to improve the recognition performance.
- Score: 40.517438760096056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is well known that human gaze carries significant information about visual
attention. However, there are three main difficulties in incorporating the gaze
data in an attention mechanism of deep neural networks: 1) the gaze fixation
points are likely to have measurement errors due to blinking and rapid eye
movements; 2) it is unclear when and how much the gaze data is correlated with
visual attention; and 3) gaze data is not always available in many real-world
situations. In this work, we introduce an effective probabilistic approach to
integrate human gaze into spatiotemporal attention for egocentric activity
recognition. Specifically, we represent the locations of gaze fixation points
as structured discrete latent variables to model their uncertainties. In
addition, we model the distribution of gaze fixations using a variational
method. The gaze distribution is learned during the training process so that
the ground-truth annotations of gaze locations are no longer needed in testing
situations since they are predicted from the learned gaze distribution. The
predicted gaze locations are used to provide informative attentional cues to
improve the recognition performance. Our method outperforms all the previous
state-of-the-art approaches on EGTEA, which is a large-scale dataset for
egocentric activity recognition provided with gaze measurements. We also
perform an ablation study and qualitative analysis to demonstrate that our
attention mechanism is effective.
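To make the mechanism concrete, here is a minimal PyTorch-style sketch of the core idea, assuming a hypothetical GazeAttention module: the fixation location is treated as a categorical latent variable over the spatial feature grid, a relaxed (Gumbel-softmax) sample serves as the attention map during training, and the learned distribution alone drives attention at test time. The residual weighting and the gaze_supervision_loss stand-in are illustrative assumptions, not the paper's exact structured variational formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GazeAttention(nn.Module):
    """Minimal sketch (hypothetical, not the paper's exact model):
    treat the gaze fixation point as a discrete latent variable over
    the spatial grid and use it as a spatial attention map."""

    def __init__(self, in_channels: int, tau: float = 1.0):
        super().__init__()
        # One logit per spatial location -> categorical over H*W cells
        self.gaze_head = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.tau = tau  # Gumbel-softmax temperature

    def forward(self, feats: torch.Tensor):
        # feats: (B, C, H, W) frame features from a video backbone
        b, _, h, w = feats.shape
        logits = self.gaze_head(feats).flatten(1)  # (B, H*W)

        if self.training:
            # Relaxed sample keeps the latent discrete in spirit while
            # letting gradients flow through the sampling step
            attn = F.gumbel_softmax(logits, tau=self.tau, hard=False)
        else:
            # Test time: no gaze measurements are needed; the learned
            # distribution itself provides the attentional cue
            attn = F.softmax(logits, dim=-1)

        attn = attn.view(b, 1, h, w)
        # Residual re-weighting so non-fixated regions are not discarded
        return feats * (1.0 + attn), logits


def gaze_supervision_loss(logits, gaze_xy, h, w):
    """Hypothetical training term: cross-entropy between the predicted
    fixation distribution and the ground-truth fixation cell, standing
    in for the paper's variational objective."""
    # gaze_xy: (B, 2) integer (row, col) fixation coordinates
    target = gaze_xy[:, 0] * w + gaze_xy[:, 1]  # flatten to cell index
    return F.cross_entropy(logits, target)
```

In the method described above, ground-truth gaze would only supervise this distribution during training, which is why annotations become unnecessary at test time.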
Related papers
- Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities [25.049754180292034]
We address the challenge of unsupervised mistake detection in egocentric video through the analysis of gaze signals.
Based on the observation that eye movements closely follow object manipulation activities, we assess to what extent eye-gaze signals can support mistake detection.
Inconsistencies between predicted and observed gaze trajectories act as an indicator to identify mistakes.
arXiv Detail & Related papers (2024-06-12T16:29:45Z)
- Bridging the Gap: Gaze Events as Interpretable Concepts to Explain Deep Neural Sequence Models [0.7829352305480283]
In this work, we employ established gaze event detection algorithms for fixations and saccades.
We quantitatively evaluate the impact of these events by determining their concept influence.
arXiv Detail & Related papers (2023-04-12T10:15:31Z)
- LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation [0.0]
We propose a gaze-aware analytic manipulation method, based on a data-driven approach with the disentanglement characteristics of generative adversarial network (GAN) inversion.
By utilizing a GAN-based encoder-generator process, we shift the input image from the target domain to the source domain, of which a gaze estimator is sufficiently aware.
arXiv Detail & Related papers (2022-09-21T08:05:53Z)
- Active Gaze Control for Foveal Scene Exploration [124.11737060344052]
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- Alignment Attention by Matching Key and Query Distributions [48.93793773929006]
This paper introduces alignment attention that explicitly encourages self-attention to match the distributions of the key and query within each head.
It is simple to convert any models with self-attention, including pre-trained ones, to the proposed alignment attention.
On a variety of language understanding tasks, we show the effectiveness of our method in accuracy, uncertainty estimation, generalization across domains, and robustness to adversarial attacks.
arXiv Detail & Related papers (2021-10-25T00:54:57Z)
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification [101.49122450005869]
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
- Weakly-Supervised Physically Unconstrained Gaze Estimation [80.66438763587904]
We tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.
We propose a training algorithm along with several novel loss functions especially designed for the task.
We show significant improvements in (a) the accuracy of semi-supervised gaze estimation and (b) cross-domain generalization on the state-of-the-art physically unconstrained in-the-wild Gaze360 gaze estimation benchmark.
arXiv Detail & Related papers (2021-05-20T14:58:52Z)
- PureGaze: Purifying Gaze Feature for Generalizable Gaze Estimation [12.076469954457007]
We tackle the domain generalization problem in cross-domain gaze estimation for unknown target domains.
To be specific, we realize the domain generalization by gaze feature purification.
We design a plug-and-play self-adversarial framework for the gaze feature purification.
arXiv Detail & Related papers (2021-03-24T13:22:00Z)
- Wave Propagation of Visual Stimuli in Focus of Attention [77.4747032928547]
Fast reactions to changes in the surrounding visual environment require efficient attention mechanisms to reallocate computational resources to the most relevant locations in the visual field.
We present a biologically-plausible model of focus of attention that exhibits the effectiveness and efficiency of foveated animals.
arXiv Detail & Related papers (2020-06-19T09:33:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.