Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
- URL: http://arxiv.org/abs/2406.08379v4
- Date: Mon, 25 Nov 2024 09:19:38 GMT
- Title: Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
- Authors: Michele Mazzamuto, Antonino Furnari, Yoichi Sato, Giovanni Maria Farinella
- Abstract summary: We address the challenge of unsupervised mistake detection in egocentric video through the analysis of gaze signals.
Based on the observation that eye movements closely follow object manipulation activities, we assess to what extent eye-gaze signals can support mistake detection.
Inconsistencies between predicted and observed gaze trajectories act as an indicator to identify mistakes.
- Abstract: We address the challenge of unsupervised mistake detection in egocentric video of skilled human activities through the analysis of gaze signals. While traditional methods rely on manually labeled mistakes, our approach does not require mistake annotations, hence overcoming the need for domain-specific labeled data. Based on the observation that eye movements closely follow object manipulation activities, we assess to what extent eye-gaze signals can support mistake detection, proposing to identify deviations in attention patterns measured through a gaze tracker with respect to those estimated by a gaze prediction model. Since predicting gaze in video is characterized by high uncertainty, we propose a novel gaze completion task, where eye fixations are predicted from visual observations and partial gaze trajectories, and contribute a novel gaze completion approach which explicitly models correlations between gaze information and local visual tokens. Inconsistencies between predicted and observed gaze trajectories act as an indicator to identify mistakes. Experiments highlight the effectiveness of the proposed approach in different settings, with relative gains up to +14%, +11%, and +5% in EPIC-Tent, HoloAssist, and IndustReal, respectively, remarkably matching results of supervised approaches without seeing any labels. We further show that gaze-based analysis is particularly useful in the presence of skilled actions, low action execution confidence, and actions requiring hand-eye coordination and object manipulation skills. Our method is ranked first on the HoloAssist Mistake Detection challenge.
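To make the detection step concrete, below is a minimal Python sketch under our own assumptions: `predicted_gaze` stands in for the output of the gaze completion model, and the moving-average smoothing and the mean + k*std threshold are illustrative choices rather than the paper's exact scoring rule.

```python
import numpy as np

def mistake_scores(observed_gaze, predicted_gaze, window=8):
    """Per-frame mistake scores from gaze inconsistency.

    observed_gaze, predicted_gaze: (T, 2) arrays of normalized (x, y)
    fixations from the eye tracker and from a gaze completion model.
    """
    # Frame-wise Euclidean deviation between tracked and predicted gaze.
    err = np.linalg.norm(observed_gaze - predicted_gaze, axis=1)
    # Moving-average smoothing so isolated tracker glitches do not
    # trigger detections on their own (window size is an assumption).
    kernel = np.ones(window) / window
    return np.convolve(err, kernel, mode="same")

def detect_mistakes(scores, k=2.0):
    # Unsupervised thresholding: flag frames whose smoothed deviation
    # exceeds mean + k * std of the video's own score distribution.
    return scores > scores.mean() + k * scores.std()
```

Since the threshold is derived from each video's own score distribution, no mistake labels are needed, which matches the unsupervised setting described above.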
Related papers
- Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck
This work proposes a novel unsupervised/self-supervised gaze pre-training framework.
It forces the full-face branch to learn a low-dimensional gaze embedding without gaze annotations, through collaborative feature contrast and squeeze modules.
At the heart of this framework is an alternating eye-attended/unattended masking training scheme, which squeezes gaze-related information from the full-face branch into an eye-masked auto-encoder.
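As an illustration of the masking scheme alone, here is a minimal, hypothetical PyTorch sketch; the function name, the externally supplied eye boxes (e.g., from a landmark detector), and plain zero-filling are our assumptions, and the collaborative contrast and squeeze modules of the full framework are not shown.

```python
import torch

def eye_masked_views(faces, eye_boxes):
    """Build the two alternating views of a face batch.

    faces:     (B, 3, H, W) face crops in [0, 1].
    eye_boxes: (B, 4) integer pixel boxes (x1, y1, x2, y2) covering
               both eyes, assumed to come from a landmark detector.
    Returns (eye_attended, eye_unattended) views.
    """
    attended = torch.zeros_like(faces)    # keep only the eye region
    unattended = faces.clone()            # mask the eye region out
    for i, (x1, y1, x2, y2) in enumerate(eye_boxes.long().tolist()):
        attended[i, :, y1:y2, x1:x2] = faces[i, :, y1:y2, x1:x2]
        unattended[i, :, y1:y2, x1:x2] = 0.0
    return attended, unattended
```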
arXiv Detail & Related papers (2024-06-29T04:35:08Z)
- OOSTraj: Out-of-Sight Trajectory Prediction With Vision-Positioning Denoising
Trajectory prediction is fundamental in computer vision and autonomous driving.
Existing approaches in this field often assume precise and complete observational data.
We present a novel method for out-of-sight trajectory prediction that leverages a vision-positioning technique.
arXiv Detail & Related papers (2024-04-02T18:30:29Z)
- LatentGaze: Cross-Domain Gaze Estimation through Gaze-Aware Analytic Latent Code Manipulation
We propose a gaze-aware analytic manipulation method based on a data-driven approach that exploits the disentanglement characteristics of generative adversarial network (GAN) inversion.
Using a GAN-based encoder-generator process, we shift the input image from the target domain to the source domain, with which the gaze estimator is sufficiently familiar.
arXiv Detail & Related papers (2022-09-21T08:05:53Z)
- Active Gaze Control for Foveal Scene Exploration
We propose a methodology to emulate how humans and robots with foveal cameras would explore a scene.
The proposed method achieves an increase in detection F1-score of 2-3 percentage points for the same number of gaze shifts.
arXiv Detail & Related papers (2022-08-24T14:59:28Z)
- End-to-End Human-Gaze-Target Detection with Transformers
We propose an effective and efficient method for Human-Gaze-Target (HGT) detection, i.e., gaze following.
Our method, named Human-Gaze-Target detection TRansformer or HGTTR, streamlines the HGT detection pipeline by eliminating all other components.
The effectiveness and robustness of our proposed method are verified with extensive experiments on the two standard benchmark datasets, GazeFollowing and VideoAttentionTarget.
arXiv Detail & Related papers (2022-03-20T02:37:06Z)
- Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification
We present a counterfactual attention learning method to learn more effective attention based on causal inference.
Specifically, we analyze the effect of the learned visual attention on network prediction.
We evaluate our method on a wide range of fine-grained recognition tasks.
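The counterfactual comparison lends itself to a compact sketch. In the minimal, hypothetical PyTorch version below, random attention maps serve as the counterfactual, and the loss supervises both the attended prediction and the gap between factual and counterfactual predictions; the function name and average pooling are our assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def counterfactual_attention_loss(feats, attn, classifier, labels):
    """feats: (B, C, H, W) backbone features; attn: (B, 1, H, W)
    learned attention; classifier: any module mapping (B, C) -> logits."""
    # Prediction under the learned attention.
    logits = classifier((feats * attn).flatten(2).mean(-1))
    # Counterfactual prediction under random attention maps.
    cf_logits = classifier((feats * torch.rand_like(attn)).flatten(2).mean(-1))
    # The effect of attention is the change it causes in the prediction;
    # supervising this difference rewards attention that is genuinely useful.
    effect = logits - cf_logits
    return F.cross_entropy(logits, labels) + F.cross_entropy(effect, labels)
```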
arXiv Detail & Related papers (2021-08-19T14:53:40Z)
- Bayesian Eye Tracking
Model-based eye tracking is susceptible to eye feature detection errors.
We propose a Bayesian framework for model-based eye tracking.
Compared to state-of-the-art model-based and learning-based methods, the proposed framework demonstrates significant improvement in generalization capability.
arXiv Detail & Related papers (2021-06-25T02:08:03Z)
- Weakly-Supervised Physically Unconstrained Gaze Estimation
We tackle the previously unexplored problem of weakly-supervised gaze estimation from videos of human interactions.
We propose a training algorithm along with several novel loss functions especially designed for the task.
We show significant improvements in (a) the accuracy of semi-supervised gaze estimation and (b) cross-domain generalization on the state-of-the-art physically unconstrained in-the-wild Gaze360 gaze estimation benchmark.
arXiv Detail & Related papers (2021-05-20T14:58:52Z)
- Integrating Human Gaze into Attention for Egocentric Activity Recognition
We introduce an effective probabilistic approach to integrate human gaze into temporal attention for egocentric activity recognition.
We represent the locations of gaze fixation points as structured discrete latent variables to model their uncertainties.
The predicted gaze locations are used to provide informative attentional cues to improve the recognition performance.
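One way to realize such discrete latent gaze variables is a Gumbel-softmax relaxation; the minimal sketch below is our assumption of how a fixation location over a spatial grid could be sampled differentiably, and it omits the full probabilistic model of the paper.

```python
import torch.nn.functional as F

def sample_gaze_attention(gaze_logits, grid_h, grid_w, tau=1.0):
    """gaze_logits: (B, grid_h * grid_w) unnormalized scores over
    spatial cells. Returns a (B, 1, grid_h, grid_w) attention map."""
    # Gumbel-softmax keeps the (approximately one-hot) discrete choice
    # of the fixation cell differentiable during training.
    attn = F.gumbel_softmax(gaze_logits, tau=tau, hard=False)
    return attn.view(-1, 1, grid_h, grid_w)
```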
arXiv Detail & Related papers (2020-11-08T08:02:30Z)
- MLGaze: Machine Learning-Based Analysis of Gaze Error Patterns in Consumer Eye Tracking Systems
In this study, gaze error patterns produced by a commercial eye tracking device were studied with the help of machine learning algorithms.
While the impact of the different error sources on gaze data characteristics was nearly impossible to distinguish by visual inspection or from data statistics, machine learning models successfully identified the impact of each error source and predicted the variability in gaze error levels under these conditions.
arXiv Detail & Related papers (2020-05-07T23:07:02Z)
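The underlying analysis pattern, training a classifier to tell error sources apart from statistics of the gaze error signal, can be sketched with scikit-learn; the placeholder features, labels, and model choice below are our assumptions, since the study's actual features are computed from real recordings under varying conditions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
# Placeholder feature table: one row per recording, with summary
# statistics of the gaze error signal (mean, std, drift, ...).
X = rng.normal(size=(200, 6))
# Placeholder labels: which of four hypothetical error sources
# (e.g., head pose, viewing distance, platform, eyewear) applied.
y = rng.integers(0, 4, size=200)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
# With real features, above-chance accuracy here would indicate that
# the error sources leave distinguishable signatures in the gaze data.
print(cross_val_score(clf, X, y, cv=5).mean())
```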
This list is automatically generated from the titles and abstracts of the papers on this site.