Unsupervised Gaze Prediction in Egocentric Videos by Energy-based
Surprise Modeling
- URL: http://arxiv.org/abs/2001.11580v2
- Date: Thu, 29 Apr 2021 06:15:53 GMT
- Title: Unsupervised Gaze Prediction in Egocentric Videos by Energy-based
Surprise Modeling
- Authors: Sathyanarayanan N. Aakur, Arunkumar Bagavathi
- Abstract summary: Egocentric perception has grown rapidly with the advent of immersive computing devices.
Human gaze prediction is an important problem in analyzing egocentric videos.
We quantitatively analyze the generalization capabilities of supervised, deep learning models on the egocentric gaze prediction task.
- Score: 6.294759639481189
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Egocentric perception has grown rapidly with the advent of immersive
computing devices. Human gaze prediction is an important problem in analyzing
egocentric videos and has primarily been tackled through either saliency-based
modeling or highly supervised learning. We quantitatively analyze the
generalization capabilities of supervised, deep learning models on the
egocentric gaze prediction task on unseen, out-of-domain data. We find that
their performance is highly dependent on the training data and is restricted to
the domains specified in the training annotations. In this work, we tackle the
problem of jointly predicting human gaze points and temporal segmentation of
egocentric videos without using any training data. We introduce an unsupervised
computational model that draws inspiration from cognitive psychology models of
event perception. We use Grenander's pattern theory formalism to represent
spatial-temporal features and model surprise as a mechanism to predict gaze
fixation points. Extensive evaluation on two publicly available datasets, GTEA
and GTEA+, shows that the proposed model can significantly outperform
all unsupervised baselines and some supervised gaze prediction baselines.
Finally, we show that the model can also temporally segment egocentric videos
with a performance comparable to more complex, fully supervised deep learning
baselines.
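The abstract describes predicting gaze by treating surprise as an energy over spatial locations: the fixation point is where observed features deviate most from what was expected. A minimal sketch of that idea (an illustration only, not the authors' pattern-theory model; the squared-error energy and the use of the previous frame as the expectation are assumptions):

```python
import numpy as np

def surprise_map(prev_feat, curr_feat):
    # Energy as squared feature-prediction error per spatial cell:
    # cells where the current frame deviates most from the expected
    # (here, previous-frame) features carry the highest surprise.
    return np.sum((curr_feat - prev_feat) ** 2, axis=-1)

def predict_gaze(prev_feat, curr_feat):
    # Predict the gaze fixation at the spatial cell of maximal surprise.
    s = surprise_map(prev_feat, curr_feat)
    y, x = np.unravel_index(np.argmax(s), s.shape)
    return int(y), int(x)
```

In the paper the expectation comes from a pattern-theory representation of spatio-temporal features rather than a raw previous frame, but the argmax-over-surprise selection step is the same shape of computation.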
Related papers
- Enhancing LLM Reasoning via Critique Models with Test-Time and Training-Time Supervision [120.40788744292739]
We propose a two-player paradigm that separates the roles of reasoning and critique models.
We first propose AutoMathCritique, an automated and scalable framework for collecting critique data.
We demonstrate that the critique models consistently improve the actor's performance on difficult queries at test-time.
arXiv Detail & Related papers (2024-11-25T17:11:54Z)
- EAMDrift: An interpretable self retrain model for time series [0.0]
We present EAMDrift, a novel method that combines forecasts from multiple individual predictors by weighting each prediction according to a performance metric.
EAMDrift is designed to automatically adapt to out-of-distribution patterns in data and identify the most appropriate models to use at each moment.
Our study on real-world datasets shows that EAMDrift outperforms individual baseline models by 20% and achieves comparable accuracy results to non-interpretable ensemble models.
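The performance-weighted combination described above can be sketched as follows (a minimal illustration under assumed details, not the EAMDrift implementation; the inverse-error weighting and function names are assumptions):

```python
import numpy as np

def weighted_ensemble(preds, errors, eps=1e-8):
    # Combine forecasts from multiple predictors, weighting each
    # inversely to its recent error so that better-performing models
    # dominate the ensemble at each moment.
    w = 1.0 / (np.asarray(errors, dtype=float) + eps)
    w /= w.sum()
    return float(np.dot(w, np.asarray(preds, dtype=float)))
```

With equal recent errors this reduces to a plain average; as one predictor's error shrinks, the ensemble output converges to that predictor's forecast.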
arXiv Detail & Related papers (2023-05-31T13:25:26Z)
- A Control-Centric Benchmark for Video Prediction [69.22614362800692]
We propose a benchmark for action-conditioned video prediction in the form of a control benchmark.
Our benchmark includes simulated environments with 11 task categories and 310 task instance definitions.
We then leverage our benchmark to study the effects of scaling model size, quantity of training data, and model ensembling.
arXiv Detail & Related papers (2023-04-26T17:59:45Z)
- Predictive Experience Replay for Continual Visual Control and Forecasting [62.06183102362871]
We present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting.
We first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting.
Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks.
arXiv Detail & Related papers (2023-03-12T05:08:03Z)
- A Theoretical Study of Inductive Biases in Contrastive Learning [32.98250585760665]
We provide the first theoretical analysis of self-supervised learning that incorporates the effect of inductive biases originating from the model class.
We show that when the model has limited capacity, contrastive representations would recover certain special clustering structures that are compatible with the model architecture.
arXiv Detail & Related papers (2022-11-27T01:53:29Z)
- Towards Open-World Feature Extrapolation: An Inductive Graph Learning Approach [80.8446673089281]
We propose a new learning paradigm with graph representation and learning.
Our framework contains two modules: 1) a backbone network (e.g., feedforward neural nets) as a lower model takes features as input and outputs predicted labels; 2) a graph neural network as an upper model learns to extrapolate embeddings for new features via message passing over a feature-data graph built from observed data.
arXiv Detail & Related papers (2021-10-09T09:02:45Z)
- Beyond Tracking: Using Deep Learning to Discover Novel Interactions in Biological Swarms [3.441021278275805]
We propose training deep network models to predict system-level states directly from generic graphical features from the entire view.
Because the resulting predictive models are not based on human-understood predictors, we use explanatory modules.
This represents an example of augmented intelligence in behavioral ecology -- knowledge co-creation in a human-AI team.
arXiv Detail & Related papers (2021-08-20T22:50:41Z)
- Self-Supervision by Prediction for Object Discovery in Videos [62.87145010885044]
In this paper, we use the prediction task as self-supervision and build a novel object-centric model for image sequence representation.
Our framework can be trained without the help of any manual annotation or pretrained network.
Initial experiments confirm that the proposed pipeline is a promising step towards object-centric video prediction.
arXiv Detail & Related papers (2021-03-09T19:14:33Z)
- SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
Transformer-based models are popular for natural language processing (NLP) tasks due to their powerful capacity.
Attention map visualization of a pre-trained model is one direct method for understanding the self-attention mechanism.
We propose a Differentiable Attention Mask (DAM) algorithm, which can be also applied in guidance of SparseBERT design.
arXiv Detail & Related papers (2021-02-25T14:13:44Z)
- Toward Improving the Evaluation of Visual Attention Models: a Crowdsourcing Approach [21.81407627962409]
State-of-the-art models focus on learning saliency maps from human data.
We highlight the limits of the current metrics for saliency prediction and scanpath similarity.
We present a study aimed at evaluating how strongly the scanpaths generated with the unsupervised gravitational models appear plausible to naive and expert human observers.
arXiv Detail & Related papers (2020-02-11T14:27:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.