Towards End-to-end Video-based Eye-Tracking
- URL: http://arxiv.org/abs/2007.13120v1
- Date: Sun, 26 Jul 2020 12:39:15 GMT
- Title: Towards End-to-end Video-based Eye-Tracking
- Authors: Seonwook Park and Emre Aksan and Xucong Zhang and Otmar Hilliges
- Abstract summary: Estimating eye-gaze from images alone is a challenging task, largely due to unobservable person-specific factors.
We observe a strong relationship between what users are looking at and the appearance of their eyes, and propose a novel dataset and accompanying method that explicitly learn these semantic and temporal relationships.
We demonstrate that fusing information from the visual stimulus and eye images can achieve performance similar to literature-reported figures obtained through supervised personalization.
- Score: 50.0630362419371
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating eye-gaze from images alone is a challenging task, in large
part due to unobservable person-specific factors. Achieving high accuracy typically
requires labeled data from test users which may not be attainable in real
applications. We observe that there exists a strong relationship between what
users are looking at and the appearance of the user's eyes. In response to this
understanding, we propose a novel dataset and accompanying method which aims to
explicitly learn these semantic and temporal relationships. Our video dataset
consists of time-synchronized screen recordings, user-facing camera views, and
eye gaze data, which allows for new benchmarks in temporal gaze tracking as
well as label-free refinement of gaze. Importantly, we demonstrate that the
fusion of information from visual stimuli as well as eye images can lead
towards achieving performance similar to literature-reported figures acquired
through supervised personalization. Our final method yields significant
performance improvements on our proposed EVE dataset, with up to a 28 percent
improvement in Point-of-Gaze estimates (resulting in 2.49 degrees in angular
error), paving the path towards high-accuracy screen-based eye tracking purely
from webcam sensors. The dataset and reference source code are available at
https://ait.ethz.ch/projects/2020/EVE
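To make the fusion idea and the degree-based metric concrete, the following is a minimal, illustrative PyTorch sketch: a toy two-stream network that fuses features from an eye patch and the on-screen visual stimulus to regress a 2D point of gaze, plus the standard angular-error helper that converts gaze-direction vectors into degrees. The encoder layout, feature sizes, and input resolutions are assumptions made here for illustration; this is not the EVE reference architecture (see the repository linked above for that).

```python
import torch
import torch.nn as nn


def small_encoder(feat_dim: int = 128) -> nn.Sequential:
    # Tiny placeholder CNN used by both streams (an assumption, not the EVE backbone).
    return nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(64, feat_dim),
    )


class TwoStreamGazeNet(nn.Module):
    """Fuses eye-image features with screen-content features and regresses
    a 2D point of gaze (x, y) on the screen."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.eye_stream = small_encoder(feat_dim)
        self.screen_stream = small_encoder(feat_dim)
        self.head = nn.Linear(2 * feat_dim, 2)

    def forward(self, eye_img: torch.Tensor, screen_img: torch.Tensor) -> torch.Tensor:
        fused = torch.cat(
            [self.eye_stream(eye_img), self.screen_stream(screen_img)], dim=1
        )
        return self.head(fused)


def angular_error_deg(gaze_pred: torch.Tensor, gaze_true: torch.Tensor) -> torch.Tensor:
    # Angle in degrees between predicted and ground-truth 3D gaze direction vectors.
    pred = nn.functional.normalize(gaze_pred, dim=-1)
    true = nn.functional.normalize(gaze_true, dim=-1)
    cos = (pred * true).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.rad2deg(torch.acos(cos))


if __name__ == "__main__":
    model = TwoStreamGazeNet()
    eye = torch.randn(4, 3, 64, 96)       # batch of eye patches
    screen = torch.randn(4, 3, 72, 128)   # downsampled screen frames
    pog = model(eye, screen)              # (4, 2) predicted points of gaze
    print(pog.shape)
```

The actual method additionally exploits temporal information across video frames and refines its estimates using the screen content; this sketch deliberately omits both and only conveys the two-stream fusion idea.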
Related papers
- Panonut360: A Head and Eye Tracking Dataset for Panoramic Video [0.0]
We present a head and eye tracking dataset involving 50 users watching 15 panoramic videos.
The dataset provides details on the viewport and gaze attention locations of users.
Our analysis reveals a consistent downward offset in gaze fixations relative to the Field of View.
arXiv Detail & Related papers (2024-03-26T13:54:52Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
- Unified Visual Relationship Detection with Vision and Language Models [89.77838890788638]
This work focuses on training a single visual relationship detector predicting over the union of label spaces from multiple datasets.
We propose UniVRD, a novel bottom-up method for Unified Visual Relationship Detection by leveraging vision and language models.
Empirical results on both human-object interaction detection and scene-graph generation demonstrate the competitive performance of our model.
arXiv Detail & Related papers (2023-03-16T00:06:28Z)
- Predicting Eye Gaze Location on Websites [4.8633100732964705]
We propose an effective deep learning-based model that leverages both image and text spatial location.
We show the benefit of careful fine-tuning using our unified dataset to improve the accuracy of eye-gaze predictions.
arXiv Detail & Related papers (2022-11-15T11:55:46Z)
- Gaze Estimation with Eye Region Segmentation and Self-Supervised Multistream Learning [8.422257363944295]
We present a novel multistream network that learns robust eye representations for gaze estimation.
We first create a synthetic dataset containing eye region masks detailing the visible eyeball and iris using a simulator.
We then perform eye region segmentation with a U-Net type model, which we later use to generate eye region masks for real-world images (a minimal sketch of such a segmentation model appears after this list).
arXiv Detail & Related papers (2021-12-15T04:44:45Z)
- HighlightMe: Detecting Highlights from Human-Centric Videos [52.84233165201391]
We present a domain- and user-preference-agnostic approach to detect highlightable excerpts from human-centric videos.
We use an autoencoder network equipped with spatial-temporal graph convolutions to detect human activities and interactions.
We observe a 4-12% improvement in the mean average precision of matching the human-annotated highlights over state-of-the-art methods.
arXiv Detail & Related papers (2021-10-05T01:18:15Z)
- Bayesian Eye Tracking [63.21413628808946]
Model-based eye tracking is susceptible to eye feature detection errors.
We propose a Bayesian framework for model-based eye tracking.
Compared to state-of-the-art model-based and learning-based methods, the proposed framework demonstrates significant improvement in generalization capability.
arXiv Detail & Related papers (2021-06-25T02:08:03Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- In the Eye of the Beholder: Gaze and Actions in First Person Video [30.54510882243602]
We address the task of jointly determining what a person is doing and where they are looking based on the analysis of video captured by a headworn camera.
Our dataset comes with videos, gaze tracking data, hand masks and action annotations.
We propose a novel deep model for joint gaze estimation and action recognition in First Person Vision.
arXiv Detail & Related papers (2020-05-31T22:06:06Z)
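Picking up the forward reference from the eye-region-segmentation entry above, here is a minimal, illustrative sketch of a small U-Net-style model that predicts per-pixel eye-region classes. The channel widths, depth, and the three-class split (background / visible eyeball / iris) are assumptions made for illustration and do not reproduce the cited paper's network.

```python
import torch
import torch.nn as nn


class TinyEyeSegNet(nn.Module):
    """Minimal U-Net-style encoder-decoder producing per-pixel logits for
    an assumed 3-class eye-region labeling (background, eyeball, iris)."""

    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.enc1 = self._block(3, 16)    # full-resolution features
        self.enc2 = self._block(16, 32)   # half-resolution features
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2)
        self.dec1 = self._block(32, 16)   # 16 skip channels + 16 upsampled channels
        self.out = nn.Conv2d(16, num_classes, kernel_size=1)

    @staticmethod
    def _block(c_in: int, c_out: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(c_out, c_out, kernel_size=3, padding=1), nn.ReLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s1 = self.enc1(x)                       # skip connection
        s2 = self.enc2(self.pool(s1))           # downsampled path
        u = self.up(s2)                         # back to input resolution
        d = self.dec1(torch.cat([u, s1], dim=1))
        return self.out(d)                      # (N, num_classes, H, W) logits


if __name__ == "__main__":
    net = TinyEyeSegNet()
    eye_patch = torch.randn(2, 3, 64, 96)   # RGB eye crops with even H and W
    masks = net(eye_patch).argmax(dim=1)    # (2, 64, 96) predicted class map
    print(masks.shape)
```

Such a model, trained first on simulator-generated masks, could then produce eye-region masks for real-world images as the entry describes; the training loop and the synthetic data pipeline are omitted here.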
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.