LEyes: A Lightweight Framework for Deep Learning-Based Eye Tracking using Synthetic Eye Images
- URL: http://arxiv.org/abs/2309.06129v3
- Date: Thu, 28 Sep 2023 08:55:58 GMT
- Title: LEyes: A Lightweight Framework for Deep Learning-Based Eye Tracking using Synthetic Eye Images
- Authors: Sean Anthony Byrne, Virmarie Maquiling, Marcus Nyström, Enkelejda Kasneci, Diederick C. Niehorster
- Abstract summary: We present a framework called Light Eyes or "LEyes" which, unlike conventional methods, models only the key image features required for video-based eye tracking.
We demonstrate that models trained using LEyes consistently perform on par with or outperform other state-of-the-art algorithms in pupil and corneal reflection (CR) localization.
- Score: 9.150553995510217
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Deep learning has bolstered gaze estimation techniques, but real-world
deployment has been impeded by inadequate training datasets. This problem is
exacerbated by both hardware-induced variations in eye images and inherent
biological differences across the recorded participants, leading to both
feature and pixel-level variance that hinders the generalizability of models
trained on specific datasets. While synthetic datasets can be a solution, their
creation is both time and resource-intensive. To address this problem, we
present a framework called Light Eyes or "LEyes" which, unlike conventional
photorealistic methods, only models key image features required for video-based
eye tracking using simple light distributions. LEyes facilitates easy
configuration for training neural networks across diverse gaze-estimation
tasks. We demonstrate that models trained using LEyes consistently perform on par
with or outperform other state-of-the-art algorithms in pupil and CR
localization across well-known datasets. In addition, a LEyes-trained model
outperforms the industry-standard eye tracker while using significantly more
cost-effective hardware. Going forward, we are confident that LEyes will
revolutionize synthetic data generation for gaze estimation models and lead to
significant improvements in the next generation of video-based eye trackers.
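For readers curious what "modeling key image features using simple light distributions" might look like in practice, below is a minimal sketch, assuming the pupil is rendered as a dark 2D Gaussian and the corneal reflection as a small bright Gaussian on a noisy background. All function names and parameter ranges are illustrative assumptions, not LEyes' actual implementation.

```python
import numpy as np

def gaussian_2d(h, w, cx, cy, sigma):
    """2D Gaussian of size (h, w) centered at (cx, cy), peak value 1."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def synth_eye_image(h=128, w=128, rng=None):
    """Sketch of a LEyes-style training sample: a dark Gaussian 'pupil'
    and a small bright Gaussian 'CR' on a noisy background, plus the
    ground-truth centers to use as regression targets. Illustrative only."""
    if rng is None:
        rng = np.random.default_rng()
    img = rng.normal(0.5, 0.05, (h, w))                          # noisy background
    px = rng.uniform(0.3 * w, 0.7 * w)                           # pupil center
    py = rng.uniform(0.3 * h, 0.7 * h)
    img -= 0.4 * gaussian_2d(h, w, px, py, rng.uniform(8, 16))   # dark pupil blob
    crx = px + rng.normal(0, 5)                                  # CR near the pupil
    cry = py + rng.normal(0, 5)
    img += 0.5 * gaussian_2d(h, w, crx, cry, rng.uniform(1, 2))  # bright CR spot
    return np.clip(img, 0.0, 1.0), (px, py), (crx, cry)
```

Because images and ground-truth centers are generated on the fly, a localization network can be trained on unlimited samples without photorealistic rendering or manual annotation.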
Related papers
- Analysis of Classifier Training on Synthetic Data for Cross-Domain Datasets [4.696575161583618]
This study focuses on camera-based traffic sign recognition applications for advanced driver assistance systems and autonomous driving.
The proposed augmentation pipeline for synthetic datasets includes novel augmentation processes such as structured shadows and Gaussian specular highlights (a highlight of this kind is sketched after this entry).
Experiments showed that a synthetic image-based approach outperforms real image-based training in most cases when applied to cross-domain test datasets.
arXiv Detail & Related papers (2024-10-30T07:11:41Z)
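As a rough illustration of the Gaussian specular highlight augmentation that abstract mentions, here is a minimal sketch; the function name and parameter values are invented for illustration and are not taken from the paper.

```python
import numpy as np

def add_specular_highlight(img, cx, cy, sigma=6.0, strength=0.8):
    """Blend a bright Gaussian blob into `img` (float array in [0, 1])
    to mimic a specular reflection centered at pixel (cx, cy)."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    blob = strength * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    if img.ndim == 3:            # broadcast the blob over color channels
        blob = blob[..., None]
    return np.clip(img + blob, 0.0, 1.0)
```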
- A Single Transformer for Scalable Vision-Language Modeling [74.05173379908703]
We present SOLO, a single transformer for visiOn-Language mOdeling.
A unified single Transformer architecture, like SOLO, effectively addresses these scalability concerns in LVLMs.
In this paper, we introduce the first open-source training recipe for developing SOLO, an open-source 7B LVLM.
arXiv Detail & Related papers (2024-07-08T22:40:15Z)
- BVI-RLV: A Fully Registered Dataset and Benchmarks for Low-Light Video Enhancement [56.97766265018334]
This paper introduces a low-light video dataset, consisting of 40 scenes with various motion scenarios under two distinct low-lighting conditions.
We provide fully registered ground truth data captured in normal light using a programmable motorized dolly and refine it via an image-based approach for pixel-wise frame alignment across different light levels.
Our experimental results demonstrate the significance of fully registered video pairs for low-light video enhancement (LLVE) and the comprehensive evaluation shows that the models trained with our dataset outperform those trained with the existing datasets.
arXiv Detail & Related papers (2024-07-03T22:41:49Z)
- Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that strongly influences the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data (one way to do this is sketched after this entry).
Our methods yield robust, improved performance when tackling the discrepancy between simulated and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z)
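The abstract does not say which dimensionality-reduction technique is used; a common choice is PCA, so the sketch below measures overlap as the distance between PCA-space centroids of the two image sets. This is an assumption-based illustration, not the paper's method.

```python
import numpy as np
from sklearn.decomposition import PCA

def embedding_overlap(real_imgs, synth_imgs, n_components=2):
    """Project flattened real and synthetic images into a shared PCA
    space and return the distance between the two cluster centroids
    (smaller distance = more overlap). A crude proxy, for illustration."""
    real = real_imgs.reshape(len(real_imgs), -1)
    synth = synth_imgs.reshape(len(synth_imgs), -1)
    pca = PCA(n_components=n_components).fit(np.vstack([real, synth]))
    r, s = pca.transform(real), pca.transform(synth)
    return np.linalg.norm(r.mean(axis=0) - s.mean(axis=0))
```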
- Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in modeling the distribution discrepancy across different scenes.
We introduce the bilevel paradigm to model the above latent correspondence.
A bilevel learning framework is constructed to endow the scene-irrelevant generality of the encoder towards diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z)
- Ultra-High-Definition Low-Light Image Enhancement: A Benchmark and Transformer-Based Method [51.30748775681917]
We consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution.
We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms.
As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method.
arXiv Detail & Related papers (2022-12-22T09:05:07Z)
- Learning optical flow from still images [53.295332513139925]
We introduce a framework to generate accurate ground-truth optical flow annotations quickly and in large amounts from any readily available single real picture.
We virtually move the camera in the reconstructed environment with known motion vectors and rotation angles (the translation-only case is sketched after this entry).
When trained with our data, state-of-the-art optical flow networks achieve superior generalization to unseen real data.
arXiv Detail & Related papers (2021-04-08T17:59:58Z)
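As a hedged sketch of how known camera motion plus per-pixel depth yields ground-truth flow, the snippet below handles the pure-translation pinhole-camera case; all names and conventions are assumptions for illustration, not the paper's pipeline.

```python
import numpy as np

def flow_from_translation(depth, f, t, eps=1e-6):
    """Optical flow induced by a pure camera translation t = (tx, ty, tz),
    given per-pixel depth and a pinhole camera with focal length f and
    principal point at the image center. Illustrative sketch only."""
    h, w = depth.shape
    cx, cy = w / 2.0, h / 2.0
    vs, us = np.mgrid[0:h, 0:w].astype(np.float64)
    # Back-project pixels to 3D camera coordinates.
    X = (us - cx) * depth / f
    Y = (vs - cy) * depth / f
    # Express points in the moved camera's frame (camera shifted by t).
    Xn, Yn = X - t[0], Y - t[1]
    Zn = np.maximum(depth - t[2], eps)
    # Reproject and take pixel displacements as the flow field.
    un = f * Xn / Zn + cx
    vn = f * Yn / Zn + cy
    return np.stack([un - us, vn - vs], axis=-1)
```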
- PennSyn2Real: Training Object Recognition Models without Human Labeling [12.923677573437699]
We propose PennSyn2Real, a synthetic dataset consisting of more than 100,000 4K images of more than 20 types of micro aerial vehicles (MAVs).
The dataset can be used to generate arbitrary numbers of training images for high-level computer vision tasks such as MAV detection and classification.
We show that synthetic data generated using this framework can be directly used to train CNN models for common object recognition tasks such as detection and segmentation.
arXiv Detail & Related papers (2020-09-22T02:53:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.