WEBEYETRACK: Scalable Eye-Tracking for the Browser via On-Device Few-Shot Personalization
- URL: http://arxiv.org/abs/2508.19544v1
- Date: Wed, 27 Aug 2025 03:38:58 GMT
- Title: WEBEYETRACK: Scalable Eye-Tracking for the Browser via On-Device Few-Shot Personalization
- Authors: Eduardo Davalos, Yike Zhang, Namrata Srivastava, Yashvitha Thatigotla, Jorge A. Salas, Sara McFadden, Sun-Joo Cho, Amanda Goodwin, Ashwin TS, Gautam Biswas
- Abstract summary: WebEyeTrack is a framework that integrates lightweight SOTA gaze estimation models directly in the browser. It incorporates model-based head pose estimation and on-device few-shot learning with as few as nine calibration samples. WebEyeTrack adapts to new users, achieving SOTA performance with an error margin of 2.32 cm on GazeCapture and real-time inference speeds of 2.4 milliseconds on an iPhone 14.
- Score: 3.884714448890651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With advancements in AI, new gaze estimation methods are exceeding state-of-the-art (SOTA) benchmarks, but their real-world application reveals a gap with commercial eye-tracking solutions. Factors like model size, inference time, and privacy often go unaddressed. Meanwhile, webcam-based eye-tracking methods lack sufficient accuracy, in particular due to head movement. To tackle these issues, we introduce WebEyeTrack, a framework that integrates lightweight SOTA gaze estimation models directly in the browser. It incorporates model-based head pose estimation and on-device few-shot learning with as few as nine calibration samples (k ≤ 9). WebEyeTrack adapts to new users, achieving SOTA performance with an error margin of 2.32 cm on GazeCapture and real-time inference speeds of 2.4 milliseconds on an iPhone 14. Our open-source code is available at https://github.com/RedForestAi/WebEyeTrack.
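The few-shot personalization step (adapting a pretrained gaze model to a new user from a handful of calibration points) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a frozen backbone that emits feature embeddings and fits only a small ridge-regression head on the k ≈ 9 calibration samples; all names and shapes here are illustrative.

```python
import numpy as np

def personalize(features, targets, l2=1e-6):
    """Fit a ridge-regression head mapping frozen gaze features to screen (x, y).

    features: (k, d) embeddings from a frozen backbone, one per calibration sample.
    targets:  (k, 2) known on-screen gaze locations shown during calibration.
    l2:       ridge strength; increase it for noisy real-world calibration data.
    """
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # append bias column
    # Closed-form ridge solution: W = (X^T X + l2*I)^-1 X^T Y
    return np.linalg.solve(X.T @ X + l2 * np.eye(X.shape[1]), X.T @ targets)

def predict(features, W):
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    return X @ W

# Synthetic example: k = 9 calibration samples, d = 16 backbone features.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(17, 2))             # hypothetical "true" user mapping
feats = rng.normal(size=(9, 16))
gaze = np.hstack([feats, np.ones((9, 1))]) @ true_W
W = personalize(feats, gaze)
err = np.abs(predict(feats, W) - gaze).max()  # near zero on noiseless data
```

Adapting only a tiny head rather than the full network is one way such a system can stay fast and private: the update is a closed-form solve over nine samples, cheap enough to run entirely on-device in the browser.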
Related papers
- EyeTheia: A Lightweight and Accessible Eye-Tracking Toolbox [0.0]
EyeTheia is a lightweight and open deep learning pipeline for webcam-based gaze estimation. It enables real-time gaze tracking using only a standard laptop webcam. It combines MediaPipe-based landmark extraction with a convolutional neural network inspired by iTracker and optional user-specific fine-tuning.
arXiv Detail & Related papers (2026-01-09T19:49:01Z) - EETnet: a CNN for Gaze Detection and Tracking for Smart-Eyewear [9.390741823084372]
We present EETnet, a convolutional neural network designed for eye tracking using purely event-based data.<n>EETnet is capable of running on microcontrollers with limited resources.
arXiv Detail & Related papers (2025-11-06T19:56:27Z) - HopTrack: A Real-time Multi-Object Tracking System for Embedded Devices [11.615446679072932]
This paper introduces HopTrack, a real-time multi-object tracking system tailored for embedded devices.
Compared with the best high-end GPU modified baseline Byte (Embed), HopTrack achieves a processing speed of up to 39.29 FPS on NVIDIA AGX Xavier.
arXiv Detail & Related papers (2024-11-01T14:13:53Z) - FACET: Fast and Accurate Event-Based Eye Tracking Using Ellipse Modeling for Extended Reality [14.120171971211777]
Event cameras offer a promising alternative due to their high temporal resolution and low power consumption.
We present FACET (Fast and Accurate Event-based Eye Tracking), an end-to-end neural network that directly outputs pupil ellipse parameters from event data.
On the enhanced EV-Eye test set, FACET achieves an average pupil center error of 0.20 pixels and an inference time of 0.53 ms.
arXiv Detail & Related papers (2024-09-23T22:31:38Z) - Kalib: Easy Hand-Eye Calibration with Reference Point Tracking [52.4190876409222]
Kalib is an automatic hand-eye calibration method that leverages the generalizability of visual foundation models to overcome calibration challenges. During calibration, a kinematic reference point is tracked in 3D in the camera's coordinate space. Kalib's user-friendly design and minimal setup requirements make it a practical solution for continuous operation in unstructured environments.
arXiv Detail & Related papers (2024-08-20T06:03:40Z) - BEVTrack: A Simple and Strong Baseline for 3D Single Object Tracking in Bird's-Eye View [56.77287041917277]
3D Single Object Tracking (SOT) is a fundamental task of computer vision, proving essential for applications like autonomous driving.
In this paper, we propose BEVTrack, a simple yet effective baseline method.
By estimating the target motion in Bird's-Eye View (BEV) to perform tracking, BEVTrack demonstrates surprising simplicity from various aspects, i.e., network designs, training objectives, and tracking pipeline, while achieving superior performance.
arXiv Detail & Related papers (2023-09-05T12:42:26Z) - SqueezerFaceNet: Reducing a Small Face Recognition CNN Even More Via Filter Pruning [55.84746218227712]
We develop SqueezerFaceNet, a lightweight face recognition network with fewer than 1M parameters.
We show that it can be further reduced (up to 40%) without an appreciable loss in performance.
arXiv Detail & Related papers (2023-07-20T08:38:50Z) - One Eye is All You Need: Lightweight Ensembles for Gaze Estimation with Single Encoders [0.0]
We propose a gaze estimation model that implements the ResNet and Inception model architectures and makes predictions using only one eye image.
With the use of lightweight architectures, we achieve high performance on the GazeCapture dataset with very low model parameter counts.
We also notice significantly lower errors on the right eye images in the test set, which could be important in the design of future gaze estimation-based tools.
arXiv Detail & Related papers (2022-11-22T01:12:31Z) - Workshop on Autonomous Driving at CVPR 2021: Technical Report for Streaming Perception Challenge [57.647371468876116]
We introduce our real-time 2D object detection system for the realistic autonomous driving scenario.
Our detector is built on a newly designed YOLO model, called YOLOX.
On the Argoverse-HD dataset, our system achieves 41.0 streaming AP, surpassing second place by 7.8/6.1 on the detection-only track and the full track, respectively.
arXiv Detail & Related papers (2021-07-27T06:36:06Z) - 2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D Object Detection [26.086623067939605]
In this report, we introduce a real-time method to detect the 2D objects from images.
We leverage TensorRT to optimize the inference time of our detection pipeline.
Our framework achieves the latency of 45.8ms/frame on an Nvidia Tesla V100 GPU.
arXiv Detail & Related papers (2021-06-16T11:32:03Z) - Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye-gaze from images alone is a challenging task due to un-observable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that the fusion of information from visual stimuli as well as eye images can lead towards achieving performance similar to literature-reported figures.
arXiv Detail & Related papers (2020-07-26T12:39:15Z) - SqueezeFacePoseNet: Lightweight Face Verification Across Different Poses for Mobile Platforms [44.78440647722169]
Face verification technologies can provide reliable and robust user authentication, given the availability of cameras in mobile devices. Deep Convolutional Neural Networks have resulted in many accurate face verification architectures, but their typical size (hundreds of megabytes) makes them infeasible to incorporate in downloadable mobile applications. We develop a lightweight face recognition network of just a few megabytes that can operate with sufficient accuracy in comparison to much larger models.
arXiv Detail & Related papers (2020-07-16T19:02:38Z) - Event Based, Near Eye Gaze Tracking Beyond 10,000Hz [41.23347304960948]
We propose a hybrid frame-event-based near-eye gaze tracking system with update rates beyond 10,000 Hz.
Our system builds on emerging event cameras that simultaneously acquire regularly sampled frames and adaptively sampled events.
We hope to enable a new generation of ultra-low-latency gaze-contingent rendering and display techniques for virtual and augmented reality.
arXiv Detail & Related papers (2020-04-07T17:57:18Z) - Tracking Objects as Points [83.9217787335878]
We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art.
Our tracker, CenterTrack, applies a detection model to a pair of images and detections from the prior frame.
CenterTrack is simple, online (no peeking into the future), and real-time.
arXiv Detail & Related papers (2020-04-02T17:58:40Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.