EyeSeg: An Uncertainty-Aware Eye Segmentation Framework for AR/VR
- URL: http://arxiv.org/abs/2507.09649v1
- Date: Sun, 13 Jul 2025 14:33:10 GMT
- Title: EyeSeg: An Uncertainty-Aware Eye Segmentation Framework for AR/VR
- Authors: Zhengyuan Peng, Jianqing Xu, Shen Li, Jiazhen Ji, Yuge Huang, Jingyun Zhang, Jinmin Li, Shouhong Ding, Rizen Guo, Xin Tan, Lizhuang Ma
- Abstract summary: EyeSeg is an uncertainty-aware eye segmentation framework for augmented reality (AR) and virtual reality (VR). We show that EyeSeg achieves segmentation improvements in MIoU, E1, F1, and ACC, surpassing previous approaches.
- Score: 58.33693755009173
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-machine interaction through augmented reality (AR) and virtual reality (VR) is increasingly prevalent, requiring accurate and efficient gaze estimation, which in turn hinges on accurate eye segmentation to enable smooth user experiences. We introduce EyeSeg, a novel eye segmentation framework designed to overcome key challenges that existing approaches struggle with: motion blur, eyelid occlusion, and train-test domain gaps. In these situations, existing models struggle to extract robust features, leading to suboptimal performance. Noting that these challenges can generally be quantified by uncertainty, we design EyeSeg as an uncertainty-aware eye segmentation framework for AR/VR in which we explicitly model the uncertainties by performing Bayesian uncertainty learning of a posterior under a closed-set prior. Theoretically, we prove that a statistic of the learned posterior indicates the segmentation uncertainty level, and empirically it outperforms existing methods in downstream tasks such as gaze estimation. EyeSeg outputs an uncertainty score alongside the segmentation result, which is used to weight and fuse multiple gaze estimates for robustness; this proves effective especially under motion blur, eyelid occlusion, and cross-domain shifts. Moreover, empirical results show that EyeSeg achieves segmentation improvements in MIoU, E1, F1, and ACC, surpassing previous approaches. The code is publicly available at https://github.com/JethroPeng/EyeSeg.
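The fusion step described in the abstract weights multiple gaze estimates by an uncertainty score. Below is a minimal sketch of such a scheme; the function name, the inverse-uncertainty weighting, and the example values are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def fuse_gaze_estimates(gaze_vectors, uncertainties, eps=1e-6):
    """Fuse per-frame gaze estimates, down-weighting uncertain ones.

    gaze_vectors: (N, 3) array of unit gaze direction estimates.
    uncertainties: (N,) per-estimate uncertainty scores (e.g., a statistic
        of the learned posterior; higher = less reliable).

    The inverse-uncertainty weighting below is a plausible choice,
    not necessarily the weighting used in the EyeSeg paper.
    """
    gaze_vectors = np.asarray(gaze_vectors, dtype=np.float64)
    uncertainties = np.asarray(uncertainties, dtype=np.float64)

    # Reliable (low-uncertainty) estimates receive larger weights.
    weights = 1.0 / (uncertainties + eps)
    weights /= weights.sum()

    fused = (weights[:, None] * gaze_vectors).sum(axis=0)
    return fused / np.linalg.norm(fused)  # renormalize to a unit vector

# Example: a sharp frame dominates a motion-blurred one.
gazes = [[0.0, 0.0, 1.0], [0.3, 0.0, 0.95]]
sigmas = [0.05, 0.8]  # the blurred frame gets high uncertainty
print(fuse_gaze_estimates(gazes, sigmas))
```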
Related papers
- Trade-offs in Privacy-Preserving Eye Tracking through Iris Obfuscation: A Benchmarking Study [44.44776028287441]
We benchmark blurring, noising, downsampling, the rubber sheet model, and iris style transfer to obfuscate user identity.
Our experiments show that canonical image processing methods like blurring and noising have only a marginal impact on deep learning-based tasks.
While downsampling, the rubber sheet model, and iris style transfer are all effective at hiding user identifiers, iris style transfer, despite its higher computational cost, outperforms the others in both utility tasks.
arXiv Detail & Related papers (2025-04-14T14:29:38Z)
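For context on the simpler obfuscation baselines benchmarked above, here is a minimal OpenCV sketch of blurring, noising, and downsampling; the kernel size, noise level, scale factor, and file names are arbitrary example values, not the benchmark's settings.

```python
import cv2
import numpy as np

def blur(img, ksize=15):
    # Gaussian blur removes high-frequency iris texture.
    return cv2.GaussianBlur(img, (ksize, ksize), 0)

def noise(img, sigma=25.0):
    # Additive Gaussian noise masks fine identity cues.
    noisy = img.astype(np.float32) + np.random.normal(0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def downsample(img, factor=4):
    # Shrink and re-enlarge, discarding identifying detail.
    h, w = img.shape[:2]
    small = cv2.resize(img, (w // factor, h // factor),
                       interpolation=cv2.INTER_AREA)
    return cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)

img = cv2.imread("eye.png")  # hypothetical input image
for fn in (blur, noise, downsample):
    cv2.imwrite(f"eye_{fn.__name__}.png", fn(img))
```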
- Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing [84.16442052968615]
We introduce RISEBench, the first benchmark for evaluating Reasoning-Informed viSual Editing (RISE).
RISEBench focuses on four key reasoning categories: Temporal, Causal, Spatial, and Logical Reasoning.
We conduct experiments evaluating nine prominent visual editing models, comprising both open-source and proprietary models.
arXiv Detail & Related papers (2025-04-03T17:59:56Z)
- Rethinking Edge Detection through Perceptual Asymmetry: The SWBCE Loss [0.0]
We propose the Symmetrization Weighted Binary Cross-Entropy (SWBCE) loss function.
By balancing label-guided and prediction-guided learning, SWBCE maintains high edge recall while effectively suppressing false positives.
These findings underscore the effectiveness of SWBCE for high-quality edge prediction and its potential applicability to related vision tasks.
arXiv Detail & Related papers (2025-01-23T04:10:31Z)
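The label-guided/prediction-guided balance in SWBCE might be sketched as two weighted binary cross-entropy terms. The exact symmetrization and weights below are assumptions about the general shape of such a loss, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def swbce_loss(logits, targets, lam=1.1, alpha=0.5):
    """Hypothetical sketch of a symmetrized weighted BCE for edge maps.

    logits:  (B, 1, H, W) raw edge predictions.
    targets: (B, 1, H, W) binary edge labels (float).
    lam:     weight emphasizing the rare positive (edge) class.
    alpha:   balance between label-guided and prediction-guided terms.
    """
    probs = torch.sigmoid(logits)
    pos_frac = targets.mean().clamp(1e-6, 1 - 1e-6)

    # Label-guided term: standard class-balanced BCE (edges are rare).
    w_label = torch.where(targets > 0.5, lam * (1 - pos_frac), pos_frac)
    label_term = F.binary_cross_entropy_with_logits(
        logits, targets, weight=w_label)

    # Prediction-guided term: weight negatives by the model's own
    # confidence, penalizing confident false positives (the asymmetry).
    w_pred = torch.where(targets > 0.5,
                         lam * (1 - pos_frac),
                         pos_frac * probs.detach())
    pred_term = F.binary_cross_entropy_with_logits(
        logits, targets, weight=w_pred)

    return alpha * label_term + (1 - alpha) * pred_term
```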
- Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing pose serious risks for generative models.
In this paper, we investigate how detection performance varies across model backbones, types, and datasets.
We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z)
- Predictive Uncertainty Quantification for Bird's Eye View Segmentation: A Benchmark and Novel Loss Function [10.193504550494486]
This paper introduces a benchmark for predictive uncertainty quantification in Bird's Eye View (BEV) segmentation.
Our study focuses on the effectiveness of quantified uncertainty in detecting misclassified and out-of-distribution pixels.
We propose a novel loss function, Uncertainty-Focal-Cross-Entropy (UFCE), specifically designed for highly imbalanced data.
arXiv Detail & Related papers (2024-05-31T16:32:46Z)
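A loss in the spirit of UFCE could combine a focal term with a per-pixel uncertainty weight, as in the hedged sketch below; the specific combination is an assumption, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def ufce_loss(logits, targets, uncertainty, gamma=2.0):
    """Hypothetical Uncertainty-Focal-Cross-Entropy sketch.

    logits:      (B, C, H, W) raw class scores.
    targets:     (B, H, W) integer class labels.
    uncertainty: (B, H, W) per-pixel uncertainty in [0, 1]
                 (e.g., normalized predictive entropy).
    gamma:       focal exponent emphasizing hard pixels.
    """
    ce = F.cross_entropy(logits, targets, reduction="none")  # (B, H, W)
    pt = torch.exp(-ce)  # probability assigned to the true class

    # The focal term down-weights easy pixels; the uncertainty factor
    # further emphasizes pixels the model is unsure about, which helps
    # on the rare classes of highly imbalanced BEV maps.
    focal = (1 - pt) ** gamma * ce
    return ((1.0 + uncertainty) * focal).mean()
```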
- Neighbor-Aware Calibration of Segmentation Networks with Penalty-Based Constraints [19.897181782914437]
We propose a principled and simple solution based on equality constraints on the logit values, which enables explicit control of both the enforced constraint and the weight of the penalty.
Our approach can be used to train a wide range of deep segmentation networks.
arXiv Detail & Related papers (2024-01-25T19:46:57Z)
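One plausible reading of a penalty on logit values is a hinge on the per-pixel logit range, with an explicit penalty weight. The hinge form and default values below are illustrative guesses, not the paper's constraint.

```python
import torch

def logit_equality_penalty(logits, margin=10.0, weight=0.1):
    """Hypothetical penalty keeping logits in a controlled range.

    Encourages the gap between the largest and smallest per-pixel
    logits to stay below `margin`, tempering overconfident predictions;
    `weight` controls the penalty strength explicitly.

    logits: (B, C, H, W) raw class scores.
    """
    max_logit = logits.max(dim=1).values  # (B, H, W)
    min_logit = logits.min(dim=1).values
    gap = max_logit - min_logit
    # Hinge penalty: only gaps exceeding the margin are penalized.
    return weight * torch.clamp(gap - margin, min=0).mean()

# Usage: total_loss = seg_loss + logit_equality_penalty(logits)
```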
- SCL-VI: Self-supervised Context Learning for Visual Inspection of Industrial Defects [4.487908181569429]
We present a novel self-supervised learning algorithm designed to derive an optimal encoder by tackling the renowned jigsaw puzzle.
Our approach involves dividing the target image into nine patches and tasking the encoder with predicting the relative position of any two patches, thereby extracting rich semantics.
arXiv Detail & Related papers (2023-11-11T08:01:40Z)
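The nine-patch jigsaw pretext task could be set up roughly as follows; the pairing scheme and eight-way relative-position labels are common choices for this family of tasks, not necessarily the exact SCL-VI setup.

```python
import random
import torch

def make_jigsaw_pair(image, grid=3):
    """Sample two of the 3x3 patches plus a label for their relation.

    image: (C, H, W) tensor with H and W divisible by `grid`.
    Returns (anchor_patch, other_patch, relation_label), where the label
    indexes one of the 8 possible relative positions around the center
    (a common encoding; SCL-VI may encode relations differently).
    """
    C, H, W = image.shape
    ph, pw = H // grid, W // grid
    patches = [image[:, r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
               for r in range(grid) for c in range(grid)]

    anchor = 4  # center patch of the 3x3 grid
    other = random.choice([i for i in range(grid * grid) if i != anchor])

    # Relative position of `other` w.r.t. the center, as a class in [0, 8).
    label = other if other < anchor else other - 1
    return patches[anchor], patches[other], torch.tensor(label)

# A training step would embed both patches with the shared encoder and
# classify the concatenated embeddings into one of the 8 relations.
```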
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges of the vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant-information issue and the sparse-space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- Bayesian Eye Tracking [63.21413628808946]
Model-based eye tracking is susceptible to eye feature detection errors.
We propose a Bayesian framework for model-based eye tracking.
Compared to state-of-the-art model-based and learning-based methods, the proposed framework demonstrates significant improvement in generalization capability.
arXiv Detail & Related papers (2021-06-25T02:08:03Z)
- Exploring Robustness of Unsupervised Domain Adaptation in Semantic Segmentation [74.05906222376608]
We propose adversarial self-supervision UDA (or ASSUDA), which maximizes the agreement between clean images and their adversarial examples by a contrastive loss in the output space.
This paper is rooted in two observations: (i) the robustness of UDA methods in semantic segmentation remains unexplored, which poses a security concern in this field; and (ii) although commonly used self-supervision tasks (e.g., rotation and jigsaw) benefit image tasks such as classification and recognition, they fail to provide the critical supervision signals needed to learn discriminative representations for segmentation.
arXiv Detail & Related papers (2021-05-23T01:50:44Z)
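Maximizing clean/adversarial agreement with a contrastive loss in the output space, as in the ASSUDA entry above, might look like this InfoNCE-style sketch; the spatial pooling and temperature are generic contrastive-learning choices, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

def output_space_contrastive_loss(out_clean, out_adv, temperature=0.1):
    """InfoNCE-style agreement between clean/adversarial segmentations.

    out_clean, out_adv: (B, C, H, W) output-space maps of the same batch
    under clean and adversarial views. Each image's clean output should
    match its own adversarial output, not those of other images.
    """
    B = out_clean.size(0)
    # Pool spatial dims to one vector per image (a simplifying choice).
    z1 = F.normalize(out_clean.flatten(2).mean(-1), dim=1)  # (B, C)
    z2 = F.normalize(out_adv.flatten(2).mean(-1), dim=1)    # (B, C)

    logits = z1 @ z2.t() / temperature          # (B, B) similarity matrix
    labels = torch.arange(B, device=logits.device)
    # Symmetric InfoNCE: each view retrieves its counterpart.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))
```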
- Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye gaze from images alone is a challenging task due to unobservable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that fusing information from visual stimuli with eye images can achieve performance similar to figures reported in the literature.
arXiv Detail & Related papers (2020-07-26T12:39:15Z)