DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images
- URL: http://arxiv.org/abs/2509.16767v2
- Date: Thu, 09 Oct 2025 12:52:06 GMT
- Title: DiffEye: Diffusion-Based Continuous Eye-Tracking Data Generation Conditioned on Natural Images
- Authors: Ozgur Kara, Harris Nisar, James M. Rehg
- Abstract summary: DiffEye is a diffusion-based training framework designed to model continuous and diverse eye movement trajectories during free viewing of natural images. By leveraging raw eye-tracking trajectories rather than relying on scanpaths, DiffEye captures the inherent variability in human gaze behavior. The generated trajectories can also be converted into scanpaths and saliency maps, resulting in outputs that more accurately reflect the distribution of human visual attention.
- Score: 24.810828226931605
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Numerous models have been developed for scanpath and saliency prediction. They are typically trained on scanpaths, which model eye movement as a sequence of discrete fixation points connected by saccades, while the rich information contained in the raw trajectories is often discarded. Moreover, most existing approaches fail to capture the variability observed among human subjects viewing the same image. They generally predict a single scanpath of fixed, pre-defined length, which conflicts with the inherent diversity and stochastic nature of real-world visual attention. To address these challenges, we propose DiffEye, a diffusion-based training framework designed to model continuous and diverse eye movement trajectories during free viewing of natural images. Our method builds on a diffusion model conditioned on visual stimuli and introduces a novel component, namely Corresponding Positional Embedding (CPE), which aligns spatial gaze information with the patch-based semantic features of the visual input. By leveraging raw eye-tracking trajectories rather than relying on scanpaths, DiffEye captures the inherent variability in human gaze behavior and generates high-quality, realistic eye movement patterns, despite being trained on a comparatively small dataset. The generated trajectories can also be converted into scanpaths and saliency maps, resulting in outputs that more accurately reflect the distribution of human visual attention. DiffEye is the first method to tackle this task on natural images using a diffusion model while fully leveraging the richness of raw eye-tracking data. Our extensive evaluation shows that DiffEye not only achieves state-of-the-art performance in scanpath generation but also enables, for the first time, the generation of continuous eye movement trajectories. Project webpage: https://diff-eye.github.io/
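The Corresponding Positional Embedding (CPE) is described only at a high level in the abstract. The sketch below shows one plausible reading, assuming a ViT-style square patch grid and a learnable per-patch embedding table; the class name, grid size, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CorrespondingPositionalEmbedding(nn.Module):
    """Illustrative sketch: give each gaze sample the positional embedding
    of the image patch it falls on, so gaze tokens and visual tokens share
    one spatial reference frame (an assumed reading of CPE, not the paper's code)."""

    def __init__(self, grid_size: int = 14, embed_dim: int = 768):
        super().__init__()
        self.grid_size = grid_size
        # One learnable embedding per patch; in principle shared with the image encoder.
        self.patch_pos = nn.Embedding(grid_size * grid_size, embed_dim)

    def forward(self, gaze_xy: torch.Tensor) -> torch.Tensor:
        # gaze_xy: (batch, T, 2) with coordinates normalized to [0, 1].
        col = (gaze_xy[..., 0] * self.grid_size).clamp(0, self.grid_size - 1).long()
        row = (gaze_xy[..., 1] * self.grid_size).clamp(0, self.grid_size - 1).long()
        patch_idx = row * self.grid_size + col   # (batch, T) flat patch indices
        return self.patch_pos(patch_idx)         # (batch, T, embed_dim)
```

Under this reading, the returned embeddings would be added to the noisy-trajectory tokens before the denoiser attends over the image patch features, tying each gaze coordinate to the semantic content of the patch it lands on.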
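The abstract also states that generated trajectories can be converted into scanpaths and saliency maps but does not detail how. A common post-processing recipe, sketched below, is dispersion-threshold fixation detection followed by a Gaussian-blurred fixation histogram; the thresholds, image size, and function names are assumptions, not the paper's confirmed procedure.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def trajectory_to_fixations(xy, dispersion=0.02, min_len=10):
    """Dispersion-threshold (I-DT style) fixation detection on a normalized
    (T, 2) gaze trajectory. Threshold values are illustrative."""
    fixations, start = [], 0
    for end in range(1, len(xy) + 1):
        window = xy[start:end]
        if (window.max(0) - window.min(0)).max() > dispersion:
            if end - 1 - start >= min_len:
                fixations.append(window[:-1].mean(0))
            start = end - 1
    if len(xy) - start >= min_len:
        fixations.append(xy[start:].mean(0))
    return np.array(fixations)  # ordered fixation centers = scanpath

def fixations_to_saliency(fixations, height=480, width=640, sigma=25):
    """Accumulate fixations into a Gaussian-blurred saliency map."""
    sal = np.zeros((height, width))
    for x, y in fixations:
        sal[int(y * (height - 1)), int(x * (width - 1))] += 1.0
    sal = gaussian_filter(sal, sigma)
    return sal / (sal.max() + 1e-8)
```

Aggregating such maps over many sampled trajectories for the same image is what allows a generative model of raw gaze to be evaluated against standard saliency benchmarks.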
Related papers
- Foraging with the Eyes: Dynamics in Human Visual Gaze and Deep Predictive Modeling [0.0]
Animals often forage via Lévy walks with heavy-tailed step lengths optimized for sparse resource environments. We show that human visual gaze follows similar dynamics when viewing images. Our findings present new evidence that human visual exploration obeys statistical laws similar to natural foraging and open avenues for modeling gaze through generative and predictive frameworks.
arXiv Detail & Related papers (2025-10-10T11:45:51Z) - Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction [66.71402249062777]
We present ScanDiff, a novel architecture that combines diffusion models with Vision Transformers to generate diverse and realistic scanpaths. Our method explicitly models scanpath variability by leveraging the stochastic nature of diffusion models, producing a wide range of plausible gaze trajectories. Experiments on benchmark datasets show that ScanDiff surpasses state-of-the-art methods in both free-viewing and task-driven scenarios.
arXiv Detail & Related papers (2025-07-30T18:36:09Z) - Understanding and Improving Training-Free AI-Generated Image Detections with Vision Foundation Models [68.90917438865078]
Deepfake techniques for facial synthesis and editing, enabled by generative models, pose serious risks. In this paper, we investigate how detection performance varies across model backbones, types, and datasets. We introduce Contrastive Blur, which enhances performance on facial images, and MINDER, which addresses noise type bias, balancing performance across domains.
arXiv Detail & Related papers (2024-11-28T13:04:45Z) - Deep Domain Adaptation: A Sim2Real Neural Approach for Improving Eye-Tracking Systems [80.62854148838359]
Eye image segmentation is a critical step in eye tracking that has great influence over the final gaze estimate.
We use dimensionality-reduction techniques to measure the overlap between the target eye images and synthetic training data.
Our methods result in robust, improved performance when tackling the discrepancy between simulation and real-world data samples.
arXiv Detail & Related papers (2024-03-23T22:32:06Z) - Unveiling the Truth: Exploring Human Gaze Patterns in Fake Images [34.02058539403381]
We leverage human semantic knowledge to investigate whether it can be incorporated into fake image detection frameworks.
A preliminary statistical analysis is conducted to explore the distinctive patterns in how humans perceive genuine and altered images.
arXiv Detail & Related papers (2024-03-13T19:56:30Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model that turns noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z) - Bayesian Eye Tracking [63.21413628808946]
Model-based eye tracking is susceptible to eye feature detection errors.
We propose a Bayesian framework for model-based eye tracking.
Compared to state-of-the-art model-based and learning-based methods, the proposed framework demonstrates significant improvement in generalization capability.
arXiv Detail & Related papers (2021-06-25T02:08:03Z) - Modeling human visual search: A combined Bayesian searcher and saliency map approach for eye movement guidance in natural scenes [0.0]
We propose a unified Bayesian model for visual search guided by saliency maps as prior information.
We show that state-of-the-art saliency models perform well in predicting the first two fixations in a visual search task, but their performance degrades to chance afterward.
This suggests that saliency maps alone are good at modeling bottom-up first impressions, but are not enough to explain scanpaths when top-down task information is critical.
arXiv Detail & Related papers (2020-09-17T15:38:23Z) - Self-Supervised Linear Motion Deblurring [112.75317069916579]
Deep convolutional neural networks are state-of-the-art for image deblurring.
We present a differentiable reblur model for self-supervised motion deblurring.
Our experiments demonstrate that self-supervised single image deblurring is feasible.
arXiv Detail & Related papers (2020-02-10T20:15:21Z)