Distinguishing Target and Non-Target Fixations with EEG and Eye Tracking in Realistic Visual Scenes
- URL: http://arxiv.org/abs/2508.01853v1
- Date: Sun, 03 Aug 2025 17:10:52 GMT
- Title: Distinguishing Target and Non-Target Fixations with EEG and Eye Tracking in Realistic Visual Scenes
- Authors: Mansi Sharma, Camilo Andrés Martínez Martínez, Benedikt Emanuel Wirth, Antonio Krüger, Philipp Müller
- Abstract summary: We investigate the classification of target vs. non-target fixations during free visual search in realistic scenes.
Our approach based on gaze and EEG features outperforms the previous state-of-the-art approach.
- Score: 20.53761110476627
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Distinguishing target from non-target fixations during visual search is a fundamental building block for understanding users' intended actions and for building effective assistance systems. While prior research indicated the feasibility of classifying target vs. non-target fixations based on eye tracking and electroencephalography (EEG) data, these studies used explicitly instructed search trajectories and abstract visual stimuli, and disregarded any scene context. This is in stark contrast with the fact that human visual search is largely driven by scene characteristics and raises questions regarding generalizability to more realistic scenarios. To close this gap, we, for the first time, investigate the classification of target vs. non-target fixations during free visual search in realistic scenes. In particular, we conducted a 36-participant user study using a diverse set of 140 realistic visual search scenes in two highly relevant application scenarios: searching for icons on desktop backgrounds and finding tools in a cluttered workshop. Our approach based on gaze and EEG features outperforms the previous state-of-the-art approach based on a combination of fixation duration and saccade-related potentials. We perform extensive evaluations to assess the generalizability of our approach across scene types. Our approach significantly advances the ability to distinguish between target and non-target fixations in realistic scenarios, achieving 83.6% accuracy in cross-user evaluations. This substantially outperforms previous methods based on saccade-related potentials, which reached only 56.9% accuracy.
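The abstract names the ingredients (gaze features such as fixation duration, fixation-locked EEG features, and cross-user evaluation) but not the exact pipeline. The sketch below is a minimal illustration of that recipe, not the authors' implementation: the band-power EEG features, the `RandomForestClassifier`, the 250 Hz sampling rate, and all function names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import LeaveOneGroupOut

def fixation_features(fix_durations, eeg_epochs, sfreq=250.0):
    """Combine a gaze feature (fixation duration) with simple EEG features
    computed from the epoch locked to each fixation.

    fix_durations: (n_fixations,) durations in seconds
    eeg_epochs:    (n_fixations, n_channels, n_samples) fixation-locked EEG
    """
    # Per-channel log band power in the 4-13 Hz range, estimated from the
    # power spectrum of each epoch (an illustrative EEG feature choice).
    spectra = np.abs(np.fft.rfft(eeg_epochs, axis=-1)) ** 2
    freqs = np.fft.rfftfreq(eeg_epochs.shape[-1], d=1.0 / sfreq)
    band = (freqs >= 4.0) & (freqs <= 13.0)
    band_power = np.log(spectra[..., band].mean(axis=-1) + 1e-12)
    return np.column_stack([fix_durations, band_power])

def cross_user_accuracy(X, y, user_ids):
    """Leave-one-user-out evaluation: train on all users but one, test on
    the held-out user, and average accuracy over the resulting folds."""
    accs = []
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=user_ids):
        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X[train_idx], y[train_idx])
        accs.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return float(np.mean(accs))
```

The grouped split is what makes the resulting number "cross-user": no participant contributes data to both the training and the test fold, so the accuracy reflects generalization to unseen users.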
Related papers
- Implicit Search Intent Recognition using EEG and Eye Tracking: Novel Dataset and Cross-User Prediction [21.59167760456658]
We present the first method for cross-user prediction of search intents from EEG and eye-tracking recordings.
We reach 84.5% accuracy in leave-one-user-out evaluations.
arXiv Detail & Related papers (2025-08-03T17:27:32Z)
- Human Scanpath Prediction in Target-Present Visual Search with Semantic-Foveal Bayesian Attention [49.99728312519117]
SemBA-FAST is a top-down framework designed for predicting human visual attention in target-present visual search.
We evaluate SemBA-FAST on the COCO-Search18 benchmark dataset, comparing its performance against other scanpath prediction models.
These findings provide valuable insights into the capabilities of semantic-foveal probabilistic frameworks for human-like attention modelling.
arXiv Detail & Related papers (2025-07-24T15:19:23Z)
- Towards Pixel-Level Prediction for Gaze Following: Benchmark and Approach [27.84672974344777]
We propose a novel gaze target prediction solution named GazeSeg.
It can fully utilize the spatial visual field of the person as guiding information, leading to a progressively coarse-to-fine gaze target segmentation and recognition process.
Our approach achieves a Dice score of 0.325 in gaze target segmentation and 71.7% top-5 recognition accuracy.
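For reference, the Dice score quoted above is the standard overlap metric between a predicted and a ground-truth segmentation mask. A minimal sketch of both reported metrics, with hypothetical binary-mask and class-score inputs, follows.

```python
import numpy as np

def dice_score(pred_mask, true_mask, eps=1e-8):
    """Dice coefficient between two binary masks of the same shape:
    2|P ∩ T| / (|P| + |T|). 1.0 is a perfect match, 0.0 no overlap."""
    pred_mask = pred_mask.astype(bool)
    true_mask = true_mask.astype(bool)
    inter = np.logical_and(pred_mask, true_mask).sum()
    return (2.0 * inter + eps) / (pred_mask.sum() + true_mask.sum() + eps)

def top5_accuracy(class_scores, labels):
    """Top-5 recognition accuracy: the fraction of samples whose true
    label is among the five highest-scoring classes."""
    top5 = np.argsort(class_scores, axis=1)[:, -5:]
    return float(np.mean([y in row for row, y in zip(top5, labels)]))
```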
arXiv Detail & Related papers (2024-11-30T01:27:48Z)
- Zero-Shot Object-Centric Representation Learning [72.43369950684057]
We study current object-centric methods through the lens of zero-shot generalization.
We introduce a benchmark comprising eight different synthetic and real-world datasets.
We find that training on diverse real-world images improves transferability to unseen scenarios.
arXiv Detail & Related papers (2024-08-17T10:37:07Z)
- Semantic-Based Active Perception for Humanoid Visual Tasks with Foveal Sensors [49.99728312519117]
The aim of this work is to establish how accurately a recent semantic-based active perception model is able to complete visual tasks that are regularly performed by humans.
This model exploits the ability of current object detectors to localize and classify a large number of object classes and to update a semantic description of a scene across multiple fixations.
In the task of scene exploration, the semantic-based method demonstrates superior performance compared to the traditional saliency-based model.
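The abstract's key mechanism is accumulating object-detector outputs into a persistent semantic description of the scene across fixations. The toy update below illustrates one way such accumulation could work (keep the most confident detection per object label); it is an assumption for illustration, not the model's actual update rule.

```python
def update_semantic_description(scene, detections):
    """Fold one fixation's detector output into a persistent scene
    description, keeping the most confident detection per label.

    scene:      dict mapping label -> (confidence, bbox), built up so far
    detections: list of (label, confidence, bbox) from the current fixation
    """
    for label, confidence, bbox in detections:
        if label not in scene or confidence > scene[label][0]:
            scene[label] = (confidence, bbox)
    return scene

# Example: two fixations progressively refine the same scene description.
scene = {}
scene = update_semantic_description(scene, [("cup", 0.6, (10, 10, 40, 40))])
scene = update_semantic_description(scene, [("cup", 0.9, (12, 11, 41, 42)),
                                            ("fork", 0.7, (80, 20, 95, 60))])
```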
arXiv Detail & Related papers (2024-04-16T18:15:57Z)
- Less is More: Toward Zero-Shot Local Scene Graph Generation via Foundation Models [16.08214739525615]
We present a new task called Local Scene Graph Generation.
It aims to abstract pertinent structural information with partial objects and their relationships in an image.
We introduce zEro-shot Local scEne GrAph geNeraTion (ELEGANT), a framework harnessing foundation models renowned for their powerful perception and commonsense reasoning.
arXiv Detail & Related papers (2023-10-02T17:19:04Z)
- Object Manipulation via Visual Target Localization [64.05939029132394]
Training agents to manipulate objects poses many challenges.
We propose an approach that explores the environment in search for target objects, computes their 3D coordinates once they are located, and then continues to estimate their 3D locations even when the objects are not visible.
Our evaluations show a massive 3x improvement in success rate over a model that has access to the same sensory suite.
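The abstract implies a simple but effective mechanism: once the target is seen, its 3D position is anchored in a fixed world frame so it can still be located relative to the agent after it leaves the field of view. The sketch below shows that idea with rigid-body transforms; the class, the pose representation (a 3x3 rotation `cam_to_world` plus camera position `cam_pos`), and the method names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

class TargetMemory:
    """Remember a target's position in the world frame so it can be
    re-expressed in the agent's current camera frame at any time."""

    def __init__(self):
        self.world_pos = None  # (3,) target position in world coordinates

    def update(self, cam_pos, cam_to_world, obs_pos_cam):
        """When the target is visible: lift the camera-frame observation
        into world coordinates and store it."""
        self.world_pos = cam_to_world @ obs_pos_cam + cam_pos

    def query(self, cam_pos, cam_to_world):
        """At any time, visible or not: return the remembered target
        position expressed in the current camera frame."""
        if self.world_pos is None:
            return None
        # Inverse of the rigid transform used in update():
        # p_cam = R^T (p_world - t), since R is orthonormal.
        return cam_to_world.T @ (self.world_pos - cam_pos)
```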
arXiv Detail & Related papers (2022-03-15T17:59:01Z)
- One-Shot Object Affordance Detection in the Wild [76.46484684007706]
Affordance detection refers to identifying the potential action possibilities of objects in an image.
We devise a One-Shot Affordance Detection Network (OSAD-Net) that estimates the human action purpose and then transfers it to help detect the common affordance from all candidate images.
With complex scenes and rich annotations, our PADv2 dataset can be used as a test bed to benchmark affordance detection methods.
arXiv Detail & Related papers (2021-08-08T14:53:10Z)
- Deep Learning for Scene Classification: A Survey [48.57123373347695]
Scene classification is a longstanding, fundamental and challenging problem in computer vision.
The rise of large-scale datasets and the renaissance of deep learning techniques have brought remarkable progress in the field of scene representation and classification.
This paper provides a comprehensive survey of recent achievements in scene classification using deep learning.
arXiv Detail & Related papers (2021-01-26T03:06:50Z)
- Embodied Visual Active Learning for Semantic Segmentation [33.02424587900808]
We study the task of embodied visual active learning, where an agent is set to explore a 3D environment with the goal of acquiring visual scene understanding.
We develop a battery of agents, both learnt and pre-specified, with different levels of knowledge of the environment.
We extensively evaluate the proposed models using the Matterport3D simulator and show that a fully learnt method outperforms comparable pre-specified counterparts.
arXiv Detail & Related papers (2020-12-17T11:02:34Z)