Gaze2AOI: Open Source Deep-learning Based System for Automatic Area of Interest Annotation with Eye Tracking Data
- URL: http://arxiv.org/abs/2411.13346v1
- Date: Wed, 20 Nov 2024 14:17:23 GMT
- Title: Gaze2AOI: Open Source Deep-learning Based System for Automatic Area of Interest Annotation with Eye Tracking Data
- Authors: Karolina Trajkovska, Matjaž Kljun, Klen Čopič Pucihar
- Abstract summary: We present a novel method to enhance the analysis of user behaviour and attention by automatically annotating and labelling areas of interest (AOIs) in video streams.
The tool provides key features such as time to first fixation, dwell time, and frequency of AOI revisits.
- Abstract: Eye gaze is considered an important indicator for understanding and predicting user behaviour, as well as directing their attention, across various domains including advertisement design, human-computer interaction and film viewing. In this paper, we present a novel method to enhance the analysis of user behaviour and attention by (i) augmenting video streams with automatically annotated and labelled areas of interest (AOIs), and (ii) integrating the AOIs with collected eye gaze and fixation data. The tool provides key features such as time to first fixation, dwell time, and frequency of AOI revisits. By incorporating the YOLOv8 object tracking algorithm, the tool supports over 600 different object classes, providing a comprehensive set for a variety of video streams. This tool will be made available as open-source software, thereby contributing to broader research and development efforts in the field.
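Since the tool itself has not yet been released, the following is only a minimal sketch of the workflow the abstract describes: run YOLOv8 object tracking over the video, treat each tracked box as an AOI, map fixation coordinates onto those boxes, and derive time to first fixation, dwell time and revisit counts. The Ultralytics tracking API, the checkpoint name "yolov8n-oiv7.pt" (chosen only to approximate the quoted 600+ classes), and the (frame index, x, y, duration) fixation format are all assumptions, not details taken from the paper.

```python
# Minimal sketch of the workflow described in the abstract; not the authors' released tool.
from collections import defaultdict

from ultralytics import YOLO


def annotate_aois(video_path):
    """Track objects frame by frame; each tracked box becomes a candidate AOI."""
    model = YOLO("yolov8n-oiv7.pt")  # assumed checkpoint with ~600 classes
    aois = defaultdict(list)         # frame_index -> [(aoi_label, (x1, y1, x2, y2))]
    results = model.track(source=video_path, persist=True, stream=True, verbose=False)
    for frame_index, result in enumerate(results):
        if result.boxes.id is None:  # no confirmed tracks in this frame
            continue
        for box, cls_id, track_id in zip(
            result.boxes.xyxy.tolist(),
            result.boxes.cls.tolist(),
            result.boxes.id.tolist(),
        ):
            label = f"{result.names[int(cls_id)]}#{int(track_id)}"
            aois[frame_index].append((label, tuple(box)))
    return aois


def aoi_metrics(aois, fixations, fps):
    """Time to first fixation, dwell time and visit count for each AOI."""
    stats = defaultdict(lambda: {"ttff_s": None, "dwell_ms": 0.0, "visits": 0})
    previous_aoi = None
    for frame_index, x, y, duration_ms in fixations:  # assumed fixation format
        hit = None
        for label, (x1, y1, x2, y2) in aois.get(frame_index, []):
            if x1 <= x <= x2 and y1 <= y <= y2:  # fixation falls inside this AOI
                hit = label
                break
        if hit is not None:
            entry = stats[hit]
            if entry["ttff_s"] is None:
                entry["ttff_s"] = frame_index / fps  # time to first fixation (s)
            entry["dwell_ms"] += duration_ms         # accumulated dwell time
            if previous_aoi != hit:
                entry["visits"] += 1                 # revisits = visits - 1
        previous_aoi = hit
    return dict(stats)
```

A call such as `aoi_metrics(annotate_aois("stimulus.mp4"), fixations, fps=30.0)` would return the per-AOI statistics; the file name and fixation layout are placeholders, and the released tool will define its own input and export formats.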
Related papers
- Understanding Long Videos via LLM-Powered Entity Relation Graphs [51.13422967711056]
GraphVideoAgent is a framework that maps and monitors the evolving relationships between visual entities throughout the video sequence.
Our approach demonstrates remarkable effectiveness when tested against industry benchmarks.
arXiv Detail & Related papers (2025-01-27T10:57:24Z)
- VOVTrack: Exploring the Potentiality in Videos for Open-Vocabulary Object Tracking [61.56592503861093]
Open-vocabulary multi-object tracking (OVMOT) amalgamates the complexities of open-vocabulary object detection (OVD) and multi-object tracking (MOT).
Existing approaches to OVMOT often merge OVD and MOT methodologies as separate modules, predominantly focusing on the problem through an image-centric lens.
We propose VOVTrack, a novel method that integrates object states relevant to MOT and video-centric training to address this challenge from a video object tracking standpoint.
arXiv Detail & Related papers (2024-10-11T05:01:49Z)
- I-MPN: Inductive Message Passing Network for Efficient Human-in-the-Loop Annotation of Mobile Eye Tracking Data [4.487146086221174]
We present a novel human-centered learning algorithm designed for automated object recognition within mobile eye-tracking settings.
Our approach seamlessly integrates an object detector with a spatial relation-aware inductive message-passing network (I-MPN), harnessing node profile information and capturing object correlations.
arXiv Detail & Related papers (2024-06-10T13:08:31Z)
- Voila-A: Aligning Vision-Language Models with User's Gaze Attention [56.755993500556734]
We introduce gaze information as a proxy for human attention to guide Vision-Language Models (VLMs).
We propose a novel approach, Voila-A, for gaze alignment to enhance the interpretability and effectiveness of these models in real-world applications.
arXiv Detail & Related papers (2023-12-22T17:34:01Z)
- SeeBel: Seeing is Believing [0.9790236766474201]
We propose three visualizations that enable users to compare dataset statistics and AI performance for segmenting all images.
Our project tries to further increase the interpretability of the trained AI model for segmentation by visualizing its image attention weights.
We propose to conduct surveys on real users to study the efficacy of our visualization tool in the computer vision and AI domains.
arXiv Detail & Related papers (2023-12-18T05:11:00Z)
- Follow Anything: Open-set detection, tracking, and following in real-time [89.83421771766682]
We present a robotic system to detect, track, and follow any object in real-time.
Our approach, dubbed "follow anything" (FAn), is an open-vocabulary and multimodal model.
FAn can be deployed on a laptop with a lightweight (6-8 GB) graphics card, achieving a throughput of 6-20 frames per second.
arXiv Detail & Related papers (2023-08-10T17:57:06Z)
- Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models [6.642042615005632]
Eye-tracking has the potential to provide rich behavioral data about human cognition in ecologically valid environments.
This paper studies using computer vision tools for "attention decoding", the task of assessing the locus of a participant's overt visual attention over time.
arXiv Detail & Related papers (2022-11-20T12:24:57Z)
- Recent Advances in Embedding Methods for Multi-Object Tracking: A Survey [71.10448142010422]
Multi-object tracking (MOT) aims to associate target objects across video frames in order to obtain entire moving trajectories.
Embedding methods play an essential role in object location estimation and temporal identity association in MOT.
We first conduct a comprehensive overview with in-depth analysis of embedding methods in MOT from seven different perspectives.
arXiv Detail & Related papers (2022-05-22T06:54:33Z)
- ASOD60K: Audio-Induced Salient Object Detection in Panoramic Videos [79.05486554647918]
We propose PV-SOD, a new task that aims to segment salient objects from panoramic videos.
In contrast to existing fixation-level or object-level saliency detection tasks, we focus on multi-modal salient object detection (SOD).
We collect the first large-scale dataset, named ASOD60K, which contains 4K-resolution video frames annotated with a six-level hierarchy.
arXiv Detail & Related papers (2021-07-24T15:14:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.