VIS-iTrack: Visual Intention through Gaze Tracking using Low-Cost Webcam
- URL: http://arxiv.org/abs/2202.02587v1
- Date: Sat, 5 Feb 2022 16:00:03 GMT
- Title: VIS-iTrack: Visual Intention through Gaze Tracking using Low-Cost Webcam
- Authors: Shahed Anzarus Sabab (1, 2, 3, 4, and 5), Mohammad Ridwan Kabir (1, 2,
and 3), Sayed Rizban Hussain (1, 2, and 3), Hasan Mahmud (1, 2, and 3), Md.
Kamrul Hasan (1, 2, and 3), Husne Ara Rubaiyeat (6) ((1) Systems and Software
Lab (SSL), (2) Department of Computer Science and Engineering, (3) Islamic
University of Technology (IUT), Gazipur, Bangladesh, (4) Department of
Computer Science, (5) University of Manitoba, Winnipeg, Canada, (6) National
University, Bangladesh.)
- Abstract summary: Human intention is an internal, mental characterization for acquiring desired information.
In this work, we determine such intention by analyzing real-time eye gaze data with a low-cost regular webcam.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Human intention is an internal, mental characterization for acquiring desired
information. When interacting with interfaces containing either textual or
graphical information, the intention to perceive desired information is
subjective and strongly connected with eye gaze. In this work, we determine such intention by
analyzing real-time eye gaze data with a low-cost regular webcam. We extracted
unique features (e.g., Fixation Count, Eye Movement Ratio) from the eye gaze
data of 31 participants to generate a dataset containing 124 samples of visual
intention for perceiving textual or graphical information, labeled as either
TEXT or IMAGE, having 48.39% and 51.61% distribution, respectively. Using this
dataset, we analyzed 5 classifiers, including Support Vector Machine (SVM)
(Accuracy: 92.19%). Using the trained SVM, we investigated the variation of
visual intention among 30 participants, distributed across 3 age groups, and
found that younger users leaned more toward graphical content whereas older
adults were more interested in textual content. This finding suggests that
real-time eye gaze data can be a potential source for identifying visual
intention, from which intention-aware interactive interfaces can be designed
and developed to facilitate human cognition.
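As a rough illustration of the pipeline described above, the sketch below extracts two gaze features (a fixation count and an eye-movement ratio) from webcam gaze samples and trains an SVM to separate TEXT from IMAGE intention. The feature definitions, the dispersion and duration thresholds, and the placeholder data are assumptions made for illustration; they are not taken from the paper.

```python
# Minimal sketch (assumptions noted inline); not the authors' implementation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(gaze, fix_dispersion=30.0, fix_min_samples=5):
    """Compute simple features from an (N, 2) array of gaze coordinates.

    fix_dispersion:  max spatial spread (pixels) within a fixation (assumed).
    fix_min_samples: min consecutive samples for a fixation (assumed).
    """
    fixation_count, i, n = 0, 0, len(gaze)
    while i < n:
        j = i + 1
        # Grow the window while the points stay within the dispersion limit.
        while j < n and np.ptp(gaze[i:j + 1], axis=0).max() <= fix_dispersion:
            j += 1
        if j - i >= fix_min_samples:
            fixation_count += 1
        i = j
    # "Eye Movement Ratio" is assumed here to mean horizontal vs. vertical travel.
    deltas = np.abs(np.diff(gaze, axis=0))
    movement_ratio = deltas[:, 0].sum() / (deltas[:, 1].sum() + 1e-9)
    return [fixation_count, movement_ratio]

# Placeholder recordings and labels; the real dataset has 124 labeled samples.
recordings = [np.random.rand(300, 2) * 1000 for _ in range(124)]
labels = np.random.choice(["TEXT", "IMAGE"], size=124)

X = np.array([extract_features(g) for g in recordings])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print("CV accuracy:", cross_val_score(clf, X, labels, cv=4).mean())
```

In practice the fixation detector, the exact feature set, and the SVM hyperparameters would be tuned on real gaze recordings rather than the random placeholders used here.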
Related papers
- A Deep Learning Framework for Visual Attention Prediction and Analysis of News Interfaces [0.2624902795082451]
News outlets' competition for attention in news interfaces has highlighted the need for demographically-aware saliency prediction models.
We present a deep learning framework that enhances the SaRa (Saliency Ranking) model with DeepGaze IIE.
arXiv Detail & Related papers (2025-03-21T15:20:29Z) - Visual Attention Graph [21.860357478331107]
We propose a new attention representation, called Attention Graph, to simultaneously code the visual saliency and scanpath.
In the attention graph, the semantic-based scanpath is defined by the path on the graph, while saliency of objects can be obtained by computing fixation density on each node (a minimal sketch of such a fixation-density computation appears after this list).
arXiv Detail & Related papers (2025-03-11T15:22:44Z) - ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models [92.60282074937305]
We introduce ConTextual, a novel dataset featuring human-crafted instructions that require context-sensitive reasoning for text-rich images.
We conduct experiments to assess the performance of 14 foundation models and establish a human performance baseline.
We observe a significant performance gap of 30.8% between GPT-4V and human performance.
arXiv Detail & Related papers (2024-01-24T09:07:11Z) - SeeBel: Seeing is Believing [0.9790236766474201]
We propose three visualizations that enable users to compare dataset statistics and AI performance for segmenting all images.
Our project tries to further increase the interpretability of the trained AI model for segmentation by visualizing its image attention weights.
We propose to conduct surveys on real users to study the efficacy of our visualization tool in computer vision and AI domain.
arXiv Detail & Related papers (2023-12-18T05:11:00Z) - Decoding Attention from Gaze: A Benchmark Dataset and End-to-End Models [6.642042615005632]
Eye-tracking has potential to provide rich behavioral data about human cognition in ecologically valid environments.
This paper studies using computer vision tools for "attention decoding", the task of assessing the locus of a participant's overt visual attention over time.
arXiv Detail & Related papers (2022-11-20T12:24:57Z) - An Efficient Point of Gaze Estimator for Low-Resolution Imaging Systems Using Extracted Ocular Features Based Neural Architecture [2.8728982844941187]
This paper introduces a neural network-based architecture to predict users' gaze at 9 positions displayed within the 11.31° visual range on the screen.
The eye-tracking system can be adopted by physically disabled individuals and is best suited for those for whom eye movement is among the few available means of communication.
arXiv Detail & Related papers (2021-06-09T14:35:55Z) - Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z) - Intentonomy: a Dataset and Study towards Human Intent Understanding [65.49299806821791]
We study the intent behind social media images with an aim to analyze how visual information can help the recognition of human intent.
We introduce an intent dataset, Intentonomy, comprising 14K images covering a wide range of everyday scenes.
We then systematically study whether, and to what extent, commonly used visual information, i.e., object and context, contribute to human motive understanding.
arXiv Detail & Related papers (2020-11-11T05:39:00Z) - What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions [50.435861435121915]
We use human interaction and attention cues to investigate whether we can learn better representations compared to visual-only representations.
Our experiments show that our "muscly-supervised" representation outperforms a visual-only state-of-the-art method MoCo.
arXiv Detail & Related papers (2020-10-16T17:46:53Z) - Towards End-to-end Video-based Eye-Tracking [50.0630362419371]
Estimating eye-gaze from images alone is a challenging task due to un-observable person-specific factors.
We propose a novel dataset and accompanying method which aims to explicitly learn these semantic and temporal relationships.
We demonstrate that the fusion of information from visual stimuli as well as eye images can lead towards achieving performance similar to literature-reported figures.
arXiv Detail & Related papers (2020-07-26T12:39:15Z) - A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)
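Relatedly, for the Visual Attention Graph entry above, a minimal sketch of computing fixation density per graph node is given below. The bounding-box node representation and the normalization by total fixation count are assumptions for illustration, not that paper's actual formulation.

```python
# Hedged sketch: fixation density per node of an attention graph, assuming
# each node corresponds to an object region given as a bounding box.
from collections import Counter

def fixation_density(nodes, fixations):
    """nodes: dict of node id -> bounding box (x0, y0, x1, y1)
    fixations: list of (x, y) fixation points
    Returns node id -> fraction of all fixations landing inside its region."""
    counts = Counter()
    for x, y in fixations:
        for node_id, (x0, y0, x1, y1) in nodes.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                counts[node_id] += 1
    total = max(len(fixations), 1)
    return {node_id: counts[node_id] / total for node_id in nodes}

# Hypothetical example: two object regions and a few fixation points.
nodes = {"face": (100, 50, 200, 150), "text_block": (300, 400, 600, 500)}
fixations = [(120, 80), (150, 100), (350, 450), (900, 900)]
print(fixation_density(nodes, fixations))
```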