A Deep Learning Framework for Visual Attention Prediction and Analysis of News Interfaces
- URL: http://arxiv.org/abs/2503.17212v1
- Date: Fri, 21 Mar 2025 15:20:29 GMT
- Title: A Deep Learning Framework for Visual Attention Prediction and Analysis of News Interfaces
- Authors: Matthew Kenely, Dylan Seychell, Carl James Debono, Chris Porter
- Abstract summary: News outlets' competition for attention in news interfaces has highlighted the need for demographically-aware saliency prediction models. We present a deep learning framework that enhances the SaRa (Saliency Ranking) model with DeepGaze IIE.
- Score: 0.2624902795082451
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: News outlets' competition for attention in news interfaces has highlighted the need for demographically-aware saliency prediction models. Despite recent advancements in saliency detection applied to user interfaces (UI), existing datasets are limited in size and demographic representation. We present a deep learning framework that enhances the SaRa (Saliency Ranking) model with DeepGaze IIE, improving Salient Object Ranking (SOR) performance by 10.7%. Our framework optimizes three key components: saliency map generation, grid segment scoring, and map normalization. Through a two-fold experiment using eye-tracking (30 participants) and mouse-tracking (375 participants aged 13--70), we analyze attention patterns across demographic groups. Statistical analysis reveals significant age-based variations (p < 0.05, ε² = 0.042), with older users (36--70) engaging more with textual content and younger users (13--35) interacting more with images. Mouse-tracking data closely approximates eye-tracking behavior (sAUC = 0.86) and identifies UI elements that immediately stand out, validating its use in large-scale studies. We conclude that saliency studies should prioritize gathering data from a larger, demographically representative sample and report exact demographic distributions.
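As a concrete illustration of the three components named above, here is a minimal sketch, not the authors' implementation: the DeepGaze IIE call is stubbed with a placeholder Gaussian prior, and the grid size, normalization scheme, and all function names are assumptions for illustration.

```python
import numpy as np

def predict_saliency(image: np.ndarray) -> np.ndarray:
    """Stand-in for a deep saliency predictor such as DeepGaze IIE;
    here, just a centered Gaussian prior over the image."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h / 2) ** 2 + (xs - w / 2) ** 2)
                  / (2 * (min(h, w) / 4) ** 2))

def normalize_map(saliency: np.ndarray) -> np.ndarray:
    """Min-max normalization so maps from different predictors share a [0, 1] scale."""
    lo, hi = saliency.min(), saliency.max()
    return (saliency - lo) / (hi - lo + 1e-8)

def score_grid_segments(saliency: np.ndarray, grid=(9, 9)) -> np.ndarray:
    """Split the map into grid segments and score each by its mean saliency."""
    h, w = saliency.shape
    gh, gw = grid
    scores = np.empty(grid)
    for i in range(gh):
        for j in range(gw):
            scores[i, j] = saliency[i * h // gh:(i + 1) * h // gh,
                                    j * w // gw:(j + 1) * w // gw].mean()
    return scores

# Segments sorted by score yield a salient-region ranking for the interface.
ui_screenshot = np.zeros((360, 640, 3))
scores = score_grid_segments(normalize_map(predict_saliency(ui_screenshot)))
ranking = np.argsort(scores, axis=None)[::-1]  # flat segment indices, best first
```

The reported age effect (p < 0.05, ε² = 0.042) is consistent with a Kruskal-Wallis test reported alongside an epsilon-squared effect size; a hedged sketch of that computation on hypothetical engagement scores:

```python
import numpy as np
from scipy.stats import kruskal

def kruskal_epsilon_squared(*groups):
    """Kruskal-Wallis H test plus the epsilon-squared effect size,
    eps^2 = H / (n - 1), where n is the total sample size."""
    h_stat, p_value = kruskal(*groups)
    n = sum(len(g) for g in groups)
    return h_stat, p_value, h_stat / (n - 1)

# Hypothetical per-participant scores (e.g. image-interaction rates) by age band.
rng = np.random.default_rng(0)
younger = rng.normal(0.60, 0.10, 200)  # ages 13-35
older = rng.normal(0.55, 0.10, 175)    # ages 36-70
h_stat, p_value, eps2 = kruskal_epsilon_squared(younger, older)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}, epsilon^2 = {eps2:.3f}")
```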
Related papers
- Evaluating Facial Expression Recognition Datasets for Deep Learning: A Benchmark Study with Novel Similarity Metrics [4.137346786534721]
This study investigates the key characteristics and suitability of widely used Facial Expression Recognition (FER) datasets for training deep learning models.
We compiled and analyzed 24 FER datasets, including those targeting specific age groups such as children, adults, and the elderly.
Benchmark experiments using state-of-the-art neural networks reveal that large-scale, automatically collected datasets tend to generalize better.
arXiv Detail & Related papers (2025-03-26T11:01:00Z)
- Mind the Gap! Static and Interactive Evaluations of Large Audio Models [55.87220295533817]
Large Audio Models (LAMs) are designed to power voice-native experiences. This study introduces an interactive approach to evaluate LAMs and collect 7,500 LAM interactions from 484 participants.
arXiv Detail & Related papers (2025-02-21T20:29:02Z)
- Accurate and Data-Efficient Toxicity Prediction when Annotators Disagree [1.3749490831384268]
When annotators disagree, predicting the labels given by individual annotators can capture nuances overlooked by traditional label aggregation.
We introduce three approaches to predicting individual annotator ratings on the toxicity of text.
We study the utility of demographic information for rating prediction.
arXiv Detail & Related papers (2024-10-16T04:26:40Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- VIS-iTrack: Visual Intention through Gaze Tracking using Low-Cost Webcam [0.0]
Human intention is an internal mental state that drives the acquisition of desired information.
In this work, we determine such intention by analyzing real-time eye gaze data with a low-cost regular webcam.
arXiv Detail & Related papers (2022-02-05T16:00:03Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
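As a rough, permutation-style illustration in the spirit of this metric (not the paper's exact definition; `model` and the modality split below are hypothetical), reliance on the visual modality can be estimated as the normalized accuracy drop when visual inputs are shuffled across examples:

```python
import numpy as np

def visual_reliance(model, visual, text, labels, rng=np.random.default_rng(0)):
    """Normalized accuracy drop when the visual modality is permuted across
    examples; higher values indicate stronger reliance on visual input."""
    acc = (model(visual, text).argmax(axis=1) == labels).mean()
    shuffled = visual[rng.permutation(len(visual))]
    acc_perm = (model(shuffled, text).argmax(axis=1) == labels).mean()
    return (acc - acc_perm) / max(acc, 1e-8)
```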
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
- Visual Distant Supervision for Scene Graph Generation [66.10579690929623]
Scene graph models usually require supervised learning on large quantities of labeled data with intensive human annotation.
We propose visual distant supervision, a novel paradigm of visual relation learning, which can train scene graph models without any human-labeled data.
Comprehensive experimental results show that our distantly supervised model outperforms strong weakly supervised and semi-supervised baselines.
arXiv Detail & Related papers (2021-03-29T06:35:24Z)
- Understanding Visual Saliency in Mobile User Interfaces [31.278845008743698]
We present findings from a controlled study with 30 participants and 193 mobile UIs.
Results point to a role of expectations in guiding where users look.
We release the first annotated dataset for investigating visual saliency in mobile UIs.
arXiv Detail & Related papers (2021-01-22T15:45:13Z)
- RethNet: Object-by-Object Learning for Detecting Facial Skin Problems [1.6114012813668934]
We propose an object-by-object learning technique to detect 11 types of facial skin lesions.
Our proposed model achieved an mIoU of 79.46% on the test set of a prepared dataset, a 15.34% improvement over DeepLab v3+.
arXiv Detail & Related papers (2021-01-06T16:41:03Z)
- ConsNet: Learning Consistency Graph for Zero-Shot Human-Object Interaction Detection [101.56529337489417]
We consider the problem of Human-Object Interaction (HOI) Detection, which aims to locate and recognize HOI instances in the form of <human, action, object> triplets in images.
We argue that multi-level consistencies among objects, actions and interactions are strong cues for generating semantic representations of rare or previously unseen HOIs.
Our model takes visual features of candidate human-object pairs and word embeddings of HOI labels as inputs, maps them into visual-semantic joint embedding space and obtains detection results by measuring their similarities.
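A minimal, hypothetical sketch of that matching step (projection weights and dimensions invented for illustration; this is not ConsNet's actual architecture):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

rng = np.random.default_rng(0)
W_vis = rng.normal(size=(512, 128))  # visual projection into the joint space
W_txt = rng.normal(size=(300, 128))  # label-embedding projection

pair_feature = rng.normal(size=512)     # visual feature of a human-object pair
label_embedding = rng.normal(size=300)  # word embedding of an HOI label

# Higher similarity = the pair more likely instantiates this HOI label.
score = cosine(pair_feature @ W_vis, label_embedding @ W_txt)
```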
arXiv Detail & Related papers (2020-08-14T09:11:18Z)
- A Graph-based Interactive Reasoning for Human-Object Interaction Detection [71.50535113279551]
We present a novel graph-based interactive reasoning model called Interactive Graph (abbr. in-Graph) to infer HOIs.
We construct a new framework to assemble in-Graph models for detecting HOIs, namely in-GraphNet.
Our framework is end-to-end trainable and free from costly annotations like human pose.
arXiv Detail & Related papers (2020-07-14T09:29:03Z)