A Resource-Rational Principle for Modeling Visual Attention Control
- URL: http://arxiv.org/abs/2603.02056v1
- Date: Mon, 02 Mar 2026 16:45:50 GMT
- Title: A Resource-Rational Principle for Modeling Visual Attention Control
- Authors: Yunpeng Bai
- Abstract summary: This dissertation develops a resource-rational, simulation-based framework for modeling visual attention. I formalize visual tasks as bounded-optimal control problems using Partially Observable Markov Decision Processes. These models are instantiated in simulation environments spanning traditional text reading and reading-while-walking with smart glasses.
- Score: 13.330522631439917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Understanding how people allocate visual attention is central to Human-Computer Interaction (HCI), yet existing computational models of attention are often either descriptive, task-specific, or difficult to interpret. My dissertation develops a resource-rational, simulation-based framework for modeling visual attention as a sequential decision-making process under perceptual, memory, and time constraints. I formalize visual tasks, such as reading and multitasking, as bounded-optimal control problems using Partially Observable Markov Decision Processes, enabling eye-movement behaviors such as fixation and attention switching to emerge from rational adaptation rather than being hand-coded or purely data-driven. These models are instantiated in simulation environments spanning traditional text reading and reading-while-walking with smart glasses, where they reproduce classic empirical effects, explain observed trade-offs between comprehension and safety, and generate novel predictions under time pressure and interface variation. Collectively, this work contributes a unified computational account of visual attention, offering new tools for theory-driven and resource-efficient HCI design.
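The abstract frames visual tasks as bounded-optimal control problems over Partially Observable Markov Decision Processes. The sketch below is a toy illustration of that framing, not the dissertation's actual model: the state space, recognition probability, reward values, and refixation policy are all illustrative assumptions.

```python
# Toy POMDP sketch: an agent fixates word positions to read a line of text
# under a perceptual constraint (noisy word recognition) and a time cost.
# All quantities here are hypothetical, for illustration only.
import random

class ReadingPOMDP:
    """States: index of the next unread word. Actions: fixate a position.
    Observations: a noisy report of whether the fixated word was recognized."""

    def __init__(self, n_words=5, recognition_prob=0.8):
        self.n_words = n_words
        self.recognition_prob = recognition_prob  # perceptual constraint
        self.next_unread = 0                      # hidden state

    def step(self, fixation):
        """Fixate position `fixation`; return (observation, reward, done)."""
        recognized = (fixation == self.next_unread
                      and random.random() < self.recognition_prob)
        if recognized:
            self.next_unread += 1
        reward = 1.0 if recognized else -0.1      # time cost per fixation
        done = self.next_unread == self.n_words
        return recognized, reward, done

# A simple belief-based policy: refixate the believed next word until
# it is recognized, then advance the belief.
env = ReadingPOMDP()
belief, total, done = 0, 0.0, False
while not done:
    obs, r, done = env.step(belief)
    if obs:
        belief += 1
    total += r
```

Under this framing, fixation durations and refixations emerge from the interaction of the noisy observation model with the reward structure, rather than being hand-coded, which is the behavior the abstract attributes to the full models.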
Related papers
- Attention mechanisms in neural networks [0.0]
Attention mechanisms enable models to selectively focus on relevant portions of input sequences through learned weighting functions. This monograph provides a comprehensive and rigorous mathematical treatment of attention mechanisms, encompassing their theoretical foundations, computational properties, and practical implementations in contemporary deep learning systems. Applications in natural language processing, computer vision, and multimodal learning demonstrate the versatility of attention mechanisms.
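The "learned weighting functions" this monograph treats are most commonly instantiated as scaled dot-product attention. A minimal NumPy sketch of that standard formulation, with illustrative shapes and values:

```python
# Scaled dot-product attention: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # query-key similarities
    # Row-wise softmax turns similarities into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights      # weighted sum of values, plus weights

Q = np.array([[1.0, 0.0]])           # one query
K = np.array([[1.0, 0.0],            # two keys; the first matches the query
              [0.0, 1.0]])
V = np.array([[10.0, 0.0],
              [0.0, 10.0]])
out, w = scaled_dot_product_attention(Q, K, V)
# Each row of `w` sums to 1; the query attends more to its matching key.
```

The max-subtraction before exponentiation is the usual numerical-stability trick and does not change the softmax result.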
arXiv Detail & Related papers (2026-01-06T17:12:10Z) - Latent Implicit Visual Reasoning [59.39913238320798]
We propose a task-agnostic mechanism that trains LMMs to discover and use visual reasoning tokens without explicit supervision. Our approach outperforms direct fine-tuning and achieves state-of-the-art results on a diverse range of vision-centric tasks.
arXiv Detail & Related papers (2025-12-24T14:59:49Z) - See, Think, Act: Online Shopper Behavior Simulation with VLM Agents [58.92444959954643]
This paper investigates the integration of visual information, specifically webpage screenshots, into behavior simulation via VLMs. We employ supervised fine-tuning (SFT) for joint action prediction and rationale generation, conditioning on the full interaction context. To further enhance reasoning capabilities, we integrate reinforcement learning (RL) with a hierarchical reward structure, scaled by a difficulty-aware factor.
arXiv Detail & Related papers (2025-10-22T05:07:14Z) - Learning an Ensemble Token from Task-driven Priors in Facial Analysis [6.1218317445177135]
We introduce ET-Fuser, a novel methodology for learning an ensemble token. We propose a robust prior unification learning method that generates an ensemble token within a self-attention mechanism. Our results show improvements across a variety of facial analysis tasks, with statistically significant enhancements observed in the feature representations.
arXiv Detail & Related papers (2025-07-02T02:07:31Z) - Dynamic Programming Techniques for Enhancing Cognitive Representation in Knowledge Tracing [125.75923987618977]
We propose the Cognitive Representation Dynamic Programming based Knowledge Tracing (CRDP-KT) model. It uses a dynamic programming algorithm to optimize cognitive representations based on question difficulty and the performance intervals between questions. This provides more accurate and systematic input features for subsequent model training, thereby minimizing distortion in the simulation of cognitive states.
arXiv Detail & Related papers (2025-06-03T14:44:48Z) - Gaze-Guided Learning: Avoiding Shortcut Bias in Visual Classification [3.1208151315473622]
We introduce Gaze-CIFAR-10, a human gaze time-series dataset, along with a dual-sequence gaze encoder. In parallel, a Vision Transformer (ViT) is employed to learn the sequential representation of image content. Our framework integrates human gaze priors with machine-derived visual sequences, effectively correcting inaccurate localization in image feature representations.
arXiv Detail & Related papers (2025-04-08T00:40:46Z) - ViRAC: A Vision-Reasoning Agent Head Movement Control Framework in Arbitrary Virtual Environments [0.13654846342364302]
We propose ViRAC, which exploits the common-sense knowledge and reasoning capabilities of large-scale models. ViRAC produces more natural and context-aware head rotations than recent state-of-the-art techniques.
arXiv Detail & Related papers (2025-02-14T09:46:43Z) - A Cognitive Paradigm Approach to Probe the Perception-Reasoning Interface in VLMs [3.2228025627337864]
This paper introduces a structured evaluation framework to dissect the perception-reasoning interface in Vision-Language Models (VLMs). We propose three distinct evaluation paradigms, mirroring human problem-solving strategies. Applying this framework, we demonstrate that CA, leveraging powerful language models for reasoning over rich, independently generated descriptions, achieves new state-of-the-art (SOTA) performance.
arXiv Detail & Related papers (2025-01-23T12:42:42Z) - Relation-Oriented: Toward Causal Knowledge-Aligned AGI [24.76814726122543]
The Relation-Oriented paradigm aims to facilitate the development of causal knowledge-aligned Artificial General Intelligence.
As its methodological counterpart, the proposed Relation-Indexed Representation Learning (RIRL) is validated through efficacy experiments.
arXiv Detail & Related papers (2023-07-31T03:32:59Z) - GAMR: A Guided Attention Model for (visual) Reasoning [7.919213739992465]
Humans continue to outperform modern AI systems in their ability to flexibly parse and understand complex visual scenes.
We present a novel module for visual reasoning, the Guided Attention Model for (visual) Reasoning (GAMR).
GAMR posits that the brain solves complex visual reasoning problems dynamically via sequences of attention shifts to select and route task-relevant visual information into memory.
arXiv Detail & Related papers (2022-06-10T07:52:06Z) - INFOrmation Prioritization through EmPOWERment in Visual Model-Based RL [90.06845886194235]
We propose a modified objective for model-based reinforcement learning (RL).
We integrate a term inspired by variational empowerment into a state-space model based on mutual information.
We evaluate the approach on a suite of vision-based robot control tasks with natural video backgrounds.
arXiv Detail & Related papers (2022-04-18T23:09:23Z) - Causal Navigation by Continuous-time Neural Networks [108.84958284162857]
We propose a theoretical and experimental framework for learning causal representations using continuous-time neural networks.
We evaluate our method in the context of visual-control learning of drones over a series of complex tasks.
arXiv Detail & Related papers (2021-06-15T17:45:32Z) - Cost-effective Interactive Attention Learning with Neural Attention Processes [79.8115563067513]
We propose a novel interactive learning framework which we refer to as Interactive Attention Learning (IAL). A naive implementation of IAL is prone to overfitting due to the scarcity of human annotations, and requires costly retraining.
We tackle these challenges by proposing a sample-efficient attention mechanism and a cost-effective reranking algorithm for instances and features.
arXiv Detail & Related papers (2020-06-09T17:36:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.