Modeling Human Eye Movements with Neural Networks in a Maze-Solving Task
- URL: http://arxiv.org/abs/2212.10367v1
- Date: Tue, 20 Dec 2022 15:48:48 GMT
- Title: Modeling Human Eye Movements with Neural Networks in a Maze-Solving Task
- Authors: Jason Li, Nicholas Watters, Yingting (Sandy) Wang, Hansem Sohn,
Mehrdad Jazayeri
- Abstract summary: We build deep generative models of eye movements using a novel differentiable architecture for gaze fixations and gaze shifts.
We find that human eye movements are best predicted by a model that is optimized not to perform the task as efficiently as possible but instead to run an internal simulation of an object traversing the maze.
- Score: 2.092312847886424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: From smoothly pursuing moving objects to rapidly shifting gazes during visual
search, humans employ a wide variety of eye movement strategies in different
contexts. While eye movements provide a rich window into mental processes,
building generative models of eye movements is notoriously difficult, and to
date the computational objectives guiding eye movements remain largely a
mystery. In this work, we tackled these problems in the context of a canonical
spatial planning task, maze-solving. We collected eye movement data from human
subjects and built deep generative models of eye movements using a novel
differentiable architecture for gaze fixations and gaze shifts. We found that
human eye movements are best predicted by a model that is optimized not to
perform the task as efficiently as possible but instead to run an internal
simulation of an object traversing the maze. This not only provides a
generative model of eye movements in this task but also suggests a
computational theory for how humans solve the task, namely that humans use
mental simulation.
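The abstract describes the architecture only at a high level. Below is a minimal, hypothetical sketch (not the authors' code) of how a differentiable gaze-sequence model could be trained with a "mental simulation" objective, where predicted fixations are encouraged to track a simulated object's path through the maze rather than to reach the goal as efficiently as possible; all class names, tensor shapes, and the loss function are assumptions.
```python
# Minimal sketch, assuming a PyTorch-style setup; not the paper's implementation.
import torch
import torch.nn as nn

class GazePolicy(nn.Module):
    """Recurrent module that emits a sequence of continuous gaze positions."""
    def __init__(self, maze_channels=3, hidden=128, steps=20):
        super().__init__()
        self.steps = steps
        self.encoder = nn.Sequential(            # encode the maze image
            nn.Conv2d(maze_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, hidden), nn.ReLU(),
        )
        self.rnn = nn.GRUCell(2, hidden)         # previous gaze -> updated hidden state
        self.readout = nn.Linear(hidden, 2)      # hidden state -> (x, y) gaze shift

    def forward(self, maze):
        h = self.encoder(maze)                   # initial hidden state from the maze
        gaze = torch.zeros(maze.shape[0], 2)     # start fixation at the origin
        fixations = []
        for _ in range(self.steps):
            h = self.rnn(gaze, h)
            gaze = gaze + self.readout(h)        # differentiable gaze shift
            fixations.append(gaze)
        return torch.stack(fixations, dim=1)     # (batch, steps, 2)

def simulation_loss(fixations, object_path):
    """Mental-simulation objective: gaze should track the simulated object's
    trajectory through the maze (object_path: (batch, steps, 2))."""
    return ((fixations - object_path) ** 2).mean()

# Hypothetical usage with random stand-ins for a maze image and a simulated path.
model = GazePolicy()
maze = torch.rand(8, 3, 32, 32)
object_path = torch.rand(8, 20, 2)
loss = simulation_loss(model(maze), object_path)
loss.backward()
```
Under this framing, swapping `simulation_loss` for a task-efficiency objective (e.g., rewarding fast arrival at the maze exit) would yield the alternative model class that the paper reports fits human eye movements less well.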
Related papers
- Energy-Efficient Visual Search by Eye Movement and Low-Latency Spiking Neural Network [8.380017457339756]
Human vision combines a non-uniform-resolution retina, an efficient eye-movement strategy, and spiking neural computation to balance the competing requirements of visual field size, visual resolution, energy cost, and inference latency.
Here, we examine human visual search behaviors and establish the first spiking neural network (SNN)-based visual search model.
The model can learn either a human-like or a near-optimal fixation strategy, outperform humans in search speed and accuracy, and achieve high energy efficiency through short saccade decision latency and sparse activation.
arXiv Detail & Related papers (2023-10-10T12:39:10Z)
- Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can reproduce the full range of human motion in a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z)
- Computing a human-like reaction time metric from stable recurrent vision models [11.87006916768365]
We sketch a general-purpose methodology to construct computational accounts of reaction times from a stimulus-computable, task-optimized model.
We demonstrate that our metric aligns with patterns of human reaction times for stimulus manipulations across four disparate visual decision-making tasks.
This work paves the way for exploring the temporal alignment of model and human visual strategies in the context of various other cognitive tasks.
arXiv Detail & Related papers (2023-06-20T14:56:02Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
We present TOHO, a method for task-oriented human-object interaction generation with implicit neural representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which increases the variety of motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields [61.664963331203666]
How humans perceive moving objects is a longstanding research question in computer vision.
One approach to the problem is to teach a deep network to jointly model the effects of object and camera motion on the observed flow field.
We present a novel probabilistic model to estimate the camera's rotation given the motion field.
arXiv Detail & Related papers (2022-02-28T22:05:09Z)
- MIDAS: Deep learning human action intention prediction from natural eye movement patterns [6.557082555839739]
We present an entirely data-driven approach to decode human intention for object manipulation tasks based solely on natural gaze cues.
Our results show that we can decode human intention of motion purely from natural gaze cues and object relative position, with 91.9% accuracy.
arXiv Detail & Related papers (2022-01-22T21:52:42Z)
- Task-Generic Hierarchical Human Motion Prior using VAEs [44.356707509079044]
A deep generative model that describes human motions can benefit a wide range of fundamental computer vision and graphics tasks.
We present a method for learning complex human motions independent of specific tasks using a combined global and local latent space.
We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation.
arXiv Detail & Related papers (2021-06-07T23:11:42Z)
- Scene-aware Generative Network for Human Motion Synthesis [125.21079898942347]
We propose a new framework that takes the interaction between the scene and human motion into account.
Considering the uncertainty of human motion, we formulate this task as a generative task.
We derive a GAN-based learning approach, with discriminators that enforce compatibility between the human motion and the contextual scene.
arXiv Detail & Related papers (2021-05-31T09:05:50Z)
- Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control [67.23580984118479]
Imitation Learning (IL) is an effective framework to learn visuomotor skills from offline demonstration data.
Hand-eye Action Networks (HAN) can approximate human hand-eye coordination behaviors by learning from human teleoperated demonstrations.
arXiv Detail & Related papers (2021-02-28T01:49:13Z)