Modeling Human Eye Movements with Neural Networks in a Maze-Solving Task
- URL: http://arxiv.org/abs/2212.10367v1
- Date: Tue, 20 Dec 2022 15:48:48 GMT
- Title: Modeling Human Eye Movements with Neural Networks in a Maze-Solving Task
- Authors: Jason Li, Nicholas Watters, Yingting (Sandy) Wang, Hansem Sohn,
Mehrdad Jazayeri
- Abstract summary: We build deep generative models of eye movements using a novel differentiable architecture for gaze fixations and gaze shifts.
We find that human eye movements are best predicted by a model that is optimized not to perform the task as efficiently as possible but instead to run an internal simulation of an object traversing the maze.
- Score: 2.092312847886424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: From smoothly pursuing moving objects to rapidly shifting gazes during visual
search, humans employ a wide variety of eye movement strategies in different
contexts. While eye movements provide a rich window into mental processes,
building generative models of eye movements is notoriously difficult, and to
date the computational objectives guiding eye movements remain largely a
mystery. In this work, we tackled these problems in the context of a canonical
spatial planning task, maze-solving. We collected eye movement data from human
subjects and built deep generative models of eye movements using a novel
differentiable architecture for gaze fixations and gaze shifts. We found that
human eye movements are best predicted by a model that is optimized not to
perform the task as efficiently as possible but instead to run an internal
simulation of an object traversing the maze. This not only provides a
generative model of eye movements in this task but also suggests a
computational theory for how humans solve the task, namely that humans use
mental simulation.
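The abstract describes the architecture only at a high level. Below is a minimal, hypothetical sketch (not the authors' code) of how a differentiable gaze-sequence model could be trained with a "mental simulation" objective, where predicted fixations are encouraged to track a simulated object's path through the maze rather than to reach the goal as efficiently as possible; all class names, tensor shapes, and the loss function are assumptions.
```python
# Minimal sketch, assuming a PyTorch-style setup; not the paper's implementation.
import torch
import torch.nn as nn

class GazePolicy(nn.Module):
    """Recurrent module that emits a sequence of continuous gaze positions."""
    def __init__(self, maze_channels=3, hidden=128, steps=20):
        super().__init__()
        self.steps = steps
        self.encoder = nn.Sequential(            # encode the maze image
            nn.Conv2d(maze_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, hidden), nn.ReLU(),
        )
        self.rnn = nn.GRUCell(2, hidden)         # previous gaze -> updated hidden state
        self.readout = nn.Linear(hidden, 2)      # hidden state -> (x, y) gaze shift

    def forward(self, maze):
        h = self.encoder(maze)                   # initial hidden state from the maze
        gaze = torch.zeros(maze.shape[0], 2)     # start fixation at the origin
        fixations = []
        for _ in range(self.steps):
            h = self.rnn(gaze, h)
            gaze = gaze + self.readout(h)        # differentiable gaze shift
            fixations.append(gaze)
        return torch.stack(fixations, dim=1)     # (batch, steps, 2)

def simulation_loss(fixations, object_path):
    """Mental-simulation objective: gaze should track the simulated object's
    trajectory through the maze (object_path: (batch, steps, 2))."""
    return ((fixations - object_path) ** 2).mean()

# Hypothetical usage with random stand-ins for a maze image and a simulated path.
model = GazePolicy()
maze = torch.rand(8, 3, 32, 32)
object_path = torch.rand(8, 20, 2)
loss = simulation_loss(model(maze), object_path)
loss.backward()
```
Under this framing, swapping `simulation_loss` for a task-efficiency objective (e.g., rewarding fast arrival at the maze exit) would yield the alternative model class that the paper reports fits human eye movements less well.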
Related papers
- Energy-Efficient Visual Search by Eye Movement and Low-Latency Spiking Neural Network [8.380017457339756]
Human vision combines a non-uniform-resolution retina, an efficient eye-movement strategy, and spiking neural computation to balance the competing requirements of visual field size, visual resolution, energy cost, and inference latency.
Here, we examine human visual search behaviors and establish the first spiking neural network (SNN)-based visual search model.
The model can learn either a human-like or a near-optimal fixation strategy, outperform humans in search speed and accuracy, and achieve high energy efficiency through short saccade decision latency and sparse activation.
arXiv Detail & Related papers (2023-10-10T12:39:10Z)
- Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control.
We first learn a motion imitator that can reproduce the full range of human motion in a large, unstructured motion dataset.
We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z)
- Computing a human-like reaction time metric from stable recurrent vision models [11.87006916768365]
We sketch a general-purpose methodology to construct computational accounts of reaction times from a stimulus-computable, task-optimized model.
We demonstrate that our metric aligns with patterns of human reaction times for stimulus manipulations across four disparate visual decision-making tasks.
This work paves the way for exploring the temporal alignment of model and human visual strategies in the context of various other cognitive tasks.
arXiv Detail & Related papers (2023-06-20T14:56:02Z)
- Task-Oriented Human-Object Interactions Generation with Implicit Neural Representations [61.659439423703155]
We present TOHO, a method for task-oriented human-object interaction generation with implicit neural representations.
Our method generates continuous motions that are parameterized only by the temporal coordinate.
This work takes a step further toward general human-scene interaction simulation.
arXiv Detail & Related papers (2023-03-23T09:31:56Z)
- GIMO: Gaze-Informed Human Motion Prediction in Context [75.52839760700833]
We propose a large-scale human motion dataset that delivers high-quality body pose sequences, scene scans, and ego-centric views with eye gaze.
Our data collection is not tied to specific scenes, which increases the variety of motion dynamics observed from our subjects.
To realize the full potential of gaze, we propose a novel network architecture that enables bidirectional communication between the gaze and motion branches.
arXiv Detail & Related papers (2022-04-20T13:17:39Z)
- The Right Spin: Learning Object Motion from Rotation-Compensated Flow Fields [61.664963331203666]
How humans perceive moving objects is a longstanding research question in computer vision.
One approach to the problem is to teach a deep network to jointly model the effects of object and camera motion on the observed flow field.
We present a novel probabilistic model to estimate the camera's rotation given the motion field.
arXiv Detail & Related papers (2022-02-28T22:05:09Z)
- MIDAS: Deep learning human action intention prediction from natural eye movement patterns [6.557082555839739]
We present an entirely data-driven approach to decode human intention for object manipulation tasks based solely on natural gaze cues.
Our results show that we can decode human intention of motion purely from natural gaze cues and object relative position, with 91.9% accuracy.
arXiv Detail & Related papers (2022-01-22T21:52:42Z)
- Task-Generic Hierarchical Human Motion Prior using VAEs [44.356707509079044]
A deep generative model that describes human motions can benefit a wide range of fundamental computer vision and graphics tasks.
We present a method for learning complex human motions independent of specific tasks using a combined global and local latent space.
We demonstrate the effectiveness of our hierarchical motion variational autoencoder in a variety of tasks including video-based human pose estimation.
arXiv Detail & Related papers (2021-06-07T23:11:42Z)
- Scene-aware Generative Network for Human Motion Synthesis [125.21079898942347]
We propose a new framework that takes the interaction between the scene and human motion into account.
Considering the uncertainty of human motion, we formulate this task as a generative task.
We derive a GAN-based learning approach, with discriminators that enforce compatibility between the human motion and the contextual scene.
arXiv Detail & Related papers (2021-05-31T09:05:50Z)
- Generalization Through Hand-Eye Coordination: An Action Space for Learning Spatially-Invariant Visuomotor Control [67.23580984118479]
Imitation Learning (IL) is an effective framework to learn visuomotor skills from offline demonstration data.
Hand-eye Action Networks (HAN) can approximate human hand-eye coordination behaviors by learning from human teleoperated demonstrations.
arXiv Detail & Related papers (2021-02-28T01:49:13Z)