Modelling Human Visual Motion Processing with Trainable Motion Energy
Sensing and a Self-attention Network
- URL: http://arxiv.org/abs/2305.09156v2
- Date: Fri, 10 Nov 2023 03:23:18 GMT
- Title: Modelling Human Visual Motion Processing with Trainable Motion Energy
Sensing and a Self-attention Network
- Authors: Zitang Sun, Yen-Ju Chen, Yung-hao Yang, Shin'ya Nishida
- Abstract summary: We propose an image-computable model of human motion perception by bridging the gap between biological and computer vision models.
This model architecture aims to capture the computations in V1-MT, the core structure for motion perception in the biological visual system.
In silico neurophysiology reveals that our model's unit responses are similar to mammalian neural recordings regarding motion pooling and speed tuning.
- Score: 1.9458156037869137
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual motion processing is essential for humans to perceive and interact
with dynamic environments. Despite extensive research in cognitive
neuroscience, image-computable models that can extract informative motion flow
from natural scenes in a manner consistent with human visual processing have
yet to be established. Meanwhile, recent advancements in computer vision (CV),
propelled by deep learning, have led to significant progress in optical flow
estimation, a task closely related to motion perception. Here we propose an
image-computable model of human motion perception by bridging the gap between
biological and CV models. Specifically, we introduce a novel two-stage
approach that combines trainable motion energy sensing with a recurrent
self-attention network for adaptive motion integration and segregation. This
model architecture aims to capture the computations in V1-MT, the core
structure for motion perception in the biological visual system, while
providing the ability to derive informative motion flow for a wide range of
stimuli, including complex natural scenes. In silico neurophysiology reveals
that our model's unit responses are similar to mammalian neural recordings
regarding motion pooling and speed tuning. The proposed model can also
replicate human responses to a range of stimuli examined in past psychophysical
studies. Experimental results on the Sintel benchmark demonstrate that our
model's estimates agree more closely with human responses than with the ground
truth, whereas those of state-of-the-art CV models show the opposite pattern. Our study provides a
computational architecture consistent with human visual motion processing,
although the physiological correspondence may not be exact.
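To make the proposed pipeline concrete, here is a minimal sketch of the two-stage architecture under stated assumptions: a V1-like stage of trainable spatiotemporal filter pairs whose squared, summed outputs approximate motion energy, followed by an MT-like self-attention stage that adaptively pools local signals into a flow field. The class names, layer sizes, and the omission of the recurrence are illustrative choices of ours, not the authors' released implementation.

```python
# Hypothetical sketch of a V1-MT-style two-stage motion model: trainable
# motion energy sensing followed by self-attention pooling. Names and
# dimensions are illustrative assumptions, not the paper's actual code.
import torch
import torch.nn as nn

class MotionEnergyV1(nn.Module):
    """V1-like stage: trainable spatiotemporal filters whose squared,
    summed quadrature-pair responses approximate classic motion energy."""
    def __init__(self, in_frames=8, n_filters=32, ksize=7):
        super().__init__()
        # Two filter banks act as a learned quadrature pair.
        self.even = nn.Conv3d(1, n_filters, (in_frames, ksize, ksize),
                              padding=(0, ksize // 2, ksize // 2))
        self.odd = nn.Conv3d(1, n_filters, (in_frames, ksize, ksize),
                             padding=(0, ksize // 2, ksize // 2))

    def forward(self, clip):                      # clip: (B, 1, T, H, W)
        # Phase-invariant energy: sum of squared quadrature responses.
        return self.even(clip) ** 2 + self.odd(clip) ** 2   # (B, C, 1, H, W)

class SelfAttentionMT(nn.Module):
    """MT-like stage: self-attention across spatial locations for adaptive
    motion integration/segregation, then a linear readout to 2D flow.
    (The paper's recurrent iteration is omitted here for brevity.)"""
    def __init__(self, n_filters=32, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(n_filters, n_heads, batch_first=True)
        self.readout = nn.Linear(n_filters, 2)    # (u, v) flow per location

    def forward(self, energy):                    # energy: (B, C, 1, H, W)
        b, c, _, h, w = energy.shape
        tokens = energy.view(b, c, h * w).transpose(1, 2)    # (B, HW, C)
        mixed, _ = self.attn(tokens, tokens, tokens)
        flow = self.readout(mixed)                           # (B, HW, 2)
        return flow.transpose(1, 2).reshape(b, 2, h, w)

model = nn.Sequential(MotionEnergyV1(), SelfAttentionMT())
flow = model(torch.randn(2, 1, 8, 64, 64))        # two toy 8-frame clips
print(flow.shape)                                  # torch.Size([2, 2, 64, 64])
```

Squaring and summing a quadrature pair yields responses selective for motion direction and speed but invariant to stimulus phase, which is what lets such a front end handle feature-poor stimuli that defeat matching-based optical flow.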
Related papers
- Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli [10.978614683038758]
We evaluate a broad range of optical flow models and a neuroscience-inspired motion energy model for zero-shot figure-ground segmentation.
We find that a cross-section of 40 deep optical flow models trained on different datasets struggles to estimate motion patterns in random-dot videos.
This neuroscience-inspired model successfully addresses the lack of human-like zero-shot generalization to random dot stimuli in current computer vision models.
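For readers unfamiliar with the stimulus class, the snippet below generates a minimal translating random-dot kinematogram, the kind of input on which matching-based flow models reportedly fail; the frame size, dot density, and velocity are arbitrary illustrative choices, not the paper's actual parameters.

```python
# Minimal random-dot kinematogram: a field of dots translates coherently,
# so motion exists only in frame-to-frame correlations, never in a single
# frame. All parameters are illustrative assumptions.
import numpy as np

def random_dot_clip(n_frames=8, size=64, density=0.1, velocity=(1, 0), seed=0):
    rng = np.random.default_rng(seed)
    dots = rng.random((size, size)) < density            # boolean dot field
    frames = [np.roll(dots, shift=(t * velocity[0], t * velocity[1]),
                      axis=(0, 1)) for t in range(n_frames)]
    return np.stack(frames).astype(np.float32)           # (T, H, W)

clip = random_dot_clip()
# Each frame is statistically identical noise; only the temporal correlation
# carries the motion signal that motion energy filters can pick up.
print(clip.shape)                                        # (8, 64, 64)
```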
arXiv Detail & Related papers (2024-11-03T09:59:45Z)
- Neural Dynamics Model of Visual Decision-Making: Learning from Human Experts [28.340344705437758]
We implement a comprehensive visual decision-making model that spans from visual input to behavioral output.
Our model aligns closely with human behavior and reflects neural activities in primates.
We introduce a neuroimaging-informed fine-tuning approach and apply it to the model, leading to performance improvements.
arXiv Detail & Related papers (2024-09-04T02:38:52Z)
- Time-Dependent VAE for Building Latent Representations from Visual Neural Activity with Complex Dynamics [25.454851828755054]
TiDeSPL-VAE can effectively analyze complex visual neural activity and model temporal relationships in a natural way.
Results show that our model not only yields the best decoding performance on naturalistic scenes/movies but also extracts explicit neural dynamics.
arXiv Detail & Related papers (2024-08-15T03:27:23Z)
- Neural Representations of Dynamic Visual Stimuli [36.04425924379253]
We show that visual motion information as optical flow can be predicted (or decoded) from brain activity as measured by fMRI.
We show that this predicted motion can be used to realistically animate static images using a motion-conditioned video diffusion model.
This work offers a novel framework for interpreting how the human brain processes dynamic visual information.
arXiv Detail & Related papers (2024-06-04T17:59:49Z)
- Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption [64.07607726562841]
Existing multi-person human reconstruction approaches mainly focus on recovering accurate poses or avoiding penetration.
In this work, we tackle the task of reconstructing closely interactive humans from a monocular video.
We propose to leverage knowledge from proxemic behavior and physics to compensate for the lack of visual information.
arXiv Detail & Related papers (2024-04-17T11:55:45Z)
- Scaling Up Dynamic Human-Scene Interaction Modeling [58.032368564071895]
TRUMANS is the most comprehensive motion-captured human-scene interaction (HSI) dataset currently available.
It intricately captures whole-body human motions and part-level object dynamics.
We devise a diffusion-based autoregressive model that efficiently generates HSI sequences of any length.
arXiv Detail & Related papers (2024-03-13T15:45:04Z)
- Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex [12.1427193917406]
We propose an artificial neural network dubbed VISION to mimic the human brain and show how it can foster neuroscientific inquiries.
VISION successfully predicts human hemodynamic responses to visual inputs, as fMRI voxel values, with an accuracy exceeding state-of-the-art performance by 45%.
arXiv Detail & Related papers (2023-09-26T15:38:26Z)
- Persistent-Transient Duality: A Multi-mechanism Approach for Modeling Human-Object Interaction [58.67761673662716]
Humans are highly adaptable, swiftly switching between different modes to handle different tasks, situations and contexts.
In human-object interaction (HOI) activities, these modes can be attributed to two mechanisms: (1) the large-scale, consistent plan for the whole activity and (2) the small-scale child interactions that start and end along the timeline.
This work proposes to model two concurrent mechanisms that jointly control human motion.
arXiv Detail & Related papers (2023-07-24T12:21:33Z)
- Learning Local Recurrent Models for Human Mesh Recovery [50.85467243778406]
We present a new method for video mesh recovery that divides the human mesh into several local parts following the standard skeletal model.
We then model the dynamics of each local part with separate recurrent models, with each model conditioned appropriately based on the known kinematic structure of the human body.
This results in a structure-informed local recurrent learning architecture that can be trained in an end-to-end fashion with available annotations.
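A minimal sketch of that structure-informed idea follows, assuming one GRU per skeletal part with an independent readout; the part list, feature sizes, and the omission of the paper's kinematic conditioning between parts are simplifications of ours.

```python
# Illustrative per-part recurrence: a separate GRU models the temporal
# dynamics of each skeletal part. Part names and sizes are assumptions.
import torch
import torch.nn as nn

PARTS = ["torso", "head", "left_arm", "right_arm", "left_leg", "right_leg"]

class LocalRecurrentMesh(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, out_dim=24):
        super().__init__()
        self.rnns = nn.ModuleDict(
            {p: nn.GRU(feat_dim, hidden, batch_first=True) for p in PARTS})
        self.heads = nn.ModuleDict(
            {p: nn.Linear(hidden, out_dim) for p in PARTS})

    def forward(self, feats):          # feats: {part: (B, T, feat_dim)}
        # Each part's dynamics are modeled by its own recurrent model,
        # then decoded to that part's pose/mesh parameters.
        return {p: self.heads[p](self.rnns[p](feats[p])[0]) for p in PARTS}

model = LocalRecurrentMesh()
x = {p: torch.randn(2, 16, 64) for p in PARTS}
out = model(x)                         # {part: (2, 16, 24)} per-part outputs
```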
arXiv Detail & Related papers (2021-07-27T14:30:33Z)
- High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In experiments, our method significantly outperforms the state of the art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)
- UniCon: Universal Neural Controller For Physics-based Character Motion [70.45421551688332]
We propose a physics-based universal neural controller (UniCon) that learns to master thousands of motions with different styles by learning on large-scale motion datasets.
UniCon can support keyboard-driven control, compose motion sequences drawn from a large pool of locomotion and acrobatics skills, and teleport a person captured on video to a physics-based virtual avatar.
arXiv Detail & Related papers (2020-11-30T18:51:16Z)