Related papers: Machine Learning Modeling for Multi-order Human Visual Motion Processing

URL: http://arxiv.org/abs/2501.12810v1
Date: Wed, 22 Jan 2025 11:41:41 GMT
Title: Machine Learning Modeling for Multi-order Human Visual Motion Processing
Authors: Zitang Sun, Yen-Ju Chen, Yung-Hao Yang, Yuan Li, Shin'ya Nishida
Abstract summary: This research aims to develop machines that learn to perceive visual motion as do humans. Our model architecture mimics the cortical V1-MT motion processing pathway. We trained our dual-pathway model on novel motion datasets with varying material properties of moving objects.
Score: 5.043066132820344
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Our research aims to develop machines that learn to perceive visual motion as do humans. While recent advances in computer vision (CV) have enabled DNN-based models to accurately estimate optical flow in naturalistic images, a significant disparity remains between CV models and the biological visual system in both architecture and behavior. This disparity includes humans' ability to perceive the motion of higher-order image features (second-order motion), which many CV models fail to capture because of their reliance on the intensity conservation law. Our model architecture mimics the cortical V1-MT motion processing pathway, utilizing a trainable motion energy sensor bank and a recurrent graph network. Supervised learning employing diverse naturalistic videos allows the model to replicate psychophysical and physiological findings about first-order (luminance-based) motion perception. For second-order motion, inspired by neuroscientific findings, the model includes an additional sensing pathway with nonlinear preprocessing before motion energy sensing, implemented using a simple multilayer 3D CNN block. When exploring how the brain acquired the ability to perceive second-order motion in natural environments, in which pure second-order signals are rare, we hypothesized that second-order mechanisms were critical when estimating robust object motion amidst optical fluctuations, such as highlights on glossy surfaces. We trained our dual-pathway model on novel motion datasets with varying material properties of moving objects. We found that training to estimate object motion from non-Lambertian materials naturally endowed the model with the capacity to perceive second-order motion, as can humans. The resulting model effectively aligns with biological systems while generalizing to both first- and second-order motion phenomena in natural scenes.
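
Illustrative sketch: the abstract refers to a "motion energy sensor bank" without giving details. The Python snippet below shows the classic, fixed-parameter form of a single motion energy sensor (a quadrature pair of space-time Gabor filters whose squared responses are summed), applied to a drifting grating. It is not the paper's trainable sensor bank. The function names, the filter size, the Gaussian width, and the frequencies are all assumptions chosen for illustration.

```python
import numpy as np


def _grid(half):
    # Space-time sample grid; axis 0 is time (frames), axis 1 is space (pixels).
    axis = np.arange(-half, half + 1)
    return np.meshgrid(axis, axis, indexing="ij")


def gabor_pair(fx, ft, sigma=4.0, half=16):
    # Quadrature pair of space-time Gabor filters.
    # Positive ft tunes the pair to rightward drift, negative ft to leftward drift.
    T, X = _grid(half)
    envelope = np.exp(-(X**2 + T**2) / (2.0 * sigma**2))
    phase = 2.0 * np.pi * (fx * X - ft * T)
    return envelope * np.cos(phase), envelope * np.sin(phase)


def drifting_grating(fx, ft, half=16):
    # Luminance (first-order) sine grating drifting at ft / fx pixels per frame.
    T, X = _grid(half)
    return np.sin(2.0 * np.pi * (fx * X - ft * T))


def motion_energy(stim, fx, ft):
    # Sum of squared even and odd filter responses (phase-invariant energy).
    even, odd = gabor_pair(fx, ft)
    return float(np.sum(stim * even) ** 2 + np.sum(stim * odd) ** 2)


def opponent_energy(stim, fx, ft):
    # Rightward energy minus leftward energy; the sign indicates direction.
    return motion_energy(stim, fx, ft) - motion_energy(stim, fx, -ft)


if __name__ == "__main__":
    rightward = drifting_grating(0.125, 0.125)
    leftward = drifting_grating(0.125, -0.125)
    print("opponent energy, rightward grating:", opponent_energy(rightward, 0.125, 0.125))
    print("opponent energy, leftward grating:", opponent_energy(leftward, 0.125, 0.125))
```

A sensor of this kind responds to luminance-defined (first-order) motion. The abstract's second pathway places nonlinear preprocessing before the energy stage so that higher-order features can also drive it.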

Related papers

PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning [38.004463823796286]
We formulate the motor system of an interactive avatar as a generative motion model. Inspired by recent advances in foundation models, we propose PRIMAL. We leverage the model to create a real-time character animation system in Unreal Engine that feels highly responsive and natural.
arXiv Detail & Related papers (2025-03-21T21:27:57Z)
Object segmentation from common fate: Motion energy processing enables human-like zero-shot generalization to random dot stimuli [10.978614683038758]
We evaluate a broad range of optical flow models and a neuroscience-inspired motion energy model for zero-shot figure-ground segmentation. We find that a cross section of 40 deep optical flow models trained on different datasets struggles to estimate motion patterns in random dot videos. This neuroscience-inspired model successfully addresses the lack of human-like zero-shot generalization to random dot stimuli in current computer vision models.
arXiv Detail & Related papers (2024-11-03T09:59:45Z)
Latent Intuitive Physics: Learning to Transfer Hidden Physics from A 3D Video [58.043569985784806]
We introduce latent intuitive physics, a transfer learning framework for physics simulation. It can infer hidden properties of fluids from a single 3D video and simulate the observed fluid in novel scenes. We validate our model in three ways: (i) novel scene simulation with the learned visual-world physics, (ii) future prediction of the observed fluid dynamics, and (iii) supervised particle simulation.
arXiv Detail & Related papers (2024-06-18T16:37:44Z)
Reanimating Images using Neural Representations of Dynamic Stimuli [36.04425924379253]
Video diffusion models are used to decouple static image representation from motion generation. Brain-decoded motion signals enable realistic video reanimation based only on the initial frame of the video. This framework advances our understanding of how the brain represents spatial and temporal information in dynamic visual scenes.
arXiv Detail & Related papers (2024-06-04T17:59:49Z)
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering [45.51684124904457]
We propose a new 4D motion paradigm, SurMo, that models the temporal dynamics and human appearances in a unified framework. It has three parts: surface-based motion encoding, which models 4D human motions with an efficient, compact surface-based triplane; physical motion decoding, which is designed to encourage physical motion learning; and 4D appearance modeling, which renders the motion triplanes into images by an efficient surface-conditioned decoding.
arXiv Detail & Related papers (2024-04-01T16:34:27Z)
Universal Humanoid Motion Representations for Physics-Based Control [71.46142106079292]
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control. We first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset. We then create our motion representation by distilling skills directly from the imitator.
arXiv Detail & Related papers (2023-10-06T20:48:43Z)
Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network [1.9458156037869137]
We propose an image-computable model of human motion perception by bridging the gap between biological and computer vision models. This model architecture aims to capture the computations in V1-MT, the core structure for motion perception in the biological visual system. In silico neurophysiology reveals that our model's unit responses are similar to mammalian neural recordings regarding motion pooling and speed tuning.
arXiv Detail & Related papers (2023-05-16T04:16:07Z)
MotionBERT: A Unified Perspective on Learning Human Motion Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources. We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations. We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
arXiv Detail & Related papers (2022-10-12T19:46:25Z)
Skeleton2Humanoid: Animating Simulated Characters for Physically-plausible Motion In-betweening [59.88594294676711]
Modern deep-learning-based motion synthesis approaches barely consider the physical plausibility of synthesized motions. We propose a system, "Skeleton2Humanoid", which performs physics-oriented motion correction at test time. Experiments on the challenging LaFAN1 dataset show our system can outperform prior methods significantly in terms of both physical plausibility and accuracy.
arXiv Detail & Related papers (2022-10-09T16:15:34Z)
3D Neural Scene Representations for Visuomotor Control [78.79583457239836]
We learn models for dynamic 3D scenes purely from 2D visual observations. A dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks.
arXiv Detail & Related papers (2021-07-08T17:49:37Z)
High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion. We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations. In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)
Contact and Human Dynamics from Monocular Video [73.47466545178396]
Existing deep models predict 2D and 3D kinematic poses from video that are approximately accurate, but contain visible errors. We present a physics-based method for inferring 3D human motion from video sequences that takes initial 2D and 3D pose estimates as input.
arXiv Detail & Related papers (2020-07-22T21:09:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.