Neural Representations of Dynamic Visual Stimuli
- URL: http://arxiv.org/abs/2406.02659v1
- Date: Tue, 4 Jun 2024 17:59:49 GMT
- Title: Neural Representations of Dynamic Visual Stimuli
- Authors: Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr
- Abstract summary: We show that visual motion information as optical flow can be predicted (or decoded) from brain activity as measured by fMRI.
We show that this predicted motion can be used to realistically animate static images using a motion-conditioned video diffusion model.
This work offers a novel framework for interpreting how the human brain processes dynamic visual information.
- Score: 36.04425924379253
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans experience the world through constantly changing visual stimuli, where scenes can shift and move, change in appearance, and vary in distance. The dynamic nature of visual perception is a fundamental aspect of our daily lives, yet the large majority of research on object and scene processing, particularly using fMRI, has focused on static stimuli. While studies of static image perception are attractive due to their computational simplicity, they impose a strong non-naturalistic constraint on our investigation of human vision. In contrast, dynamic visual stimuli offer a more ecologically-valid approach but present new challenges due to the interplay between spatial and temporal information, making it difficult to disentangle the representations of stable image features and motion. To overcome this limitation -- given dynamic inputs, we explicitly decouple the modeling of static image representations and motion representations in the human brain. Three results demonstrate the feasibility of this approach. First, we show that visual motion information as optical flow can be predicted (or decoded) from brain activity as measured by fMRI. Second, we show that this predicted motion can be used to realistically animate static images using a motion-conditioned video diffusion model (where the motion is driven by fMRI brain activity). Third, we show prediction in the reverse direction: existing video encoders can be fine-tuned to predict fMRI brain activity from video imagery, and can do so more effectively than image encoders. This foundational work offers a novel, extensible framework for interpreting how the human brain processes dynamic visual information.
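The abstract describes the pipeline at a high level but gives no implementation details. As a rough, non-authoritative illustration of the first (decoding) result only, the sketch below fits a multi-output ridge regression from voxel responses to a flattened, downsampled optical-flow grid; the array shapes, the ridge penalty, the flow parameterization, and the use of per-dimension correlation as the metric are all assumptions made for this sketch, not the authors' setup.

```python
# Hedged sketch: linearly decoding coarse optical-flow features from fMRI
# responses, in the spirit of the paper's first result. Shapes, the ridge
# penalty, and the flow parameterization (a downsampled 2-channel flow grid
# per TR) are illustrative assumptions, not the authors' configuration.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_trs, n_voxels = 600, 2000           # kept small so the sketch runs quickly
flow_hw = (8, 8)                      # downsampled flow grid (assumption)
n_flow = 2 * flow_hw[0] * flow_hw[1]  # u and v components, flattened

# Placeholder arrays standing in for preprocessed fMRI responses and per-TR
# optical flow (e.g., averaged over the frames shown during each TR). Being
# random, they yield near-zero correlations; real data would replace them.
X = rng.standard_normal((n_trs, n_voxels))   # voxel responses
Y = rng.standard_normal((n_trs, n_flow))     # flattened flow targets

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

# One ridge regression per flow dimension (sklearn handles multi-output).
decoder = Ridge(alpha=1e3)
decoder.fit(X_tr, Y_tr)
Y_hat = decoder.predict(X_te)

# Per-dimension correlation between held-out and predicted flow values.
corrs = [np.corrcoef(Y_te[:, j], Y_hat[:, j])[0, 1] for j in range(n_flow)]
print(f"mean flow-decoding correlation: {np.mean(corrs):.3f}")
```

In practice the regularization strength would typically be chosen by cross-validation. The predicted flow, reshaped to a 2-channel grid, is the kind of signal a motion-conditioned video diffusion model could take as conditioning for the animation step, and the reverse (video-to-fMRI) direction described in the abstract would swap the decoder for a prediction head on top of fine-tuned video-encoder features.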
Related papers
- Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models [2.790870674964473]
We propose Vi-ST, a temporal convolutional neural network fed with a self-supervised Vision Transformer (ViT)
Our proposed Vi-ST demonstrates a novel modeling framework for neuronal coding of dynamic visual scenes in the brain.
arXiv Detail & Related papers (2024-07-15T14:06:13Z) - EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z) - Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity [13.291585611137355]
Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance.
This paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets.
We substantiate that the reconstructed video dynamics are indeed derived from fMRI, rather than hallucinations of the generative model.
arXiv Detail & Related papers (2024-05-06T08:56:41Z) - From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z) - Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex [12.1427193917406]
We propose an artificial neural network dubbed VISION to mimic the human brain and show how it can foster neuroscientific inquiries.
VISION predicts human hemodynamic responses (fMRI voxel values) to visual inputs with an accuracy exceeding state-of-the-art performance by 45%.
arXiv Detail & Related papers (2023-09-26T15:38:26Z) - Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network [1.9458156037869137]
We propose an image-computable model of human motion perception by bridging the gap between biological and computer vision models.
This model architecture aims to capture the computations in V1-MT, the core structure for motion perception in the biological visual system.
In silico neurophysiology reveals that our model's unit responses are similar to mammalian neural recordings regarding motion pooling and speed tuning. (A minimal classic motion-energy sketch, for background, follows this list.)
arXiv Detail & Related papers (2023-05-16T04:16:07Z) - Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge in learning appearance dynamics lies in the prohibitively large number of observations required.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z) - High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework that performs high-fidelity, temporally consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z) - Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z) - Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes [70.76742458931935]
We introduce a new representation that models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion.
Our representation is optimized through a neural network to fit the observed input views.
We show that our representation can be used for complex dynamic scenes, including thin structures, view-dependent effects, and natural degrees of motion.
arXiv Detail & Related papers (2020-11-26T01:23:44Z)
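As background for the "Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network" entry above, here is a minimal sketch of a classic (Adelson-Bergen-style) motion energy unit in one spatial dimension plus time. It does not reproduce that paper's trainable, V1-MT-style model; the filter size, frequencies, and toy stimulus are illustrative assumptions.

```python
# Hedged sketch: a classic motion energy unit (quadrature pair of space-time
# Gabor filters, squared and summed) responding to rightward motion. This is
# background for the entry above, not its trainable model; all parameters
# below are illustrative assumptions.
import numpy as np
from scipy.signal import convolve2d

def gabor_st(size=15, sf=0.15, tf=0.15, phase=0.0):
    """Space-time Gabor tuned to rightward motion (spatial freq sf, temporal freq tf)."""
    x = np.arange(size) - size // 2
    t = np.arange(size) - size // 2
    X, T = np.meshgrid(x, t)                              # rows = time, cols = space
    envelope = np.exp(-(X**2 + T**2) / (2 * (size / 4) ** 2))
    carrier = np.cos(2 * np.pi * (sf * X - tf * T) + phase)
    return envelope * carrier

def rightward_motion_energy(stimulus):
    """Phase-invariant motion energy: quadrature pair, squared and summed."""
    even = convolve2d(stimulus, gabor_st(phase=0.0), mode="valid")
    odd = convolve2d(stimulus, gabor_st(phase=np.pi / 2), mode="valid")
    return even**2 + odd**2

# Toy stimulus: a rightward-drifting grating, arranged as time x space.
t = np.arange(64)[:, None]
x = np.arange(64)[None, :]
drifting = np.cos(2 * np.pi * (0.15 * x - 0.15 * t))
print("mean rightward energy:", rightward_motion_energy(drifting).mean())
```

A full model would bank such units over directions, orientations, and scales and then pool them (the MT-like stage); making those stages trainable and adding self-attention is the entry's contribution, which this sketch does not attempt.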
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.