Neural Representations of Dynamic Visual Stimuli
- URL: http://arxiv.org/abs/2406.02659v1
- Date: Tue, 4 Jun 2024 17:59:49 GMT
- Title: Neural Representations of Dynamic Visual Stimuli
- Authors: Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr
- Abstract summary: We show that visual motion information as optical flow can be predicted (or decoded) from brain activity as measured by fMRI.
We show that this predicted motion can be used to realistically animate static images using a motion-conditioned video diffusion model.
This work offers a novel framework for interpreting how the human brain processes dynamic visual information.
- Score: 36.04425924379253
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humans experience the world through constantly changing visual stimuli, where scenes can shift and move, change in appearance, and vary in distance. The dynamic nature of visual perception is a fundamental aspect of our daily lives, yet the large majority of research on object and scene processing, particularly using fMRI, has focused on static stimuli. While studies of static image perception are attractive due to their computational simplicity, they impose a strong non-naturalistic constraint on our investigation of human vision. In contrast, dynamic visual stimuli offer a more ecologically valid approach but present new challenges due to the interplay between spatial and temporal information, making it difficult to disentangle the representations of stable image features and motion. To overcome this limitation, given dynamic inputs, we explicitly decouple the modeling of static image representations and motion representations in the human brain. Three results demonstrate the feasibility of this approach. First, we show that visual motion information as optical flow can be predicted (or decoded) from brain activity as measured by fMRI. Second, we show that this predicted motion can be used to realistically animate static images using a motion-conditioned video diffusion model (where the motion is driven by fMRI brain activity). Third, we show prediction in the reverse direction: existing video encoders can be fine-tuned to predict fMRI brain activity from video imagery, and can do so more effectively than image encoders. This foundational work offers a novel, extensible framework for interpreting how the human brain processes dynamic visual information.
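To make the first result concrete, here is a minimal sketch of one plausible decoding setup: a ridge regression mapping fMRI voxel activity to a coarse optical-flow grid. The synthetic data, array sizes, flow resolution, and regularization strength are all illustrative assumptions rather than the paper's actual configuration.

```python
# Hypothetical decoding step: linear map from fMRI voxels to optical flow.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

n_trials, n_voxels = 1000, 5000   # fMRI samples x visual-cortex voxels (assumed)
flow_h, flow_w = 16, 16           # coarse flow grid; 2 channels (dx, dy) per cell

rng = np.random.default_rng(0)
X = rng.standard_normal((n_trials, n_voxels))             # stand-in fMRI responses
Y = rng.standard_normal((n_trials, flow_h * flow_w * 2))  # stand-in flow targets

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

decoder = Ridge(alpha=1e3)  # L2 regularization suits wide, noisy fMRI designs
decoder.fit(X_tr, Y_tr)

pred_flow = decoder.predict(X_te).reshape(-1, flow_h, flow_w, 2)
# In the paper's pipeline, a flow field decoded like this would condition a
# motion-conditioned video diffusion model to animate a static image.
```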
Related papers
- DecoFuse: Decomposing and Fusing the "What", "Where", and "How" for Brain-Inspired fMRI-to-Video Decoding [82.91021399231184]
Existing fMRI-to-video methods often focus on semantic content while overlooking spatial and motion information.
We propose DecoFuse, a novel brain-inspired framework for decoding videos from fMRI signals.
It first decomposes the video into three components - semantic, spatial, and motion - then decodes each component separately before fusing them to reconstruct the video.
arXiv Detail & Related papers (2025-04-01T05:28:37Z)
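As a rough illustration of the decompose-decode-fuse control flow described in the DecoFuse entry above, the following PyTorch skeleton routes one fMRI vector through three separate component decoders before fusing them. The module names, linear decoders, and dimensions are placeholders, not DecoFuse's actual architecture.

```python
# Hypothetical decompose-decode-fuse skeleton for fMRI-to-video latents.
import torch
import torch.nn as nn

class ComponentDecoder(nn.Module):
    """Maps an fMRI vector to one video component (placeholder architecture)."""
    def __init__(self, n_voxels, out_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_voxels, 512), nn.ReLU(),
                                 nn.Linear(512, out_dim))

    def forward(self, fmri):
        return self.net(fmri)

class DecomposeDecodeFuse(nn.Module):
    def __init__(self, n_voxels=5000, dim=256):
        super().__init__()
        self.semantic = ComponentDecoder(n_voxels, dim)  # "what"
        self.spatial = ComponentDecoder(n_voxels, dim)   # "where"
        self.motion = ComponentDecoder(n_voxels, dim)    # "how"
        self.fuse = nn.Linear(3 * dim, dim)              # fused video latent

    def forward(self, fmri):
        parts = [self.semantic(fmri), self.spatial(fmri), self.motion(fmri)]
        return self.fuse(torch.cat(parts, dim=-1))

z = DecomposeDecodeFuse()(torch.randn(4, 5000))  # (4, 256) fused video latent
```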
- Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction [13.110669865114533]
NEURONS is a framework that decouples learning into four correlated sub-tasks.
It simulates the visual cortex's functional specialization, allowing the model to capture diverse video content.
NEURONS shows a strong functional correlation with the visual cortex, highlighting its potential for brain-computer interfaces and clinical applications.
arXiv Detail & Related papers (2025-03-14T08:12:28Z)
- X-Dyna: Expressive Dynamic Human Image Animation [49.896933584815926]
X-Dyna is a zero-shot, diffusion-based pipeline for animating a single human image.
It generates realistic, context-aware dynamics for both the subject and the surrounding environment.
arXiv Detail & Related papers (2025-01-17T08:10:53Z)
- Neuro-3D: Towards 3D Visual Decoding from EEG Signals [49.502364730056044]
We introduce a new neuroscience task: decoding 3D visual perception from EEG signals.
We first present EEG-3D, a dataset featuring multimodal analysis data and EEG recordings from 12 subjects viewing 72 categories of 3D objects rendered in both videos and images.
We propose Neuro-3D, a 3D visual decoding framework based on EEG signals.
arXiv Detail & Related papers (2024-11-19T05:52:17Z)
- Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models [2.790870674964473]
We propose Vi-ST, a spatiotemporal convolutional neural network fed with a self-supervised Vision Transformer (ViT).
Our proposed Vi-ST demonstrates a novel modeling framework for neuronal coding of dynamic visual scenes in the brain.
arXiv Detail & Related papers (2024-07-15T14:06:13Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- Brain3D: Generating 3D Objects from fMRI [76.41771117405973]
We design a novel 3D object representation learning method, Brain3D, that takes as input the fMRI data of a subject.
We show that our model captures the distinct functionalities of each region of the human visual system.
Preliminary evaluations indicate that Brain3D can successfully identify the disordered brain regions in simulated scenarios.
arXiv Detail & Related papers (2024-05-24T06:06:11Z)
- Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity [13.291585611137355]
Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance.
This paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets.
We substantiate that the reconstructed video dynamics are indeed derived from fMRI, rather than hallucinations of the generative model.
arXiv Detail & Related papers (2024-05-06T08:56:41Z)
- From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations [107.88375243135579]
Given speech audio, we output multiple possibilities of gestural motion for an individual, including face, body, and hands.
We visualize the generated motion using highly photorealistic avatars that can express crucial nuances in gestures.
Experiments show our model generates appropriate and diverse gestures, outperforming both diffusion- and VQ-only methods.
arXiv Detail & Related papers (2024-01-03T18:55:16Z)
- Decoding Realistic Images from Brain Activity with Contrastive Self-supervision and Latent Diffusion [29.335943994256052]
Reconstructing visual stimuli from human brain activities provides a promising opportunity to advance our understanding of the brain's visual system.
We propose a two-phase framework named Contrast and Diffuse (CnD) to decode realistic images from functional magnetic resonance imaging (fMRI) recordings.
arXiv Detail & Related papers (2023-09-30T09:15:22Z)
- Unidirectional brain-computer interface: Artificial neural network encoding natural images to fMRI response in the visual cortex [12.1427193917406]
We propose an artificial neural network dubbed VISION to mimic the human brain and show how it can foster neuroscientific inquiries.
VISION successfully predicts human hemodynamic responses as fMRI voxel values to visual inputs with an accuracy exceeding state-of-the-art performance by 45%.
arXiv Detail & Related papers (2023-09-26T15:38:26Z)
- Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity [0.0]
We show that Mind-Video can reconstruct high-quality videos of arbitrary frame rates using adversarial guidance.
We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.
arXiv Detail & Related papers (2023-05-19T13:44:25Z)
- Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z)
- Modelling Human Visual Motion Processing with Trainable Motion Energy Sensing and a Self-attention Network [1.9458156037869137]
We propose an image-computable model of human motion perception by bridging the gap between biological and computer vision models.
This model architecture aims to capture the computations in V1-MT, the core structure for motion perception in the biological visual system.
In silico neurophysiology reveals that our model's unit responses are similar to mammalian neural recordings regarding motion pooling and speed tuning.
arXiv Detail & Related papers (2023-05-16T04:16:07Z)
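The motion-energy computation the entry above builds on is the classic quadrature-pair construction: two space-time Gabor filters 90 degrees out of phase, whose squared and summed responses are direction selective but phase invariant. A minimal NumPy sketch follows; the frequencies and bandwidths are arbitrary choices, and this is not the paper's trainable implementation.

```python
# Quadrature-pair motion energy (Adelson-Bergen style) in one space dimension.
import numpy as np

def gabor_xt(xs, ts, fx, ft, sigma_x, sigma_t, phase):
    """Space-time Gabor tuned to gratings drifting in one direction."""
    X, T = np.meshgrid(xs, ts, indexing="ij")
    envelope = np.exp(-X**2 / (2 * sigma_x**2) - T**2 / (2 * sigma_t**2))
    return envelope * np.cos(2 * np.pi * (fx * X + ft * T) + phase)

xs = np.linspace(-1, 1, 32)      # space (arbitrary units)
ts = np.linspace(-0.5, 0.5, 16)  # time (arbitrary units)
even = gabor_xt(xs, ts, 2.0, 4.0, 0.3, 0.15, phase=0.0)
odd = gabor_xt(xs, ts, 2.0, 4.0, 0.3, 0.15, phase=np.pi / 2)

def motion_energy(stimulus_xt):
    """Squared, summed quadrature responses: phase-invariant, direction-tuned."""
    return np.sum(even * stimulus_xt) ** 2 + np.sum(odd * stimulus_xt) ** 2

X, T = np.meshgrid(xs, ts, indexing="ij")
preferred = np.cos(2 * np.pi * (2.0 * X + 4.0 * T))  # drifts one way
opposite = np.cos(2 * np.pi * (2.0 * X - 4.0 * T))   # drifts the other way
print(motion_energy(preferred) > motion_energy(opposite))  # True: direction-selective
```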
- Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z)
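One plausible reading of the latent embedding alignment described above is a shared latent space into which both fMRI activity and image features are projected, with decoding and encoding realized as maps out of that space. The toy sketch below uses plain linear layers and MSE losses to show the pattern; all dimensions and loss choices are assumptions, not the paper's model.

```python
# Toy latent-alignment setup: fMRI and image features meet in one latent space.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_voxels, img_dim, latent = 5000, 768, 256  # illustrative sizes

brain_to_z = nn.Linear(n_voxels, latent)  # fMRI -> shared latent
image_to_z = nn.Linear(img_dim, latent)   # image features -> shared latent
z_to_brain = nn.Linear(latent, n_voxels)  # latent -> fMRI (encoding direction)
z_to_image = nn.Linear(latent, img_dim)   # latent -> image (decoding direction)

fmri = torch.randn(8, n_voxels)           # stand-in paired training batch
img_feats = torch.randn(8, img_dim)       # e.g., features from a vision backbone

z_b, z_i = brain_to_z(fmri), image_to_z(img_feats)
align_loss = F.mse_loss(z_b, z_i)                      # pull paired latents together
decode_loss = F.mse_loss(z_to_image(z_b), img_feats)   # fMRI -> image features
encode_loss = F.mse_loss(z_to_brain(z_i), fmri)        # image features -> fMRI
(align_loss + decode_loss + encode_loss).backward()    # one joint objective
```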
- Learning Motion-Dependent Appearance for High-Fidelity Rendering of Dynamic Humans from a Single Camera [49.357174195542854]
A key challenge of learning the dynamics of the appearance lies in the requirement of a prohibitively large amount of observations.
We show that our method can generate a temporally coherent video of dynamic humans for unseen body poses and novel views given a single view video.
arXiv Detail & Related papers (2022-03-24T00:22:03Z)
- High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z)
- Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z)
- Neural Scene Flow Fields for Space-Time View Synthesis of Dynamic Scenes [70.76742458931935]
We introduce a new representation that models the dynamic scene as a time-variant continuous function of appearance, geometry, and 3D scene motion.
Our representation is optimized through a neural network to fit the observed input views.
We show that our representation can be used for complex dynamic scenes, including thin structures, view-dependent effects, and natural degrees of motion.
arXiv Detail & Related papers (2020-11-26T01:23:44Z)
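As a stand-in for the time-variant continuous function the last entry describes, the sketch below defines an MLP mapping a space-time point (x, y, z, t) to appearance (RGB), geometry (density), and 3D scene motion. Real systems of this kind add positional encodings and supervise the field through volume rendering; none of that is shown here, and the layer sizes are arbitrary.

```python
# Hypothetical implicit field: f(x, y, z, t) -> (RGB, density, 3D scene flow).
import torch
import torch.nn as nn

class DynamicSceneField(nn.Module):
    def __init__(self, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU())
        self.rgb = nn.Linear(hidden, 3)    # appearance at this point and time
        self.sigma = nn.Linear(hidden, 1)  # volume density (geometry)
        self.flow = nn.Linear(hidden, 3)   # 3D motion toward the next time step

    def forward(self, xyzt):               # xyzt: (N, 4) space-time samples
        h = self.trunk(xyzt)
        return torch.sigmoid(self.rgb(h)), torch.relu(self.sigma(h)), self.flow(h)

rgb, sigma, flow = DynamicSceneField()(torch.rand(1024, 4))
print(rgb.shape, sigma.shape, flow.shape)  # (1024, 3) (1024, 1) (1024, 3)
```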
This list is automatically generated from the titles and abstracts of the papers on this site.