Reanimating Images using Neural Representations of Dynamic Stimuli
- URL: http://arxiv.org/abs/2406.02659v2
- Date: Fri, 29 Nov 2024 18:59:44 GMT
- Title: Reanimating Images using Neural Representations of Dynamic Stimuli
- Authors: Jacob Yeung, Andrew F. Luo, Gabriel Sarch, Margaret M. Henderson, Deva Ramanan, Michael J. Tarr,
- Abstract summary: Video diffusion models are used to decouple static image representation from motion generation.
Brain-decoded motion signals enable realistic video reanimation based only on the initial frame of the video.
This framework advances our understanding of how the brain represents spatial and temporal information in dynamic visual scenes.
- Score: 36.04425924379253
- License:
- Abstract: While computer vision models have made incredible strides in static image recognition, they still do not match human performance in tasks that require the understanding of complex, dynamic motion. This is notably true for real-world scenarios where embodied agents face complex and motion-rich environments. Our approach leverages state-of-the-art video diffusion models to decouple static image representation from motion generation, enabling us to utilize fMRI brain activity for a deeper understanding of human responses to dynamic visual stimuli. Conversely, we also demonstrate that information about the brain's representation of motion can enhance the prediction of optical flow in artificial systems. Our novel approach leads to four main findings: (1) Visual motion, represented as fine-grained, object-level resolution optical flow, can be decoded from brain activity generated by participants viewing video stimuli; (2) Video encoders outperform image-based models in predicting video-driven brain activity; (3) Brain-decoded motion signals enable realistic video reanimation based only on the initial frame of the video; and (4) We extend prior work to achieve full video decoding from video-driven brain activity. This framework advances our understanding of how the brain represents spatial and temporal information in dynamic visual scenes. Our findings demonstrate the potential of combining brain imaging with video diffusion models for developing more robust and biologically-inspired computer vision systems. We show additional decoding and encoding examples on this site: https://sites.google.com/view/neural-dynamics/home.
Related papers
- X-Dyna: Expressive Dynamic Human Image Animation [49.896933584815926]
X-Dyna is a zero-shot, diffusion-based pipeline for animating a single human image.
It generates realistic, context-aware dynamics for both the subject and the surrounding environment.
arXiv Detail & Related papers (2025-01-17T08:10:53Z) - Aligning Neuronal Coding of Dynamic Visual Scenes with Foundation Vision Models [2.790870674964473]
We propose Vi-ST, atemporal convolutional neural network fed with a self-supervised Vision Transformer (ViT)
Our proposed Vi-ST demonstrates a novel modeling framework for neuronal coding of dynamic visual scenes in the brain.
arXiv Detail & Related papers (2024-07-15T14:06:13Z) - Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity [13.04953215936574]
We propose a two-stage model named Mind-Animator to reconstruct human dynamic vision from brain activity.
During the fMRI-to-feature stage, we decouple semantic, structure, and motion features from fMRI.
In the feature-to-video stage, these features are integrated into videos using an inflated Stable Diffusion.
arXiv Detail & Related papers (2024-05-06T08:56:41Z) - DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors [63.43133768897087]
We propose a method to convert open-domain images into animated videos.
The key idea is to utilize the motion prior to text-to-video diffusion models by incorporating the image into the generative process as guidance.
Our proposed method can produce visually convincing and more logical & natural motions, as well as higher conformity to the input image.
arXiv Detail & Related papers (2023-10-18T14:42:16Z) - Cinematic Mindscapes: High-quality Video Reconstruction from Brain
Activity [0.0]
We show that Mind-Video can reconstruct high-quality videos of arbitrary frame rates using adversarial guidance.
We also show that our model is biologically plausible and interpretable, reflecting established physiological processes.
arXiv Detail & Related papers (2023-05-19T13:44:25Z) - Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z) - Modelling Human Visual Motion Processing with Trainable Motion Energy
Sensing and a Self-attention Network [1.9458156037869137]
We propose an image-computable model of human motion perception by bridging the gap between biological and computer vision models.
This model architecture aims to capture the computations in V1-MT, the core structure for motion perception in the biological visual system.
In silico neurophysiology reveals that our model's unit responses are similar to mammalian neural recordings regarding motion pooling and speed tuning.
arXiv Detail & Related papers (2023-05-16T04:16:07Z) - Joint fMRI Decoding and Encoding with Latent Embedding Alignment [77.66508125297754]
We introduce a unified framework that addresses both fMRI decoding and encoding.
Our model concurrently recovers visual stimuli from fMRI signals and predicts brain activity from images within a unified framework.
arXiv Detail & Related papers (2023-03-26T14:14:58Z) - High-Fidelity Neural Human Motion Transfer from Monocular Video [71.75576402562247]
Video-based human motion transfer creates video animations of humans following a source motion.
We present a new framework which performs high-fidelity and temporally-consistent human motion transfer with natural pose-dependent non-rigid deformations.
In the experimental results, we significantly outperform the state-of-the-art in terms of video realism.
arXiv Detail & Related papers (2020-12-20T16:54:38Z) - Neural Radiance Flow for 4D View Synthesis and Video Processing [59.9116932930108]
We present a method to learn a 4D spatial-temporal representation of a dynamic scene from a set of RGB images.
Key to our approach is the use of a neural implicit representation that learns to capture the 3D occupancy, radiance, and dynamics of the scene.
arXiv Detail & Related papers (2020-12-17T17:54:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.