Improving visual image reconstruction from human brain activity using
latent diffusion models via multiple decoded inputs
- URL: http://arxiv.org/abs/2306.11536v1
- Date: Tue, 20 Jun 2023 13:48:02 GMT
- Title: Improving visual image reconstruction from human brain activity using
latent diffusion models via multiple decoded inputs
- Authors: Yu Takagi, Shinji Nishimoto
- Abstract summary: Integration of deep learning and neuroscience has led to improvements in the analysis of brain activity.
The reconstruction of visual experience from human brain activity is an area that has particularly benefited.
We examine the extent to which various additional decoding techniques affect the performance of visual experience reconstruction.
- Score: 2.4366811507669124
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The integration of deep learning and neuroscience has been advancing rapidly,
which has led to improvements in the analysis of brain activity and the
understanding of deep learning models from a neuroscientific perspective. The
reconstruction of visual experience from human brain activity is an area that
has particularly benefited: the use of deep learning models trained on large
amounts of natural images has greatly improved its quality, and approaches that
combine the diverse information contained in visual experiences have
proliferated rapidly in recent years. In this technical paper, by taking
advantage of the simple and generic framework that we proposed (Takagi and
Nishimoto, CVPR 2023), we examine the extent to which various additional
decoding techniques affect the performance of visual experience reconstruction.
Specifically, we combined our earlier work with the following three techniques:
using decoded text from brain activity, nonlinear optimization for structural
image reconstruction, and using decoded depth information from brain activity.
We confirmed that these techniques contributed to improving accuracy over the
baseline. We also discuss what researchers should consider when performing
visual reconstruction using deep generative models trained on large datasets.
Please check our webpage at
https://sites.google.com/view/stablediffusion-with-brain/. Code is also
available at https://github.com/yu-takagi/StableDiffusionReconstruction.
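As a rough illustration of the decoding stage described in the abstract, the sketch below fits separate ridge-regression decoders from fMRI voxel patterns to three kinds of targets: image latents for structure, text embeddings for semantics, and depth maps. The array shapes, variable names, and regularization strength are illustrative assumptions rather than the authors' exact implementation; in practice the decoded predictions would condition a latent diffusion model such as Stable Diffusion to reconstruct the viewed image.

```python
# A minimal sketch (not the authors' code) of the decoding stage: separate
# ridge-regression decoders map fMRI voxel patterns to (i) VAE image latents
# for structure, (ii) text embeddings for semantics, and (iii) depth maps.
# All shapes, sizes, and the regularization strength are illustrative
# assumptions; random arrays stand in for real fMRI data and targets.
import numpy as np
from sklearn.linear_model import Ridge

def fit_decoder(voxels, targets, alpha=1e4):
    """Fit an L2-regularized linear map from voxel patterns to a flattened target."""
    model = Ridge(alpha=alpha)
    model.fit(voxels, targets.reshape(len(targets), -1))
    return model

rng = np.random.default_rng(0)
n_train, n_voxels = 500, 2000
train_voxels  = rng.standard_normal((n_train, n_voxels))    # stand-in for fMRI responses
image_latents = rng.standard_normal((n_train, 4, 32, 32))   # stand-in VAE image latents
text_embeds   = rng.standard_normal((n_train, 768))         # stand-in text embeddings
depth_maps    = rng.standard_normal((n_train, 32, 32))      # stand-in depth targets

dec_latent = fit_decoder(train_voxels, image_latents)
dec_text   = fit_decoder(train_voxels, text_embeds)
dec_depth  = fit_decoder(train_voxels, depth_maps)

# At test time each decoder predicts its target from a held-out voxel pattern.
# The predictions would then condition a latent diffusion model (e.g. Stable
# Diffusion, optionally depth-conditioned) to reconstruct the viewed image.
test_voxels = rng.standard_normal((1, n_voxels))
z_hat = dec_latent.predict(test_voxels).reshape(1, 4, 32, 32)
c_hat = dec_text.predict(test_voxels)
d_hat = dec_depth.predict(test_voxels).reshape(1, 32, 32)
```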
Related papers
- Decoding Visual Experience and Mapping Semantics through Whole-Brain Analysis Using fMRI Foundation Models [10.615012396285337]
We develop algorithms to enhance our understanding of visual processes by incorporating whole-brain activation maps.
We first compare our method with state-of-the-art approaches to decoding visual processing and show that it improves predictive semantic accuracy by 43%.
arXiv Detail & Related papers (2024-11-11T16:51:17Z)
- Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation [53.70131202548981]
We present a two-step segmentation framework employing Knowledge-Guided Prompt Learning (KGPL) for brain MRI.
Specifically, we first pre-train segmentation models on large-scale datasets with sub-optimal labels.
The introduction of knowledge-wise prompts captures semantic relationships between anatomical variability and biological processes.
arXiv Detail & Related papers (2024-07-31T04:32:43Z)
- MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals.
Currently, brain decoding is confined to a per-subject-per-model paradigm.
We present MindBridge, which achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
- Brain-optimized inference improves reconstructions of fMRI brain activity [0.0]
We evaluate the prospect of further improving recent decoding methods by optimizing for consistency between reconstructions and brain activity during inference.
We sample seed reconstructions from a base decoding method, then iteratively refine these reconstructions using a brain-optimized encoding model.
We show that reconstruction quality can be significantly improved by explicitly aligning decoding to brain activity distributions.
arXiv Detail & Related papers (2023-12-12T20:08:59Z)
- UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity [2.666777614876322]
We propose UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity.
We transform fMRI voxels into text and image latents that carry low-level information, in order to generate realistic captions and images.
UniBrain outperforms current methods both qualitatively and quantitatively in image reconstruction, and reports the first image captioning results on the Natural Scenes dataset.
arXiv Detail & Related papers (2023-08-14T19:49:29Z)
- Seeing through the Brain: Image Reconstruction of Visual Perception from Human Brain Signals [27.92796103924193]
We propose a comprehensive pipeline, named NeuroImagen, for reconstructing visual stimulus images from EEG signals.
We incorporate a novel multi-level perceptual information decoding to draw multi-grained outputs from the given EEG data.
arXiv Detail & Related papers (2023-07-27T12:54:16Z)
- Brain Captioning: Decoding human brain activity into images and text [1.5486926490986461]
We present an innovative method for decoding brain activity into meaningful images and captions.
Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline.
We evaluate our methods using quantitative metrics for both generated captions and images.
arXiv Detail & Related papers (2023-05-19T09:57:19Z)
- Compositional Scene Representation Learning via Reconstruction: A Survey [48.33349317481124]
Compositional scene representation learning is a task that enables such abilities.
Deep neural networks have been proven to be advantageous in representation learning.
Learning via reconstruction is advantageous because it may utilize massive unlabeled data and avoid costly and laborious data annotation.
arXiv Detail & Related papers (2022-02-15T02:14:05Z)
- Is Deep Image Prior in Need of a Good Education? [57.3399060347311]
Deep image prior was introduced as an effective prior for image reconstruction.
Despite its impressive reconstructive properties, the approach is slow when compared to learned or traditional reconstruction techniques.
We develop a two-stage learning paradigm to address the computational challenge.
arXiv Detail & Related papers (2021-11-23T15:08:26Z)
- Neural Fields in Visual Computing and Beyond [54.950885364735804]
Recent advances in machine learning have created increasing interest in solving visual computing problems using coordinate-based neural networks.
Neural fields have seen successful application in the synthesis of 3D shapes and images, animation of human bodies, 3D reconstruction, and pose estimation.
This report provides context, mathematical grounding, and an extensive review of literature on neural fields.
arXiv Detail & Related papers (2021-11-22T18:57:51Z)
- NAS-DIP: Learning Deep Image Prior with Neural Architecture Search [65.79109790446257]
Recent work has shown that the structure of deep convolutional neural networks can be used as a structured image prior.
We propose to search for neural architectures that capture stronger image priors.
We search for an improved network by leveraging an existing neural architecture search algorithm.
arXiv Detail & Related papers (2020-08-26T17:59:36Z)