Related papers: Double-Flow GAN model for the reconstruction of perceived faces from brain activities

URL: http://arxiv.org/abs/2312.07478v1
Date: Tue, 12 Dec 2023 18:07:57 GMT
Title: Double-Flow GAN model for the reconstruction of perceived faces from brain activities
Authors: Zihao Wang, Jing Zhao and Hui Zhang
Abstract summary: We proposed a novel reconstruction framework, which we called Double-Flow GAN. We also designed a pretraining process that uses features extracted from images as conditions for making it possible to pretrain the conditional reconstruction model from fMRI. Our results demonstrated that our method showed significant reconstruction performance, outperformed the previous reconstruction models, and exhibited a good generation ability.
Score: 16.82988438934791
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Face plays an important role in human's visual perception, and reconstructing perceived faces from brain activities is challenging because of its difficulty in extracting high-level features and maintaining consistency of multiple face attributes, such as expression, identity, gender, etc. In this study, we proposed a novel reconstruction framework, which we called Double-Flow GAN, that can enhance the capability of discriminator and handle imbalances in images from certain domains that are too easy for generators. We also designed a pretraining process that uses features extracted from images as conditions for making it possible to pretrain the conditional reconstruction model from fMRI in a larger pure image dataset. Moreover, we developed a simple pretrained model to perform fMRI alignment to alleviate the problem of cross-subject reconstruction due to the variations of brain structure among different subjects. We conducted experiments by using our proposed method and state-of-the-art reconstruction models. Our results demonstrated that our method showed significant reconstruction performance, outperformed the previous reconstruction models, and exhibited a good generation ability.

Related papers

Canonical Pose Reconstruction from Single Depth Image for 3D Non-rigid Pose Recovery on Limited Datasets [55.84702107871358]
3D reconstruction from 2D inputs, especially for non-rigid objects like humans, presents unique challenges. Traditional methods often struggle with non-rigid shapes, which require extensive training data to cover the entire deformation space. This study proposes a canonical pose reconstruction model that transforms single-view depth images of deformable shapes into a canonical form.
arXiv Detail & Related papers (2025-05-23T14:58:34Z)
Towards Prospective Medical Image Reconstruction via Knowledge-Informed Dynamic Optimal Transport [58.6869774515413]
This paper introduces imaging Knowledge-Informed Dynamic Optimal Transport (KIDOT), a novel dynamic optimal transport framework. KIDOT learns from unpaired data by modeling reconstruction as a continuous evolution path from measurements to images, guided by an imaging knowledge-informed cost function and transport equation. Experiments on MRI and CT reconstruction demonstrate KIDOT's superior performance.
arXiv Detail & Related papers (2025-05-23T09:05:10Z)
Implicit neural representations for end-to-end PET reconstruction [3.7066816275267627]
Implicit neural representations (INRs) have demonstrated strong capabilities in various medical imaging tasks. We propose an unsupervised PET image reconstruction method based on the implicit SIREN neural network architecture. Our method incorporates a forward projection model and a loss function adapted to perform PET image reconstruction directly from sinograms.
arXiv Detail & Related papers (2025-03-26T08:30:53Z)
LHM: Large Animatable Human Reconstruction Model from a Single Image in Seconds [21.99354901986186]
We propose LHM (Large Animatable Human Reconstruction Model) to infer high-fidelity avatars represented as 3D Gaussian splatting in a feed-forward pass. Our model leverages a multimodal transformer architecture to effectively encode the human body positional features and image features with attention mechanism. Our LHM generates plausible animatable human in seconds without post-processing for face and hands, outperforming existing methods in both reconstruction accuracy and generalization ability.
arXiv Detail & Related papers (2025-03-13T17:59:21Z)
One-step Generative Diffusion for Realistic Extreme Image Rescaling [47.89362819768323]
We propose a novel framework called One-Step Image Rescaling Diffusion (OSIRDiff) for extreme image rescaling. OSIRDiff performs rescaling operations in the latent space of a pre-trained autoencoder. It effectively leverages powerful natural image priors learned by a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-08-17T09:51:42Z)
MindBridge: A Cross-Subject Brain Decoding Framework [60.58552697067837]
Brain decoding aims to reconstruct stimuli from acquired brain signals. Currently, brain decoding is confined to a per-subject-per-model paradigm. We present MindBridge, which achieves cross-subject brain decoding by employing only one model.
arXiv Detail & Related papers (2024-04-11T15:46:42Z)
Enhancing Low-dose CT Image Reconstruction by Integrating Supervised and Unsupervised Learning [13.17680480211064]
We propose a hybrid supervised-unsupervised learning framework for X-ray computed tomography (CT) image reconstruction. Each proposed trained block consists of a deterministic MBIR solver and a neural network. We demonstrate the efficacy of this learned hybrid model for low-dose CT image reconstruction with limited training data.
arXiv Detail & Related papers (2023-11-19T20:23:59Z)
Steerable Conditional Diffusion for Out-of-Distribution Adaptation in Medical Image Reconstruction [75.91471250967703]
We introduce a novel sampling framework called Steerable Conditional Diffusion. This framework adapts the diffusion model, concurrently with image reconstruction, based solely on the information provided by the available measurement. We achieve substantial enhancements in out-of-distribution performance across diverse imaging modalities.
arXiv Detail & Related papers (2023-08-28T08:47:06Z)
Diffusion Models for Image Restoration and Enhancement -- A Comprehensive Survey [96.99328714941657]
We present a comprehensive review of recent diffusion model-based methods on image restoration. We classify and emphasize the innovative designs using diffusion models for both IR and blind/real-world IR. We propose five potential and challenging directions for the future research of diffusion model-based IR.
arXiv Detail & Related papers (2023-08-18T08:40:38Z)
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion [7.597218661195779]
We propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into Stable Diffusion. In Stage 2, we utilize the CLIP visual feature decoded from fMRI as supervisory information, and continually adjust the two feature vectors decoded in Stage 1 through backpropagation to align the structural information.
arXiv Detail & Related papers (2023-08-08T13:28:34Z)
MindDiffuser: Controlled Image Reconstruction from Human Brain Activity with Semantic and Structural Diffusion [8.299415606889024]
We propose a two-stage image reconstruction model called MindDiffuser. In Stage 1, the VQ-VAE latent representations and the CLIP text embeddings decoded from fMRI are put into the image-to-image process of Stable Diffusion. In Stage 2, we utilize the low-level CLIP visual features decoded from fMRI as supervisory information.
arXiv Detail & Related papers (2023-03-24T16:41:42Z)
Natural scene reconstruction from fMRI signals using generative latent diffusion [1.90365714903665]
We present a two-stage scene reconstruction framework called "Brain-Diffuser". In the first stage, we reconstruct images that capture low-level properties and overall layout using a VDVAE (Very Deep Variational Autoencoder) model. In the second stage, we use the image-to-image framework of a latent diffusion model conditioned on predicted multimodal (text and visual) features.
arXiv Detail & Related papers (2023-03-09T15:24:26Z)
Model-Guided Multi-Contrast Deep Unfolding Network for MRI Super-resolution Reconstruction [68.80715727288514]
We show how to unfold an iterative MGDUN algorithm into a novel model-guided deep unfolding network by taking the MRI observation matrix. In this paper, we propose a novel Model-Guided interpretable Deep Unfolding Network (MGDUN) for medical image SR reconstruction.
arXiv Detail & Related papers (2022-09-15T03:58:30Z)
MonoSDF: Exploring Monocular Geometric Cues for Neural Implicit Surface Reconstruction [72.05649682685197]
State-of-the-art neural implicit methods allow for high-quality reconstructions of simple scenes from many input views. This is caused primarily by the inherent ambiguity in the RGB reconstruction loss that does not provide enough constraints. Motivated by recent advances in the area of monocular geometry prediction, we explore the utility these cues provide for improving neural implicit surface reconstruction.
arXiv Detail & Related papers (2022-06-01T17:58:15Z)
NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction [88.02850205432763]
We present a novel neural surface reconstruction method, called NeuS, for reconstructing objects and scenes with high fidelity from 2D image inputs. Existing neural surface reconstruction approaches, such as DVR and IDR, require foreground mask as supervision. We observe that the conventional volume rendering method causes inherent geometric errors for surface reconstruction. We propose a new formulation that is free of bias in the first order of approximation, thus leading to more accurate surface reconstruction even without the mask supervision.
arXiv Detail & Related papers (2021-06-20T12:59:42Z)
BigGAN-based Bayesian reconstruction of natural images from human brain activity [14.038605815510145]
We propose a new GAN-based visual reconstruction method (GAN-BVRM) that includes a classifier to decode categories from fMRI data. GAN-BVRM employs the pre-trained generator of the prevailing BigGAN to generate masses of natural images. Experimental results revealed that GAN-BVRM improves the fidelity and naturalness, that is, the reconstruction is natural and similar to the presented image stimuli.
arXiv Detail & Related papers (2020-03-13T04:32:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.