Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net
- URL: http://arxiv.org/abs/2505.09521v1
- Date: Wed, 14 May 2025 16:18:21 GMT
- Title: Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net
- Authors: Dongyi He, Shiyang Li, Bin Jiang, He Yan,
- Abstract summary: High-resolution functional magnetic resonance imaging (fMRI) is essential for mapping human brain activity.<n>Existing EEG-to-fMRI generators rely on plain CNNs that fail to capture cross-channel time-frequency cues.<n>We propose Spec2VolCAMU-Net, a lightweight spectrogram-to-volume generator that confronts these issues via a Time-Frequency Convolutional Attention.
- Score: 12.862801292260833
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-resolution functional magnetic resonance imaging (fMRI) is essential for mapping human brain activity; however, it remains costly and logistically challenging. If comparable volumes could be generated directly from widely available scalp electroencephalography (EEG), advanced neuroimaging would become significantly more accessible. Existing EEG-to-fMRI generators rely on plain CNNs that fail to capture cross-channel time-frequency cues or on heavy transformer/GAN decoders that strain memory and stability. We propose Spec2VolCAMU-Net, a lightweight spectrogram-to-volume generator that confronts these issues via a Multi-directional Time-Frequency Convolutional Attention Encoder, stacking temporal, spectral and joint convolutions with self-attention, and a Vision-Mamba U-Net decoder whose linear-time state-space blocks enable efficient long-range spatial modelling. Trained end-to-end with a hybrid SSI-MSE loss, Spec2VolCAMU-Net achieves state-of-the-art fidelity on three public benchmarks, recording SSIMs of 0.693 on NODDI, 0.725 on Oddball and 0.788 on CN-EPFL, representing improvements of 14.5%, 14.9%, and 16.9% respectively over previous best SSIM scores. Furthermore, it achieves competitive PSNR scores, particularly excelling on the CN-EPFL dataset with a 4.6% improvement over the previous best PSNR, thus striking a better balance in reconstruction quality. The proposed model is lightweight and efficient, making it suitable for real-time applications in clinical and research settings. The code is available at https://github.com/hdy6438/Spec2VolCAMU-Net.
Related papers
- DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift [56.04579258267126]
This paper investigates maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths.<n>A reconfigurable intelligent surface (RIS) is employed to enhance transmissions, considering mmWave characteristics related to line-of-sight (LoS) and multipath effects.<n>Deep neural network (DNN) is developed to facilitate faster codeword selection.
arXiv Detail & Related papers (2025-07-03T17:35:06Z) - Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons [69.73249913506042]
This paper investigates a wireless split computing architecture that employs resonate-and-fire (RF) neurons to process time-domain signals directly.<n>By resonating at tunable frequencies, RF neurons extract time-localized spectral features while maintaining low spiking activity.<n> Experimental results show that the proposed RF-SNN architecture achieves comparable accuracy to conventional LIF-SNNs and ANNs.
arXiv Detail & Related papers (2025-06-24T21:14:59Z) - I Can't Believe It's Not Real: CV-MuSeNet: Complex-Valued Multi-Signal Segmentation [2.8057339957917673]
Cognitive radio systems enable dynamic spectrum access with the aid of recent innovations in neural networks.<n>Traditional real-valued neural networks (RVNNs) face difficulties in low signal-to-noise ratio (SNR) environments.<n>This work presents CMuSeNet, a complex-valued multi-signal segmentation network for wideband spectrum sensing.
arXiv Detail & Related papers (2025-05-21T20:08:02Z) - From Brainwaves to Brain Scans: A Robust Neural Network for EEG-to-fMRI Synthesis [4.710921988115686]
We propose E2fNet, a simple yet effective deep learning model for synthesizing fMRI images from low-cost EEG data.<n>E2fNet is an encoder-decoder network specifically designed to capture and translate meaningful multi-scale features from EEG across electrode channels into accurate fMRI representations.
arXiv Detail & Related papers (2025-02-11T23:55:16Z) - Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.<n> embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.<n> split computing - where an SNN is partitioned across two devices - is a promising solution.<n>This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z) - KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter [49.85369344101118]
We introduce KFD-NeRF, a novel dynamic neural radiance field integrated with an efficient and high-quality motion reconstruction framework based on Kalman filtering.
Our key idea is to model the dynamic radiance field as a dynamic system whose temporally varying states are estimated based on two sources of knowledge: observations and predictions.
Our KFD-NeRF demonstrates similar or even superior performance within comparable computational time and state-of-the-art view synthesis performance with thorough training.
arXiv Detail & Related papers (2024-07-18T05:48:24Z) - Gated Attention Coding for Training High-performance and Efficient Spiking Neural Networks [22.66931446718083]
Gated Attention Coding (GAC) is a plug-and-play module that leverages the multi-dimensional attention unit to efficiently encode inputs into powerful representations.
GAC functions as a preprocessing layer that does not disrupt the spike-driven nature of the SNN.
Experiments on CIFAR10/100 and ImageNet datasets demonstrate that GAC achieves state-of-the-art accuracy with remarkable efficiency.
arXiv Detail & Related papers (2023-08-12T14:42:02Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband
Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
It is also 28% faster than WaveGAN's synthesis efficiency on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z) - HYPER-SNN: Towards Energy-efficient Quantized Deep Spiking Neural
Networks for Hyperspectral Image Classification [5.094623170336122]
Spiking Neural Networks (SNNs) are trained with quantization-aware gradient descent to optimize weights, membrane leak, and firing thresholds.
During both training and inference, the analog pixel values of a HSI are directly applied to the input layer of the SNN without the need to convert to a spike-train.
We evaluate our proposal using three HSI datasets on a 3-D and a 3-D/2-D hybrid convolutional architecture.
arXiv Detail & Related papers (2021-07-26T06:17:10Z) - Raw Waveform Encoder with Multi-Scale Globally Attentive Locally
Recurrent Networks for End-to-End Speech Recognition [45.858039215825656]
We propose a new encoder that adopts globally attentive locally recurrent (GALR) networks and directly takes raw waveform as input.
Experiments are conducted on a benchmark dataset AISHELL-2 and two large-scale Mandarin speech corpus of 5,000 hours and 21,000 hours.
arXiv Detail & Related papers (2021-06-08T12:12:33Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner
Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model based on RNN-Transducer, together with improved beam search, reaches quality by only 3.8% WER abs. worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.