Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net
- URL: http://arxiv.org/abs/2505.09521v2
- Date: Wed, 10 Sep 2025 21:09:15 GMT
- Title: Spec2VolCAMU-Net: A Spectrogram-to-Volume Model for EEG-to-fMRI Reconstruction based on Multi-directional Time-Frequency Convolutional Attention Encoder and Vision-Mamba U-Net
- Authors: Dongyi He, Shiyang Li, Bin Jiang, He Yan,
- Abstract summary: High-resolution functional magnetic resonance imaging (fMRI) is essential for mapping human brain activity.<n>Existing EEG-to-fMRI generators rely on plain Convolutional Networks (CNNs) that fail to capture cross-channel time-frequency cues.<n>We propose Spec2VolCAMU-Net, a lightweight architecture featuring a Multi-directional Time-Frequency Convolutional Attention for rich feature extraction.
- Score: 13.510069069207548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: High-resolution functional magnetic resonance imaging (fMRI) is essential for mapping human brain activity; however, it remains costly and logistically challenging. If comparable volumes could be generated directly from widely available scalp electroencephalography (EEG), advanced neuroimaging would become significantly more accessible. Existing EEG-to-fMRI generators rely on plain Convolutional Neural Networks (CNNs) that fail to capture cross-channel time-frequency cues or on heavy transformer/Generative Adversarial Network (GAN) decoders that strain memory and stability. To address these limitations, we propose Spec2VolCAMU-Net, a lightweight architecture featuring a Multi-directional Time-Frequency Convolutional Attention Encoder for rich feature extraction and a Vision-Mamba U-Net decoder that uses linear-time state-space blocks for efficient long-range spatial modelling. We frame the goal of this work as establishing a new state of the art in the spatial fidelity of single-volume reconstruction, a foundational prerequisite for the ultimate aim of generating temporally coherent fMRI time series. Trained end-to-end with a hybrid SSI-MSE loss, Spec2VolCAMU-Net achieves state-of-the-art fidelity on three public benchmarks, recording Structural Similarity Index (SSIM) of 0.693 on NODDI, 0.725 on Oddball and 0.788 on CN-EPFL, representing improvements of 14.5%, 14.9%, and 16.9% respectively over previous best SSIM scores. Furthermore, it achieves competitive Signal-to-Noise Ratio (PSNR) scores, particularly excelling on the CN-EPFL dataset with a 4.6% improvement over the previous best PSNR, thus striking a better balance in reconstruction quality. The proposed model is lightweight and efficient, making it suitable for real-time applications in clinical and research settings. The code is available at https://github.com/hdy6438/Spec2VolCAMU-Net.
Related papers
- Hardware-accelerated graph neural networks: an alternative approach for neuromorphic event-based audio classification and keyword spotting on SoC FPGA [12.261656409413753]
FPGA implementation of event-graph neural networks for audio processing.<n>System achieves up to 95% word-end detection accuracy, with only 10.53 microsecond latency and 1.18 W power consumption.
arXiv Detail & Related papers (2026-02-18T13:26:22Z) - Subtractive Modulative Network with Learnable Periodic Activations [59.89799070130572]
We propose a novel, parameter-efficient Implicit Neural Representation architecture inspired by classical subtractive synthesis.<n>Our SMN achieves a PSNR of $40+$ dB on two image datasets, comparing favorably against state-of-the-art methods in terms of both reconstruction accuracy and parameter efficiency.
arXiv Detail & Related papers (2026-02-18T10:20:50Z) - Neuromorphic Eye Tracking for Low-Latency Pupil Detection [37.091037454305024]
Eye tracking for wearable systems demands low latency and milliwatt-level power.<n>Neuromorphic sensors and spiking neural networks (SNNs) offer a promising alternative.<n>This paper presents a neuromorphic version of top-performing event-based eye-tracking models.
arXiv Detail & Related papers (2025-12-10T11:30:21Z) - DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift [56.04579258267126]
This paper investigates maximizing the throughput of millimeter wave (mmWave) multiple-input multiple-output (MIMO) systems with obstructed direct communication paths.<n>A reconfigurable intelligent surface (RIS) is employed to enhance transmissions, considering mmWave characteristics related to line-of-sight (LoS) and multipath effects.<n>Deep neural network (DNN) is developed to facilitate faster codeword selection.
arXiv Detail & Related papers (2025-07-03T17:35:06Z) - Neuromorphic Wireless Split Computing with Resonate-and-Fire Neurons [69.73249913506042]
This paper investigates a wireless split computing architecture that employs resonate-and-fire (RF) neurons to process time-domain signals directly.<n>By resonating at tunable frequencies, RF neurons extract time-localized spectral features while maintaining low spiking activity.<n> Experimental results show that the proposed RF-SNN architecture achieves comparable accuracy to conventional LIF-SNNs and ANNs.
arXiv Detail & Related papers (2025-06-24T21:14:59Z) - I Can't Believe It's Not Real: CV-MuSeNet: Complex-Valued Multi-Signal Segmentation [2.8057339957917673]
Cognitive radio systems enable dynamic spectrum access with the aid of recent innovations in neural networks.<n>Traditional real-valued neural networks (RVNNs) face difficulties in low signal-to-noise ratio (SNR) environments.<n>This work presents CMuSeNet, a complex-valued multi-signal segmentation network for wideband spectrum sensing.
arXiv Detail & Related papers (2025-05-21T20:08:02Z) - From Brainwaves to Brain Scans: A Robust Neural Network for EEG-to-fMRI Synthesis [4.710921988115686]
We propose E2fNet, a simple yet effective deep learning model for synthesizing fMRI images from low-cost EEG data.<n>E2fNet is an encoder-decoder network specifically designed to capture and translate meaningful multi-scale features from EEG across electrode channels into accurate fMRI representations.
arXiv Detail & Related papers (2025-02-11T23:55:16Z) - Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
Neuromorphic computing uses spiking neural networks (SNNs) to perform inference tasks.<n> embedding a small payload within each spike exchanged between spiking neurons can enhance inference accuracy without increasing energy consumption.<n> split computing - where an SNN is partitioned across two devices - is a promising solution.<n>This paper presents the first comprehensive study of a neuromorphic wireless split computing architecture that employs multi-level SNNs.
arXiv Detail & Related papers (2024-11-07T14:08:35Z) - KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter [49.85369344101118]
We introduce KFD-NeRF, a novel dynamic neural radiance field integrated with an efficient and high-quality motion reconstruction framework based on Kalman filtering.
Our key idea is to model the dynamic radiance field as a dynamic system whose temporally varying states are estimated based on two sources of knowledge: observations and predictions.
Our KFD-NeRF demonstrates similar or even superior performance within comparable computational time and state-of-the-art view synthesis performance with thorough training.
arXiv Detail & Related papers (2024-07-18T05:48:24Z) - Gated Attention Coding for Training High-performance and Efficient Spiking Neural Networks [22.66931446718083]
Gated Attention Coding (GAC) is a plug-and-play module that leverages the multi-dimensional attention unit to efficiently encode inputs into powerful representations.
GAC functions as a preprocessing layer that does not disrupt the spike-driven nature of the SNN.
Experiments on CIFAR10/100 and ImageNet datasets demonstrate that GAC achieves state-of-the-art accuracy with remarkable efficiency.
arXiv Detail & Related papers (2023-08-12T14:42:02Z) - MF-NeRF: Memory Efficient NeRF with Mixed-Feature Hash Table [62.164549651134465]
We propose MF-NeRF, a memory-efficient NeRF framework that employs a Mixed-Feature hash table to improve memory efficiency and reduce training time while maintaining reconstruction quality.
Our experiments with state-of-the-art Instant-NGP, TensoRF, and DVGO, indicate our MF-NeRF could achieve the fastest training time on the same GPU hardware with similar or even higher reconstruction quality.
arXiv Detail & Related papers (2023-04-25T05:44:50Z) - Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs) represented by long short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv Detail & Related papers (2022-08-28T17:50:19Z) - NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband
Excitation for Noise-Controllable Waveform Generation [67.96138567288197]
We propose a novel neural vocoder named NeuralDPS which can retain high speech quality and acquire high synthesis efficiency and noise controllability.
It generates waveforms at least 280 times faster than the WaveNet vocoder.
It is also 28% faster than WaveGAN's synthesis efficiency on a single core.
arXiv Detail & Related papers (2022-03-05T08:15:29Z) - A New Backbone for Hyperspectral Image Reconstruction [90.48427561874402]
3D hyperspectral image (HSI) reconstruction refers to inverse process of snapshot compressive imaging.
Proposal is for a Spatial/Spectral Invariant Residual U-Net, namely SSI-ResU-Net.
We show that SSI-ResU-Net achieves competing performance with over 77.3% reduction in terms of floating-point operations.
arXiv Detail & Related papers (2021-08-17T16:20:51Z) - HYPER-SNN: Towards Energy-efficient Quantized Deep Spiking Neural
Networks for Hyperspectral Image Classification [5.094623170336122]
Spiking Neural Networks (SNNs) are trained with quantization-aware gradient descent to optimize weights, membrane leak, and firing thresholds.
During both training and inference, the analog pixel values of a HSI are directly applied to the input layer of the SNN without the need to convert to a spike-train.
We evaluate our proposal using three HSI datasets on a 3-D and a 3-D/2-D hybrid convolutional architecture.
arXiv Detail & Related papers (2021-07-26T06:17:10Z) - Raw Waveform Encoder with Multi-Scale Globally Attentive Locally
Recurrent Networks for End-to-End Speech Recognition [45.858039215825656]
We propose a new encoder that adopts globally attentive locally recurrent (GALR) networks and directly takes raw waveform as input.
Experiments are conducted on a benchmark dataset AISHELL-2 and two large-scale Mandarin speech corpus of 5,000 hours and 21,000 hours.
arXiv Detail & Related papers (2021-06-08T12:12:33Z) - ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro- kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z) - Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part will be processed using expensive operations and the lower-frequency part is assigned with cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z) - Towards a Competitive End-to-End Speech Recognition for CHiME-6 Dinner
Party Transcription [73.66530509749305]
In this paper, we argue that, even in difficult cases, some end-to-end approaches show performance close to the hybrid baseline.
We experimentally compare and analyze CTC-Attention versus RNN-Transducer approaches along with RNN versus Transformer architectures.
Our best end-to-end model based on RNN-Transducer, together with improved beam search, reaches quality by only 3.8% WER abs. worse than the LF-MMI TDNN-F CHiME-6 Challenge baseline.
arXiv Detail & Related papers (2020-04-22T19:08:33Z) - WaveCRN: An Efficient Convolutional Recurrent Neural Network for
End-to-end Speech Enhancement [31.236720440495994]
In this paper, we propose an efficient E2E SE model, termed WaveCRN.
In WaveCRN, the speech locality feature is captured by a convolutional neural network (CNN), while the temporal sequential property of the locality feature is modeled by stacked simple recurrent units (SRU)
In addition, in order to more effectively suppress the noise components in the input noisy speech, we derive a novel restricted feature masking (RFM) approach that performs enhancement on the feature maps in the hidden layers.
arXiv Detail & Related papers (2020-04-06T13:48:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.