MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection
- URL: http://arxiv.org/abs/2603.02255v1
- Date: Fri, 27 Feb 2026 13:15:42 GMT
- Title: MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection
- Authors: Li Songyi, Zheng Linze, Liang Jinghua, Zhang Zifeng,
- Abstract summary: We propose MEBM-Speech, a neural decoder for speech activity detection from magnetoencephalography (MEG) signals.<n>Built upon the BrainMagic backbone, MEBM-Speech integrates three complementary temporal modeling mechanisms.<n>The model performs continuous probabilistic decoding of MEG signals, enabling fine-grained detection of speech versus silence states.
- Score: 0.27998963147546146
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose MEBM-Speech, a multi-scale enhanced neural decoder for speech activity detection from non-invasive magnetoencephalography (MEG) signals. Built upon the BrainMagic backbone, MEBM-Speech integrates three complementary temporal modeling mechanisms: a multi-scale convolutional module for short-term pattern extraction, a bidirectional LSTM (BiLSTM) for long-range context modeling, and a depthwise separable convolutional layer for efficient cross-scale feature fusion. A lightweight temporal jittering strategy and average pooling further improve onset robustness and boundary stability. The model performs continuous probabilistic decoding of MEG signals, enabling fine-grained detection of speech versus silence states - an ability crucial for both cognitive neuroscience and clinical applications. Comprehensive evaluations on the LibriBrain Competition 2025 Track1 benchmark demonstrate strong performance, achieving an average F1 macro of 89.3% on the validation set and comparable results on the official test leaderboard. These findings highlight the effectiveness of multi-scale temporal representation learning for robust MEG-based speech decoding.
Related papers
- MEBM-Phoneme: Multi-scale Enhanced BrainMagic for End-to-End MEG Phoneme Classification [0.27998963147546146]
MEBM-Phoneme is a neural decoder for phoneme classification from non-invasive magnetoencephalography (MEG) signals.<n>Built upon the BrainMagic backbone, MEBM-Phoneme integrates a short-term convolutional module to augment the native mid-term encoder.<n> Comprehensive on LibriBrain Competition 2025 Track2 demonstrate robust generalization, achieving competitive phoneme decoding accuracy.
arXiv Detail & Related papers (2026-02-27T13:02:33Z) - Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features.<n>MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z) - Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening [0.7136933021609076]
This study presents a fast, non-invasive multimodal deep learning framework for automatic binary stroke screening based on data collected during the F.A.S.T. assessment.<n>The proposed approach integrates complementary information from facial expressions, speech signals, and upper-body movements to enhance diagnostic robustness.
arXiv Detail & Related papers (2026-01-17T03:35:39Z) - DRBD-Mamba for Robust and Efficient Brain Tumor Segmentation with Analytical Insights [54.87947751720332]
Accurate brain tumor segmentation is significant for clinical diagnosis and treatment.<n>Mamba-based State Space Models have demonstrated promising performance.<n>We propose a dual-resolution bi-directional Mamba that captures multi-scale long-range dependencies with minimal computational overhead.
arXiv Detail & Related papers (2025-10-16T07:31:21Z) - NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models [66.91449452840318]
We introduce NeuroRVQ, a scalable Large Brainwave Model (LBM) centered on a codebook-based tokenizer.<n>Our tokenizer integrates: (i) multi-scale feature extraction modules that capture the full frequency neural spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding; and, (iii) an EEG signal phase- and amplitude-aware loss function for efficient training.<n>Our empirical results demonstrate that NeuroRVQ achieves lower reconstruction error and outperforms existing LBMs on a variety of downstream tasks.
arXiv Detail & Related papers (2025-10-15T01:26:52Z) - From Promise to Practical Reality: Transforming Diffusion MRI Analysis with Fast Deep Learning Enhancement [35.368152968098194]
FastFOD-Net is an end-to-end deep learning framework enhancing FODs with superior performance and delivering training/inference efficiency for clinical use.<n>This work will facilitate the more widespread adoption of, and build clinical trust in, deep learning based methods for diffusion MRI enhancement.
arXiv Detail & Related papers (2025-08-13T17:56:29Z) - RL-U$^2$Net: A Dual-Branch UNet with Reinforcement Learning-Assisted Multimodal Feature Fusion for Accurate 3D Whole-Heart Segmentation [0.624829068285122]
We propose a dual-branch U-Net architecture enhanced by reinforcement learning for feature alignment.<n>The model employs a dual-branch U-shaped network to process CT and MRI patches in parallel, and introduces a novel RL-XAlign module.<n> Experimental results on the publicly available MM-WHS 2017 dataset demonstrate that the proposed RL-U$2$Net outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-08-04T16:12:06Z) - Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation [8.695435245976482]
This paper proposes a three-tier fusion architecture that achieves precise brain tumor segmentation.<n>The method processes information progressively at the pixel, feature, and semantic levels.<n>We validated the method on the Brain Tumor (BraTS) 2020, 2021 and 2023 datasets.
arXiv Detail & Related papers (2025-07-14T06:32:59Z) - Hierarchical Deep Feature Fusion and Ensemble Learning for Enhanced Brain Tumor MRI Classification [3.776159955137874]
The framework incorporates comprehensive preprocessing and data augmentation of brain magnetic resonance images (MRI)<n>The novelty lies in the dual-level ensembling strategy: feature-level ensembling, and classifier-level ensembling.<n> Experiments on two public Kaggle MRI brain tumor datasets demonstrate that this approach significantly surpasses state-of-the-art methods.
arXiv Detail & Related papers (2025-06-14T05:53:54Z) - CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model [52.466542039411515]
EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models.<n>We present CodeBrain, a two-stage EFM designed to fill this gap.<n>In the first stage, we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens.<n>In the second stage, we propose the multi-scale EEGSSM architecture, which combines structured global convolution with sliding window attention.
arXiv Detail & Related papers (2025-06-10T17:20:39Z) - CEReBrO: Compact Encoder for Representations of Brain Oscillations Using Efficient Alternating Attention [46.47343031985037]
We introduce a Compact for Representations of Brain Oscillations using alternating attention (CEReBrO)<n>Our tokenization scheme represents EEG signals at a per-channel patch.<n>We propose an alternating attention mechanism that jointly models intra-channel temporal dynamics and inter-channel spatial correlations, achieving 2x speed improvement with 6x less memory required compared to standard self-attention.
arXiv Detail & Related papers (2025-01-18T21:44:38Z) - Unveiling Incomplete Modality Brain Tumor Segmentation: Leveraging Masked Predicted Auto-Encoder and Divergence Learning [6.44069573245889]
Brain tumor segmentation remains a significant challenge, particularly in the context of multi-modal magnetic resonance imaging (MRI)
We propose a novel strategy, which is called masked predicted pre-training, enabling robust feature learning from incomplete modality data.
In the fine-tuning phase, we utilize a knowledge distillation technique to align features between complete and missing modality data, simultaneously enhancing model robustness.
arXiv Detail & Related papers (2024-06-12T20:35:16Z) - Cross-modality Guidance-aided Multi-modal Learning with Dual Attention
for MRI Brain Tumor Grading [47.50733518140625]
Brain tumor represents one of the most fatal cancers around the world, and is very common in children and the elderly.
We propose a novel cross-modality guidance-aided multi-modal learning with dual attention for addressing the task of MRI brain tumor grading.
arXiv Detail & Related papers (2024-01-17T07:54:49Z) - Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data.
The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale.
Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.