Related papers: MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection

URL: http://arxiv.org/abs/2603.02255v1
Date: Fri, 27 Feb 2026 13:15:42 GMT
Title: MEBM-Speech: Multi-scale Enhanced BrainMagic for Robust MEG Speech Detection
Authors: Li Songyi, Zheng Linze, Liang Jinghua, Zhang Zifeng,
Abstract summary: We propose MEBM-Speech, a neural decoder for speech activity detection from magnetoencephalography (MEG) signals. Built upon the BrainMagic backbone, MEBM-Speech integrates three complementary temporal modeling mechanisms. The model performs continuous probabilistic decoding of MEG signals, enabling fine-grained detection of speech versus silence states.
Score: 0.27998963147546146
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose MEBM-Speech, a multi-scale enhanced neural decoder for speech activity detection from non-invasive magnetoencephalography (MEG) signals. Built upon the BrainMagic backbone, MEBM-Speech integrates three complementary temporal modeling mechanisms: a multi-scale convolutional module for short-term pattern extraction, a bidirectional LSTM (BiLSTM) for long-range context modeling, and a depthwise separable convolutional layer for efficient cross-scale feature fusion. A lightweight temporal jittering strategy and average pooling further improve onset robustness and boundary stability. The model performs continuous probabilistic decoding of MEG signals, enabling fine-grained detection of speech versus silence states, an ability crucial for both cognitive neuroscience and clinical applications. Comprehensive evaluations on the LibriBrain Competition 2025 Track1 benchmark demonstrate strong performance, achieving an average F1 macro of 89.3% on the validation set and comparable results on the official test leaderboard. These findings highlight the effectiveness of multi-scale temporal representation learning for robust MEG-based speech decoding.
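
The abstract mentions average pooling for boundary stability and reports a macro-averaged F1 score. The following is a minimal illustrative sketch of those two ideas in plain Python, not the authors' implementation: the function names, the window size of 3, the 0.5 threshold, and the toy probabilities are all assumptions made for this example, and the abstract does not say where in the model the pooling is applied.

```python
def average_pool(probs, window=3):
    """Centered moving average over per-frame speech probabilities.

    Assumption: a same-length output with a shrinking window at the edges.
    The paper's actual pooling configuration is not given in the abstract.
    """
    half = window // 2
    n = len(probs)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(sum(probs[lo:hi]) / (hi - lo))
    return smoothed


def threshold(probs, cutoff=0.5):
    """Turn probabilities into binary labels: 1 = speech, 0 = silence."""
    return [1 if p >= cutoff else 0 for p in probs]


def f1_macro(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (hypothetical): one isolated spurious spike at frame 1.
truth = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0]
raw_probs = [0.1, 0.6, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]

raw_f1 = f1_macro(truth, threshold(raw_probs))
smoothed_f1 = f1_macro(truth, threshold(average_pool(raw_probs)))
```

On this toy sequence the smoothing removes the isolated false positive, so `smoothed_f1` is higher than `raw_f1`. Macro averaging weights the speech and silence classes equally regardless of how many frames each class has, which matters when the two classes are imbalanced.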

Related papers

MEBM-Phoneme: Multi-scale Enhanced BrainMagic for End-to-End MEG Phoneme Classification [0.27998963147546146]
MEBM-Phoneme is a neural decoder for phoneme classification from non-invasive magnetoencephalography (MEG) signals. Built upon the BrainMagic backbone, MEBM-Phoneme integrates a short-term convolutional module to augment the native mid-term encoder. Comprehensive evaluations on LibriBrain Competition 2025 Track2 demonstrate robust generalization, achieving competitive phoneme decoding accuracy.
arXiv Detail & Related papers (2026-02-27T13:02:33Z)
Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features. MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z)
Digital FAST: An AI-Driven Multimodal Framework for Rapid and Early Stroke Screening [0.7136933021609076]
This study presents a fast, non-invasive multimodal deep learning framework for automatic binary stroke screening based on data collected during the F.A.S.T. assessment. The proposed approach integrates complementary information from facial expressions, speech signals, and upper-body movements to enhance diagnostic robustness.
arXiv Detail & Related papers (2026-01-17T03:35:39Z)
DRBD-Mamba for Robust and Efficient Brain Tumor Segmentation with Analytical Insights [54.87947751720332]
Accurate brain tumor segmentation is significant for clinical diagnosis and treatment. Mamba-based State Space Models have demonstrated promising performance. We propose a dual-resolution bi-directional Mamba that captures multi-scale long-range dependencies with minimal computational overhead.
arXiv Detail & Related papers (2025-10-16T07:31:21Z)
NeuroRVQ: Multi-Scale EEG Tokenization for Generative Large Brainwave Models [66.91449452840318]
We introduce NeuroRVQ, a scalable Large Brainwave Model (LBM) centered on a codebook-based tokenizer. Our tokenizer integrates: (i) multi-scale feature extraction modules that capture the full frequency neural spectrum; (ii) hierarchical residual vector quantization (RVQ) codebooks for high-resolution encoding; and (iii) an EEG signal phase- and amplitude-aware loss function for efficient training. Our empirical results demonstrate that NeuroRVQ achieves lower reconstruction error and outperforms existing LBMs on a variety of downstream tasks.
arXiv Detail & Related papers (2025-10-15T01:26:52Z)
From Promise to Practical Reality: Transforming Diffusion MRI Analysis with Fast Deep Learning Enhancement [35.368152968098194]
FastFOD-Net is an end-to-end deep learning framework enhancing FODs with superior performance and delivering training/inference efficiency for clinical use. This work will facilitate the more widespread adoption of, and build clinical trust in, deep learning based methods for diffusion MRI enhancement.
arXiv Detail & Related papers (2025-08-13T17:56:29Z)
RL-U$^2$Net: A Dual-Branch UNet with Reinforcement Learning-Assisted Multimodal Feature Fusion for Accurate 3D Whole-Heart Segmentation [0.624829068285122]
We propose a dual-branch U-Net architecture enhanced by reinforcement learning for feature alignment. The model employs a dual-branch U-shaped network to process CT and MRI patches in parallel, and introduces a novel RL-XAlign module. Experimental results on the publicly available MM-WHS 2017 dataset demonstrate that the proposed RL-U$^2$Net outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-08-04T16:12:06Z)
Multimodal Fusion at Three Tiers: Physics-Driven Data Generation and Vision-Language Guidance for Brain Tumor Segmentation [8.695435245976482]
This paper proposes a three-tier fusion architecture that achieves precise brain tumor segmentation. The method processes information progressively at the pixel, feature, and semantic levels. We validated the method on the Brain Tumor (BraTS) 2020, 2021 and 2023 datasets.
arXiv Detail & Related papers (2025-07-14T06:32:59Z)
Hierarchical Deep Feature Fusion and Ensemble Learning for Enhanced Brain Tumor MRI Classification [3.776159955137874]
The framework incorporates comprehensive preprocessing and data augmentation of brain magnetic resonance images (MRI). The novelty lies in the dual-level ensembling strategy: feature-level ensembling and classifier-level ensembling. Experiments on two public Kaggle MRI brain tumor datasets demonstrate that this approach significantly surpasses state-of-the-art methods.
arXiv Detail & Related papers (2025-06-14T05:53:54Z)
CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model [52.466542039411515]
EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models. We present CodeBrain, a two-stage EFM designed to fill this gap. In the first stage, we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens. In the second stage, we propose the multi-scale EEGSSM architecture, which combines structured global convolution with sliding window attention.
arXiv Detail & Related papers (2025-06-10T17:20:39Z)
CEReBrO: Compact Encoder for Representations of Brain Oscillations Using Efficient Alternating Attention [46.47343031985037]
We introduce a Compact Encoder for Representations of Brain Oscillations using alternating attention (CEReBrO). Our tokenization scheme represents EEG signals as per-channel patches. We propose an alternating attention mechanism that jointly models intra-channel temporal dynamics and inter-channel spatial correlations, achieving 2x speed improvement with 6x less memory required compared to standard self-attention.
arXiv Detail & Related papers (2025-01-18T21:44:38Z)
Unveiling Incomplete Modality Brain Tumor Segmentation: Leveraging Masked Predicted Auto-Encoder and Divergence Learning [6.44069573245889]
Brain tumor segmentation remains a significant challenge, particularly in the context of multi-modal magnetic resonance imaging (MRI). We propose a novel strategy, called masked predicted pre-training, that enables robust feature learning from incomplete modality data. In the fine-tuning phase, we utilize a knowledge distillation technique to align features between complete and missing modality data, simultaneously enhancing model robustness.
arXiv Detail & Related papers (2024-06-12T20:35:16Z)
Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading [47.50733518140625]
Brain tumors are among the most fatal cancers worldwide and are very common in children and the elderly. We propose a novel cross-modality guidance-aided multi-modal learning method with dual attention for MRI brain tumor grading.
arXiv Detail & Related papers (2024-01-17T07:54:49Z)
Cross-Modality Deep Feature Learning for Brain Tumor Segmentation [158.8192041981564]
This paper proposes a novel cross-modality deep feature learning framework to segment brain tumors from the multi-modality MRI data. The core idea is to mine rich patterns across the multi-modality data to make up for the insufficient data scale. Comprehensive experiments are conducted on the BraTS benchmarks, which show that the proposed cross-modality deep feature learning framework can effectively improve the brain tumor segmentation performance.
arXiv Detail & Related papers (2022-01-07T07:46:01Z)

This list is automatically generated from the titles and abstracts of the papers on this site.