Lightweight Diffusion-based Framework for Online Imagined Speech Decoding in Aphasia
- URL: http://arxiv.org/abs/2511.07920v1
- Date: Wed, 12 Nov 2025 01:28:37 GMT
- Title: Lightweight Diffusion-based Framework for Online Imagined Speech Decoding in Aphasia
- Authors: Eunyeong Ko, Soowon Kim, Ha-Na Jo,
- Abstract summary: A diffusion-based neural decoding framework is optimized for real-time imagined speech classification in individuals with aphasia. A dual-criterion early stopping strategy enabled rapid convergence under limited calibration data. The proposed framework advances the translation of imagined speech brain-computer interfaces toward clinical communication support.
- Score: 1.299941371793082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A diffusion-based neural decoding framework is optimized for real-time imagined speech classification in individuals with aphasia. The system integrates a lightweight conditional diffusion encoder and a convolutional classifier trained on subject-specific EEG data acquired from a Korean-language paradigm. A dual-criterion early stopping strategy enabled rapid convergence under limited calibration data, while dropout regularization and grouped temporal convolutions ensured stable generalization. During online operation, continuous EEG streams were processed in two-second sliding windows to generate class probabilities that dynamically modulated visual and auditory feedback according to decoding confidence. Across twenty real-time trials, the framework achieved 65% top-1 and 70% top-2 accuracy, outperforming offline evaluation (50% top-1). These results demonstrate the feasibility of deploying diffusion-based EEG decoding under practical clinical constraints, maintaining reliable performance despite environmental variability and minimal preprocessing. The proposed framework advances the translation of imagined speech brain-computer interfaces toward clinical communication support for individuals with severe expressive language impairment.
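The online procedure described in the abstract (two-second sliding windows over a continuous EEG stream, scored by top-1 and top-2 accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation: the sampling rate, channel count, step size, and the function names `sliding_windows` and `top_k_accuracy` are assumptions chosen for illustration, and no decoder model is included.

```python
import numpy as np


def sliding_windows(stream, sfreq, win_sec=2.0, step_sec=0.5):
    """Yield (start_sample, window) pairs from a (channels, samples) array.

    win_sec=2.0 matches the two-second window in the abstract;
    step_sec is a hypothetical hop size, not taken from the paper.
    """
    win = int(round(win_sec * sfreq))
    step = int(round(step_sec * sfreq))
    for start in range(0, stream.shape[1] - win + 1, step):
        yield start, stream[:, start:start + win]


def top_k_accuracy(probs, labels, k):
    """Fraction of trials whose true label is among the k most probable classes."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Indices of the k largest probabilities in each row.
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sfreq = 250  # assumed sampling rate in Hz
    stream = rng.standard_normal((8, 10 * sfreq))  # assumed 8 channels, 10 s
    windows = list(sliding_windows(stream, sfreq))
    print(len(windows), windows[0][1].shape)

    # Toy class probabilities for three trials over three classes.
    probs = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(probs, labels, k=1), top_k_accuracy(probs, labels, k=2))
```

With twenty trials, the reported 65% top-1 and 70% top-2 accuracy correspond to 13 and 14 correctly decoded trials, respectively.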
Related papers
- JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention [47.304088800992474]
We introduce a two-stage self-supervised framework that combines the Joint-Embedding Predictive Architecture (JEPA) with a Density Adaptive Attention Mechanism (DAAM). Stage 1 uses JEPA with DAAM to learn semantic audio features via masked prediction in latent space, fully decoupled from waveform reconstruction. Stage 2 leverages these representations for efficient tokenization using Finite Scalar Quantization (FSQ) and a mixed-radix packing scheme, followed by high-fidelity waveform reconstruction with a HiFi-GAN decoder.
arXiv Detail & Related papers (2025-12-08T05:01:51Z)
- The Locally Deployable Virtual Doctor: LLM Based Human Interface for Automated Anamnesis and Database Conversion [0.0]
MedChat is a locally deployable virtual physician framework for AI-assisted clinical anamnesis. Unlike existing cloud-based systems, this work demonstrates the feasibility of a fully offline, locally deployable LLM-diffusion framework for clinical anamnesis.
arXiv Detail & Related papers (2025-11-23T22:12:35Z)
- On the Difficulty of Token-Level Modeling of Dysfluency and Fluency Shaping Artifacts [21.253980895817634]
Dysfluencies and fluency-shaping artifacts are often overlooked, resulting in non-verbatim transcriptions with limited clinical and research value. We propose a parameter-efficient adaptation method to decode dysfluencies and fluency modifications as special tokens within transcriptions. Our findings demonstrate the effectiveness of lightweight adaptation techniques for dysfluency-aware ASR.
arXiv Detail & Related papers (2025-11-18T19:33:29Z)
- Toward Robust EEG-based Intention Decoding during Misarticulated Speech in Aphasia [0.0]
Aphasia severely limits verbal communication due to impaired language production, often leading to frequent misarticulations during speech attempts. Despite growing interest in brain-computer interface technologies, relatively little attention has been paid to developing EEG-based communication support systems tailored for aphasic patients.
arXiv Detail & Related papers (2025-11-11T06:49:44Z)
- Temporal-Aware Iterative Speech Model for Dementia Detection [0.0]
Current methods for automated dementia detection using speech rely on static, time-agnostic features or aggregated linguistic content. We introduce TAI-Speech, a Temporal Aware Iterative framework that dynamically models spontaneous speech for dementia detection. Our work provides a more flexible and robust solution for automated cognitive assessment, operating directly on the dynamics of raw audio.
arXiv Detail & Related papers (2025-09-26T01:56:07Z)
- Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching [0.0]
Dysarthria is a neurological disorder that significantly impairs speech intelligibility. This necessitates the development of robust dysarthric-to-regular speech conversion techniques.
arXiv Detail & Related papers (2025-06-19T08:24:17Z)
- CodeBrain: Towards Decoupled Interpretability and Multi-Scale Architecture for EEG Foundation Model [52.466542039411515]
EEG foundation models (EFMs) have emerged to address the scalability issues of task-specific models. We present CodeBrain, a two-stage EFM designed to fill this gap. In the first stage, we introduce the TFDual-Tokenizer, which decouples heterogeneous temporal and frequency EEG signals into discrete tokens. In the second stage, we propose the multi-scale EEGSSM architecture, which combines structured global convolution with sliding window attention.
arXiv Detail & Related papers (2025-06-10T17:20:39Z)
- Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection [5.512072120303165]
Dysfluent-WFST is a zero-shot decoder that simultaneously transcribes phonemes and detects dysfluency. It achieves state-of-the-art performance in both phonetic error rate and dysfluency detection on simulated and real speech data.
arXiv Detail & Related papers (2025-05-22T08:02:50Z)
- BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation [48.20672677492805]
Current EEG/MEG-to-text decoding systems suffer from three key limitations. BrainECHO is a multi-stage framework that employs decoupled representation learning. BrainECHO demonstrates robustness across sentence, session, and subject-independent conditions.
arXiv Detail & Related papers (2024-10-19T04:29:03Z)
- Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation [71.31331402404662]
This paper proposes two novel data-efficient methods to learn dysarthric and elderly speaker-level features.
The first is speaker-regularized spectral basis embedding (SBE) features, which exploit a special regularization term to enforce homogeneity of speaker features in adaptation.
The second is feature-based learning hidden unit contributions (f-LHUC), conditioned on VR-LH features that are shown to be insensitive to speaker-level data quantity in test-time adaptation.
arXiv Detail & Related papers (2024-07-08T18:20:24Z)
- UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization [60.43992089087448]
Dysarthric speech reconstruction systems aim to automatically convert dysarthric speech into normal-sounding speech.
We propose a Unit-DSR system, which harnesses the powerful domain-adaptation capacity of HuBERT for training efficiency improvement.
Compared with NED approaches, the Unit-DSR system only consists of a speech unit normalizer and a Unit HiFi-GAN vocoder, which is considerably simpler without cascaded sub-modules or auxiliary tasks.
arXiv Detail & Related papers (2024-01-26T06:08:47Z)
- Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction [61.16125290912494]
$\text{EVL}_{\text{Gen}}$ is a framework designed for the pre-training of visually conditioned language generation models.
We show that our approach accelerates the training of vision-language models by a factor of 5 without a noticeable impact on overall performance.
arXiv Detail & Related papers (2023-10-05T03:40:06Z)
- Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency [71.42261918225773]
Conceptually, LOCCO can be viewed as a form of self-learning where the semantic parser being trained is used to generate annotations for unlabeled text.
As an added bonus, the annotations produced by LOCCO can be trivially repurposed to train a neural text generation model.
arXiv Detail & Related papers (2023-05-31T16:47:20Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.