A Penny for Your Thoughts: Decoding Speech from Inexpensive Brain Signals
- URL: http://arxiv.org/abs/2511.04691v1
- Date: Tue, 28 Oct 2025 06:02:41 GMT
- Title: A Penny for Your Thoughts: Decoding Speech from Inexpensive Brain Signals
- Authors: Quentin Auster, Kateryna Shapovalenko, Chuang Ma, Demaio Sun,
- Abstract summary: We explore whether neural networks can decode brain activity into speech by mapping EEG recordings to audio representations. Using EEG data recorded as subjects listened to natural speech, we train a model with a contrastive CLIP loss to align EEG-derived embeddings with embeddings from a pre-trained transformer-based speech model.
- Score: 1.621606615628714
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We explore whether neural networks can decode brain activity into speech by mapping EEG recordings to audio representations. Using EEG data recorded as subjects listened to natural speech, we train a model with a contrastive CLIP loss to align EEG-derived embeddings with embeddings from a pre-trained transformer-based speech model. Building on the state-of-the-art EEG decoder from Meta, we introduce three architectural modifications: (i) subject-specific attention layers (+0.15% WER improvement), (ii) personalized spatial attention (+0.45%), and (iii) a dual-path RNN with attention (-1.87%). Two of the three modifications improved performance, highlighting the promise of personalized architectures for brain-to-speech decoding and applications in brain-computer interfaces.
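The abstract describes aligning EEG-derived embeddings with speech-model embeddings via a contrastive CLIP loss. As a minimal sketch of that objective (not the authors' implementation; batch size, embedding dimension, and the temperature value are illustrative assumptions), the symmetric InfoNCE/CLIP loss over a batch of paired EEG and audio embeddings can be written as:

```python
import numpy as np

def clip_loss(eeg_emb, audio_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss between paired embeddings.

    eeg_emb, audio_emb: (B, D) arrays where row i of each is a positive pair.
    """
    # L2-normalize so the dot product is cosine similarity
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    audio = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = eeg @ audio.T / temperature  # (B, B) similarity matrix
    labels = np.arange(len(logits))       # matching pairs lie on the diagonal

    def xent(l):
        # numerically stable cross-entropy against the diagonal labels
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(l)), labels].mean()

    # average the EEG->audio and audio->EEG directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss pulls each EEG embedding toward the embedding of the speech segment the subject actually heard, while pushing it away from the other segments in the batch.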
Related papers
- Neural Decoding of Overt Speech from ECoG Using Vision Transformers and Contrastive Representation Learning [1.58476321728042]
Speech Brain-Computer Interfaces offer promising communication solutions for people with severe paralysis who are unable to speak. Recent studies have demonstrated convincing reconstruction of intelligible speech from surface electrocorticographic (ECoG) or intracortical recordings. We present an offline speech decoding pipeline based on an encoder-decoder deep neural architecture, integrating Vision Transformers and contrastive learning.
arXiv Detail & Related papers (2025-12-04T09:47:15Z) - NeuroCLIP: Brain-Inspired Prompt Tuning for EEG-to-Image Multimodal Contrastive Learning [13.254096454986318]
We present NeuroCLIP, a prompt tuning framework tailored for EEG-to-image contrastive learning. We are the first to introduce visual prompt tokens into EEG-image alignment, acting as global, modality-level prompts. On the THINGS-EEG2 dataset, NeuroCLIP achieves a Top-1 accuracy of 63.2% in zero-shot image retrieval.
arXiv Detail & Related papers (2025-11-12T12:13:24Z) - Decoding Covert Speech from EEG Using a Functional Areas Spatio-Temporal Transformer [9.914613096064848]
Decoding speech from electroencephalogram (EEG) is challenging due to a limited understanding of neural pronunciation mapping. In this study, we developed a large-scale multi-utterance speech EEG dataset from 57 right-handed native English-speaking subjects. Our results reveal distinct speech neural features through visualization of FAST-generated activation maps across frontal and temporal brain regions.
arXiv Detail & Related papers (2025-04-02T10:38:08Z) - Enhancing Listened Speech Decoding from EEG via Parallel Phoneme Sequence Prediction [36.38186261968484]
We propose a novel approach to enhance listened speech decoding from electroencephalography (EEG) signals. We use an auxiliary phoneme predictor that simultaneously decodes textual phoneme sequences.
arXiv Detail & Related papers (2025-01-08T21:11:35Z) - BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation [48.20672677492805]
Current EEG/MEG-to-text decoding systems suffer from three key limitations. BrainECHO is a multi-stage framework that employs decoupled representation learning. BrainECHO demonstrates robustness across sentence, session, and subject-independent conditions.
arXiv Detail & Related papers (2024-10-19T04:29:03Z) - CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction [61.067153685104394]
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech.
It still suffers from low speaker similarity and poor prosody naturalness.
We propose a multi-modal DSR model by leveraging neural language modeling to improve the reconstruction results.
arXiv Detail & Related papers (2024-06-12T15:42:21Z) - Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
arXiv Detail & Related papers (2022-08-25T10:01:43Z) - Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks.
In this paper, we extend the problem to open vocabulary Electroencephalography (EEG)-to-Text sequence-to-sequence decoding and zero-shot sentence sentiment classification on natural reading tasks.
Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
arXiv Detail & Related papers (2021-12-05T21:57:22Z) - Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning [52.73083137245969]
We present a generative adversarial network to synthesize 3D pose sequences of co-speech upper-body gestures with appropriate affective expressions.
Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish between the synthesized pose sequences and real 3D pose sequences.
arXiv Detail & Related papers (2021-07-31T15:13:39Z) - Emotional EEG Classification using Connectivity Features and Convolutional Neural Networks [81.74442855155843]
We introduce a new classification system that utilizes brain connectivity with a CNN and validate its effectiveness via the emotional video classification.
The degree to which brain connectivity concentrates on the emotional content of the target video correlates with classification performance.
arXiv Detail & Related papers (2021-01-18T13:28:08Z)
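Several of the papers above report retrieval-style metrics, such as identifying the correct speech segment out of more than 1,000 candidates with 41% accuracy, or 63.2% Top-1 in zero-shot image retrieval. As a minimal sketch of how such Top-1 accuracy is typically computed (an illustrative assumption, not any specific paper's evaluation code), one ranks candidates by cosine similarity to each decoded embedding:

```python
import numpy as np

def top1_accuracy(queries, candidates):
    """Top-1 retrieval accuracy by cosine similarity.

    queries:    (N, D) decoded embeddings (e.g. from EEG/MEG).
    candidates: (N, D) reference embeddings; queries[i] should match
                candidates[i] (the i-th row is the ground-truth pair).
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    c = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    preds = (q @ c.T).argmax(axis=1)  # index of the most similar candidate
    return (preds == np.arange(len(q))).mean()
```

With a candidate pool of 1,000 segments, chance-level Top-1 accuracy is 0.1%, which is why figures like 41% represent a substantial signal.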
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.