NeuSpeech: Decode Neural signal as Speech
- URL: http://arxiv.org/abs/2403.01748v3
- Date: Mon, 3 Jun 2024 16:58:04 GMT
- Title: NeuSpeech: Decode Neural signal as Speech
- Authors: Yiqian Yang, Yiqun Duan, Qiang Zhang, Hyejeong Jo, Jinni Zhou, Won Hee Lee, Renjing Xu, Hui Xiong,
- Abstract summary: We explore the brain-to-text translation of MEG signals in a speech-decoding formation.
Our model achieves impressive BLEU-1 scores of 60.30 and 52.89 without pretraining $&$ teacher-forcing.
- Score: 21.155031900491654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Decoding language from brain dynamics is an important open direction in the realm of brain-computer interface (BCI), especially considering the rapid growth of large language models. Compared to invasive-based signals which require electrode implantation surgery, non-invasive neural signals (e.g. EEG, MEG) have attracted increasing attention considering their safety and generality. However, the exploration is not adequate in three aspects: 1) previous methods mainly focus on EEG but none of the previous works address this problem on MEG with better signal quality; 2) prior works have predominantly used $``teacher-forcing"$ during generative decoding, which is impractical; 3) prior works are mostly $``BART-based"$ not fully auto-regressive, which performs better in other sequence tasks. In this paper, we explore the brain-to-text translation of MEG signals in a speech-decoding formation. Here we are the first to investigate a cross-attention-based ``whisper" model for generating text directly from MEG signals without teacher forcing. Our model achieves impressive BLEU-1 scores of 60.30 and 52.89 without pretraining $\&$ teacher-forcing on two major datasets ($\textit{GWilliams}$ and $\textit{Schoffelen}$). This paper conducts a comprehensive review to understand how speech decoding formation performs on the neural decoding tasks, including pretraining initialization, training $\&$ evaluation set splitting, augmentation, and scaling law. Code is available at https://github.com/NeuSpeech/NeuSpeech1$.
Related papers
- NeuGPT: Unified multi-modal Neural GPT [48.70587003475798]
NeuGPT is a groundbreaking multi-modal language generation model designed to harmonize the fragmented landscape of neural recording research.
Our model mainly focus on brain-to-text decoding, improving SOTA from 6.94 to 12.92 on BLEU-1 and 6.93 to 13.06 on ROUGE-1F.
It can also simulate brain signals, thereby serving as a novel neural interface.
arXiv Detail & Related papers (2024-10-28T10:53:22Z) - BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation [29.78480739360263]
We propose a new multi-stage strategy for semantic brain signal decoding via vEctor-quantized speCtrogram reconstruction.
BrainECHO successively conducts: 1) autoencoding of the audio spectrogram; 2) Brain-audio latent space alignment; and 3) Semantic text generation via Whisper finetuning.
BrainECHO outperforms state-of-the-art methods under the same data split settings on two widely accepted resources.
arXiv Detail & Related papers (2024-10-19T04:29:03Z) - MAD: Multi-Alignment MEG-to-Text Decoding [21.155031900491654]
We present a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments.
We achieve an impressive BLEU-1 score on the $textitGWilliams$ dataset, significantly outperforming the baseline from 5.49 to 10.44 on the BLEU-1 metric.
arXiv Detail & Related papers (2024-06-03T16:43:10Z) - Language Reconstruction with Brain Predictive Coding from fMRI Data [28.217967547268216]
Theory of predictive coding suggests that human brain naturally engages in continuously predicting future word representations.
textscPredFT achieves current state-of-the-art decoding performance with a maximum BLEU-1 score of $27.8%$.
arXiv Detail & Related papers (2024-05-19T16:06:02Z) - Open-vocabulary Auditory Neural Decoding Using fMRI-prompted LLM [19.53589633360839]
We introduce a novel method, the textbfBrain Prompt GPT (BP-GPT).
By using the brain representation that is extracted from the fMRI as a prompt, our method can utilize GPT-2 to decode fMRI signals stimulus into text.
We evaluate our BP-GPT on the open-source auditory semantic decoding dataset and achieve a significant improvement up to $4.61%$ on METEOR and $2.43%$ on BERTScore.
arXiv Detail & Related papers (2024-05-13T15:25:11Z) - SpeechUT: Bridging Speech and Text with Hidden-Unit for Encoder-Decoder
Based Speech-Text Pre-training [106.34112664893622]
We propose a unified-modal speech-unit-text pre-training model, SpeechUT, to connect the representations of a speech encoder and a text decoder with a shared unit encoder.
Our proposed SpeechUT is fine-tuned and evaluated on automatic speech recognition (ASR) and speech translation (ST) tasks.
arXiv Detail & Related papers (2022-10-07T17:57:45Z) - Toward a realistic model of speech processing in the brain with
self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z) - SVTS: Scalable Video-to-Speech Synthesis [105.29009019733803]
We introduce a scalable video-to-speech framework consisting of two components: a video-to-spectrogram predictor and a pre-trained neural vocoder.
We are the first to show intelligible results on the challenging LRS3 dataset.
arXiv Detail & Related papers (2022-05-04T13:34:07Z) - Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot
Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks.
In this paper, we extend the problem to open vocabulary Electroencephalography(EEG)-To-Text Sequence-To-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks.
Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
arXiv Detail & Related papers (2021-12-05T21:57:22Z) - Emergent Communication Pretraining for Few-Shot Machine Translation [66.48990742411033]
We pretrain neural networks via emergent communication from referential games.
Our key assumption is that grounding communication on images---as a crude approximation of real-world environments---inductively biases the model towards learning natural languages.
arXiv Detail & Related papers (2020-11-02T10:57:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.