Related papers: Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)

Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)

URL: http://arxiv.org/abs/2410.07507v1
Date: Thu, 10 Oct 2024 00:47:59 GMT
Title: Thought2Text: Text Generation from EEG Signal using Large Language Models (LLMs)
Authors: Abhijit Mishra, Shreya Shukla, Jose Torres, Jacek Gwizdka, Shounak Roychowdhury,
Abstract summary: This paper presents Thought2Text, which uses instruction-tuned Large Language Models (LLMs) fine-tuned with EEG data to achieve this goal. Experiments on a public EEG dataset collected for six subjects with image stimuli demonstrate the efficacy of multimodal LLMs. This approach marks a significant advancement towards portable, low-cost "thoughts-to-text" technology with potential applications in both neuroscience and natural language processing (NLP)
Score: 4.720913027054481
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Decoding and expressing brain activity in a comprehensible form is a challenging frontier in AI. This paper presents Thought2Text, which uses instruction-tuned Large Language Models (LLMs) fine-tuned with EEG data to achieve this goal. The approach involves three stages: (1) training an EEG encoder for visual feature extraction, (2) fine-tuning LLMs on image and text data, enabling multimodal description generation, and (3) further fine-tuning on EEG embeddings to generate text directly from EEG during inference. Experiments on a public EEG dataset collected for six subjects with image stimuli demonstrate the efficacy of multimodal LLMs (LLaMa-v3, Mistral-v0.3, Qwen2.5), validated using traditional language generation evaluation metrics, GPT-4 based assessments, and evaluations by human expert. This approach marks a significant advancement towards portable, low-cost "thoughts-to-text" technology with potential applications in both neuroscience and natural language processing (NLP).

Related papers

Learning Interpretable Representations Leads to Semantically Faithful EEG-to-Text Generation [52.51005875755718]
We focus on EEG-to-text decoding and address its hallucination issue through the lens of posterior collapse.<n>Acknowledging the underlying mismatch in information capacity between EEG and text, we reframe the decoding task as semantic summarization of core meanings.<n>Experiments on the public ZuCo dataset demonstrate that GLIM consistently generates fluent, EEG-grounded sentences.
arXiv Detail & Related papers (2025-05-21T05:29:55Z)
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO [87.52631406241456]
Recent text-to-image systems face limitations in handling multimodal inputs and complex reasoning tasks.<n>We introduce Mind Omni, a unified multimodal large language model that addresses these challenges by incorporating reasoning generation through reinforcement learning.
arXiv Detail & Related papers (2025-05-19T12:17:04Z)
EMMA: Efficient Visual Alignment in Multi-Modal LLMs [56.03417732498859]
EMMA is a lightweight cross-modality module designed to efficiently fuse visual and textual encodings. EMMA boosts performance across multiple tasks by up to 9.3% while significantly improving robustness against hallucinations.
arXiv Detail & Related papers (2024-10-02T23:00:31Z)
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition [110.8431434620642]
We introduce the generative speech transcription error correction (GenSEC) challenge. This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition. We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
SEE: Semantically Aligned EEG-to-Text Translation [5.460650382586978]
Decoding neurophysiological signals into language is of great research interest within brain-computer interface (BCI) applications. Current EEG-to-Text decoding approaches face challenges due to the huge domain gap between EEG recordings and raw texts. We propose SEE: Semantically Aligned EEG-to-Text Translation, a novel method aimed at improving EEG-to-Text decoding.
arXiv Detail & Related papers (2024-09-14T05:37:15Z)
BELT-2: Bootstrapping EEG-to-Language representation alignment for multi-task brain decoding [24.54436986074267]
We introduce BELT-2, a pioneering multi-task model designed to enhance both encoding and decoding performance from EEG signals. BELT-2 is the first work to innovatively 1) adopt byte-pair encoding (BPE)-level EEG-language alignment and 2) integrate multi-task training and decoding in the EEG domain. These innovative efforts make BELT-2 a pioneering breakthrough, making it the first work in the field capable of decoding coherent and readable sentences from non-invasive brain signals.
arXiv Detail & Related papers (2024-08-28T12:30:22Z)
Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings [27.418738450536047]
We propose a two-step pipeline for converting EEG signals into sentences. We first confirm that word-level semantic information can be learned from EEG data recorded during natural reading. We employ a training-free retrieval method to retrieve sentences based on the predictions from the EEG encoder.
arXiv Detail & Related papers (2024-08-08T03:40:25Z)
MAD: Multi-Alignment MEG-to-Text Decoding [21.155031900491654]
We present a novel approach for translating MEG signals into text using a speech-decoding framework with multiple alignments. We achieve an impressive BLEU-1 score on the $textitGWilliams$ dataset, significantly outperforming the baseline from 5.49 to 10.44 on the BLEU-1 metric.
arXiv Detail & Related papers (2024-06-03T16:43:10Z)
Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder [69.7813498468116]
We propose Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text. We also develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations) to decode text from EEG sequences.
arXiv Detail & Related papers (2024-02-27T11:45:21Z)
Neural Sign Actors: A diffusion model for 3D sign language production from text [51.81647203840081]
Sign Languages (SL) serve as the primary mode of communication for the Deaf and Hard of Hearing communities. This work makes an important step towards realistic neural sign avatars, bridging the communication gap between Deaf and hearing communities.
arXiv Detail & Related papers (2023-12-05T12:04:34Z)
Deep Representation Learning for Open Vocabulary Electroencephalography-to-Text Decoding [6.014363449216054]
We present an end-to-end deep learning framework for non-invasive brain recordings that brings modern representational learning approaches to neuroscience. Our model achieves a BLEU-1 score of 42.75%, a ROUGE-1-F of 33.28%, and a BERTScore-F of 53.86%, outperforming the previous state-of-the-art methods by 3.38%, 8.43%, and 6.31%, respectively.
arXiv Detail & Related papers (2023-11-15T08:03:09Z)
Document-Level Machine Translation with Large Language Models [91.03359121149595]
Large language models (LLMs) can produce coherent, cohesive, relevant, and fluent answers for various natural language processing (NLP) tasks. This paper provides an in-depth evaluation of LLMs' ability on discourse modeling.
arXiv Detail & Related papers (2023-04-05T03:49:06Z)
Vision-Language Pre-Training for Boosting Scene Text Detectors [57.08046351495244]
We specifically adapt vision-language joint learning for scene text detection. We propose to learn contextualized, joint representations through vision-language pre-training. The pre-trained model is able to produce more informative representations with richer semantics.
arXiv Detail & Related papers (2022-04-29T03:53:54Z)
Attend, Memorize and Generate: Towards Faithful Table-to-Text Generation in Few Shots [58.404516361586325]
Few-shot table-to-text generation is a task of composing fluent and faithful sentences to convey table content using limited data. This paper proposes a novel approach, Memorize and Generate (called AMG), inspired by the text generation process of humans.
arXiv Detail & Related papers (2022-03-01T20:37:20Z)
Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks. In this paper, we extend the problem to open vocabulary Electroencephalography(EEG)-To-Text Sequence-To-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks. Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
arXiv Detail & Related papers (2021-12-05T21:57:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.