MindSpeech: Continuous Imagined Speech Decoding using High-Density fNIRS and Prompt Tuning for Advanced Human-AI Interaction
- URL: http://arxiv.org/abs/2408.05362v1
- Date: Thu, 25 Jul 2024 16:39:21 GMT
- Title: MindSpeech: Continuous Imagined Speech Decoding using High-Density fNIRS and Prompt Tuning for Advanced Human-AI Interaction
- Authors: Suyi Zhang, Ekram Alam, Jack Baber, Francesca Bianco, Edward Turner, Maysam Chamanzar, Hamid Dehghani
- Abstract summary: This paper reports a novel method for human-AI interaction by developing a direct brain-AI interface.
We discuss a novel AI model, called MindSpeech, which enables open-vocabulary, continuous decoding for imagined speech.
We demonstrate significant improvements in key metrics, such as BLEU-1 and BERT P scores, for three out of four participants.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the coming decade, artificial intelligence systems will continue to improve and revolutionise every industry and facet of human life. Designing effective, seamless and symbiotic communication paradigms between humans and AI agents is increasingly important. This paper reports a novel method for human-AI interaction by developing a direct brain-AI interface. We discuss a novel AI model, called MindSpeech, which enables open-vocabulary, continuous decoding for imagined speech. This study focuses on enhancing human-AI communication by utilising high-density functional near-infrared spectroscopy (fNIRS) data to develop an AI model capable of decoding imagined speech non-invasively. We discuss a new word cloud paradigm for data collection, improving the quality and variety of imagined sentences generated by participants and covering a broad semantic space. Utilising a prompt tuning-based approach, we employed the Llama2 large language model (LLM) for text generation guided by brain signals. Our results show significant improvements in key metrics, such as BLEU-1 and BERT P scores, for three out of four participants, demonstrating the method's effectiveness. Additionally, we demonstrate that combining data from multiple participants enhances the decoder performance, with statistically significant improvements in BERT scores for two participants. Furthermore, we demonstrate significantly above-chance decoding accuracy for imagined speech versus resting conditions, and the brain regions activated during imagined speech tasks in our study are consistent with previous studies of the regions involved in speech encoding. This study underscores the feasibility of continuous imagined speech decoding. By integrating high-density fNIRS with advanced AI techniques, we highlight the potential for non-invasive, accurate communication systems with AI in the near future.
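The prompt-tuning recipe described in the abstract can be sketched as follows: a small trainable encoder maps fNIRS features to "soft prompt" embeddings that are prepended to the input embeddings of a frozen Llama2, and only the encoder is trained against the target sentence. The code below is a minimal sketch of that pattern, not the authors' implementation; the fNIRS feature dimension, the 16 prompt tokens, and the single-layer encoder are illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

class BrainPromptEncoder(nn.Module):
    """Maps an fNIRS feature vector to a sequence of soft-prompt embeddings."""
    def __init__(self, fnirs_dim, n_prompt_tokens, hidden_dim):
        super().__init__()
        self.n, self.d = n_prompt_tokens, hidden_dim
        self.proj = nn.Linear(fnirs_dim, n_prompt_tokens * hidden_dim)

    def forward(self, fnirs_feats):                 # (batch, fnirs_dim)
        return self.proj(fnirs_feats).view(-1, self.n, self.d)

llm = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
llm.requires_grad_(False)                           # prompt tuning: the LLM stays frozen

# fnirs_dim=2048 and 16 prompt tokens are placeholders, not the paper's values.
encoder = BrainPromptEncoder(fnirs_dim=2048, n_prompt_tokens=16,
                             hidden_dim=llm.config.hidden_size)

def training_loss(fnirs_feats, target_ids):
    soft_prompt = encoder(fnirs_feats)                        # (B, 16, d)
    tok_emb = llm.get_input_embeddings()(target_ids)          # (B, T, d)
    inputs = torch.cat([soft_prompt, tok_emb], dim=1)
    # Prompt positions are excluded from the loss via the -100 ignore label.
    ignore = torch.full(soft_prompt.shape[:2], -100, dtype=torch.long)
    labels = torch.cat([ignore, target_ids], dim=1)
    return llm(inputs_embeds=inputs, labels=labels).loss
```

Because the LLM is frozen, only the projection's parameters are optimised, which keeps training tractable on the small datasets typical of neuroimaging studies.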
Related papers
- Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals [1.33134751838052]
This research investigated the effectiveness of deep learning models for non-invasive neural signal decoding.
It focused on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech.
arXiv Detail & Related papers (2024-11-14T07:20:08Z)
- BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation [29.78480739360263]
We propose BrainECHO, a new multi-stage strategy for semantic brain signal decoding via vector-quantized spectrogram reconstruction.
BrainECHO successively conducts: 1) autoencoding of the audio spectrogram; 2) brain-audio latent space alignment; and 3) semantic text generation via Whisper finetuning (a minimal sketch of stage 1 follows below).
BrainECHO outperforms state-of-the-art methods under the same data split settings on two widely accepted resources.
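Read as a pipeline, stage 1 is a vector-quantized autoencoder over audio spectrograms. The snippet below sketches only the quantization step, under the usual straight-through formulation; BrainECHO's actual codebook size, latent dimensions, and auxiliary losses are not given in this summary and are assumed here.

```python
import torch
import torch.nn as nn

class VectorQuantizer(nn.Module):
    """Snap encoder outputs to a learned codebook, with a straight-through
    estimator so gradients still reach the encoder."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                        # z: (batch, time, dim)
        w = self.codebook.weight                 # (num_codes, dim)
        # Squared Euclidean distance from each latent to each code.
        d = (z.pow(2).sum(-1, keepdim=True)
             - 2 * z @ w.T
             + w.pow(2).sum(-1))
        idx = d.argmin(dim=-1)                   # nearest code per time step
        q = self.codebook(idx)
        q = z + (q - z).detach()                 # straight-through gradients
        return q, idx

# Stage 2 would align brain-signal embeddings to these audio latents, and
# stage 3 would finetune Whisper to generate text from the aligned latents;
# commitment and codebook losses are omitted from this sketch.
```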
arXiv Detail & Related papers (2024-10-19T04:29:03Z)
- Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation [70.52558242336988]
We focus on predicting engagement in dyadic interactions by scrutinizing verbal and non-verbal cues, aiming to detect signs of disinterest or confusion.
In this work, we collect a dataset featuring 34 participants engaged in casual dyadic conversations, each providing self-reported engagement ratings at the end of each conversation.
We introduce a novel fusion strategy using Large Language Models (LLMs) to integrate multiple behavior modalities into a "multimodal transcript".
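One plausible reading of "multimodal transcript" is plain text in which non-verbal cues are serialized inline with the words, so an off-the-shelf LLM can consume both in a single pass. The helper below illustrates that idea; the field names and cue vocabulary are invented for this sketch, not taken from the paper.

```python
def to_multimodal_transcript(turns):
    """Serialize verbal content plus non-verbal cues into plain text
    that an LLM can read in one pass."""
    lines = []
    for turn in turns:
        cues = ", ".join(turn["nonverbal"])      # e.g. "averted gaze, long pause"
        lines.append(f'{turn["speaker"]} [{cues}]: {turn["text"]}')
    return "\n".join(lines)

print(to_multimodal_transcript([
    {"speaker": "A", "nonverbal": ["leaning in"], "text": "And then what?"},
    {"speaker": "B", "nonverbal": ["averted gaze", "long pause"], "text": "I forget."},
]))
```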
arXiv Detail & Related papers (2024-09-13T18:28:12Z)
- MindGPT: Advancing Human-AI Interaction with Non-Invasive fNIRS-Based Imagined Speech Decoding [0.0]
Building communication systems that enable seamless and symbiotic communication between humans and AI agents is increasingly important.
This research advances the field of human-AI interaction by developing an innovative approach to decode imagined speech using non-invasive high-density functional near-infrared spectroscopy (fNIRS).
Notably, this study introduces MindGPT, the first thought-to-LLM (large language model) system in the world.
arXiv Detail & Related papers (2024-07-25T18:18:52Z)
- Converging Paradigms: The Synergy of Symbolic and Connectionist AI in LLM-Empowered Autonomous Agents [55.63497537202751]
This article explores the convergence of connectionist and symbolic artificial intelligence (AI).
Traditionally, connectionist AI focuses on neural networks, while symbolic AI emphasizes symbolic representation and logic.
Recent advancements in large language models (LLMs) highlight the potential of connectionist architectures in handling human language as a form of symbols.
arXiv Detail & Related papers (2024-07-11T14:00:53Z)
- Sequential Best-Arm Identification with Application to Brain-Computer Interface [34.87975833920409]
A brain-computer interface (BCI) is a technology that enables direct communication between the brain and an external device or computer system.
An electroencephalogram (EEG) and event-related potential (ERP)-based speller system is a type of BCI that allows users to spell words without using a physical keyboard.
We propose a sequential top-two Thompson sampling (STTS) algorithm under the fixed-confidence setting and the fixed-budget setting.
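For reference, standard top-two Thompson sampling (the template the paper's sequential variant builds on) works as follows for Bernoulli arms: sample a plausible mean for every arm from its Beta posterior, take the best-looking arm as the leader, and with probability 1 - beta re-sample until a different arm wins, playing that challenger instead. The sketch below implements that loop with a fixed budget; the paper's STTS stopping rules and confidence guarantees are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_two_thompson(arms, beta=0.5, budget=2000):
    """Top-two Thompson sampling for Bernoulli arms with Beta(1,1) priors.
    `arms` holds the true (unknown) success probabilities; the goal is
    best-arm identification rather than cumulative reward."""
    k = len(arms)
    wins, losses = np.ones(k), np.ones(k)        # Beta posterior parameters
    for _ in range(budget):
        # Sample a plausible mean per arm and find the leader.
        leader = np.argmax(rng.beta(wins, losses))
        arm = leader
        if rng.random() > beta:
            # Re-sample until a different arm looks best: the challenger.
            while arm == leader:
                arm = np.argmax(rng.beta(wins, losses))
        reward = rng.random() < arms[arm]        # pull the chosen arm
        wins[arm] += reward
        losses[arm] += 1 - reward
    return np.argmax(wins / (wins + losses))     # empirical best arm

print(top_two_thompson([0.2, 0.25, 0.3, 0.5]))   # usually prints 3
```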
arXiv Detail & Related papers (2023-05-17T18:49:44Z)
- A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech [94.64927912924087]
We train TTS systems using real-world speech from YouTube and podcasts.
A recent text-to-speech architecture is designed for multiple code generation and monotonic alignment.
We show that this architecture outperforms existing TTS systems in several objective and subjective measures.
arXiv Detail & Related papers (2023-02-08T17:34:32Z)
- Decoding speech perception from non-invasive brain recordings [48.46819575538446]
We introduce a model trained with contrastive learning to decode self-supervised representations of perceived speech from non-invasive recordings.
Our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities.
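A common way to build such a decoder is a CLIP-style symmetric InfoNCE objective between brain-window embeddings and speech-segment embeddings, with retrieval at test time by ranking all candidate segments by similarity. The sketch below shows that objective under those assumptions; it is not the paper's exact model.

```python
import torch
import torch.nn.functional as F

def symmetric_infonce(brain_emb, speech_emb, temperature=0.07):
    """CLIP-style loss: the i-th brain window and i-th speech segment are
    the only positive pair in the batch; all other pairings are negatives."""
    b = F.normalize(brain_emb, dim=-1)
    s = F.normalize(speech_emb, dim=-1)
    logits = b @ s.T / temperature               # (batch, batch) similarities
    targets = torch.arange(b.size(0), device=b.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

# At test time: embed one MEG window, embed all candidate speech segments,
# and return the segment with the highest cosine similarity.
```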
arXiv Detail & Related papers (2022-08-25T10:01:43Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate for modelling speech processing in the brain.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
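"Late fusion" here means each modality is encoded by its own transfer-learned backbone and only the pooled outputs are combined by a small trained head. A minimal sketch, with all dimensions as placeholder assumptions:

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Combine pooled speech and text embeddings from frozen backbones."""
    def __init__(self, speech_dim=512, text_dim=768, n_emotions=4):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(speech_dim + text_dim, 256), nn.ReLU(),
            nn.Linear(256, n_emotions))

    def forward(self, speech_emb, text_emb):
        # Each embedding comes from its own transfer-learned encoder,
        # e.g. a speaker-recognition model and a BERT-style model.
        return self.head(torch.cat([speech_emb, text_emb], dim=-1))
```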
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Building Human-like Communicative Intelligence: A Grounded Perspective [1.0152838128195465]
After making astounding progress in language learning, AI systems seem to be approaching a ceiling, one that does not reflect important aspects of human communicative capacities.
This paper suggests that the dominant cognitively-inspired AI directions, based on nativist and symbolic paradigms, lack necessary substantiation and concreteness to guide progress in modern AI.
I propose a list of concrete, implementable components for building "grounded" linguistic intelligence.
arXiv Detail & Related papers (2022-01-02T01:43:24Z)