Inner speech recognition through electroencephalographic signals
- URL: http://arxiv.org/abs/2210.06472v1
- Date: Tue, 11 Oct 2022 08:29:12 GMT
- Title: Inner speech recognition through electroencephalographic signals
- Authors: Francesca Gasparini, Elisa Cazzaniga, Aurora Saibene
- Abstract summary: This work focuses on inner speech recognition starting from EEG signals.
The decoding of the EEG into text should be understood as the classification of a limited number of words (commands).
Speech-related BCIs provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals.
- Score: 2.578242050187029
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work focuses on inner speech recognition starting from EEG signals.
Inner speech recognition is defined as the internalized process in which the
person thinks in pure meanings, generally associated with an auditory imagery
of one's own inner "voice". The decoding of the EEG into text should be understood as
the classification of a limited number of words (commands) or the presence of
phonemes (units of sound that make up words). Speech-related BCIs provide
effective vocal communication strategies for controlling devices through speech
commands interpreted from brain signals, improving the quality of life of
people who have lost the capability to speak, by restoring communication with
their environment. Two public inner speech datasets are analysed. Using this
data, several classification models are studied and implemented, ranging from
basic methods such as Support Vector Machines, through ensemble methods such as
the eXtreme Gradient Boosting classifier, up to neural networks such as Long
Short Term Memory (LSTM) and Bidirectional Long Short Term Memory (BiLSTM)
networks. With the LSTM and BiLSTM models, which are generally not used in the
inner speech recognition literature, results in line with or superior to the
state of the art are obtained.
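As a concrete illustration of the kind of pipeline the abstract describes, the sketch below classifies fixed-length EEG epochs into a small set of command words with an SVM baseline and a BiLSTM network. It is a minimal sketch under stated assumptions, not the authors' implementation: the epoch shape, hyperparameters, and the synthetic data standing in for the two public datasets are all illustrative.

```python
# Minimal, illustrative sketch of inner-speech command classification from EEG.
# Assumptions (not from the paper): epochs are already filtered and segmented into
# fixed-length windows of shape (n_trials, n_samples, n_channels), with one integer
# label per trial drawn from a small command vocabulary. Synthetic random data
# stands in for the real recordings.

import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

n_trials, n_samples, n_channels, n_classes = 200, 256, 128, 4
rng = np.random.default_rng(0)
X = rng.standard_normal((n_trials, n_samples, n_channels)).astype(np.float32)
y = rng.integers(0, n_classes, size=n_trials)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# --- Baseline: SVM on flattened, standardised epochs ---
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_tr.reshape(len(X_tr), -1), y_tr)
print("SVM accuracy:", svm.score(X_te.reshape(len(X_te), -1), y_te))

# --- BiLSTM: each epoch is a sequence of per-sample channel vectors ---
class BiLSTMClassifier(nn.Module):
    def __init__(self, n_channels, n_classes, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                  # x: (batch, time, channels)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # classify from the last time step

model = BiLSTMClassifier(n_channels, n_classes)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
Xt, yt = torch.from_numpy(X_tr), torch.from_numpy(y_tr).long()
for epoch in range(5):                     # a handful of full-batch epochs, for illustration only
    opt.zero_grad()
    loss = loss_fn(model(Xt), yt)
    loss.backward()
    opt.step()
with torch.no_grad():
    preds = model(torch.from_numpy(X_te)).argmax(dim=1)
    print("BiLSTM accuracy:", (preds == torch.from_numpy(y_te).long()).float().mean().item())
```

In practice, the flattened-epoch SVM baseline is usually fed hand-crafted spectral or statistical features rather than raw samples; the BiLSTM instead consumes the epoch directly as a multichannel time series, which is part of what makes recurrent models attractive here.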
Related papers
- Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals [1.33134751838052]
This research investigated the effectiveness of deep learning models for non-invasive neural signal decoding.
It focused on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech.
arXiv Detail & Related papers (2024-11-14T07:20:08Z)
- BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation [29.78480739360263]
We propose a new multi-stage strategy for semantic brain signal decoding via vector-quantized spectrogram reconstruction.
BrainECHO successively conducts: 1) autoencoding of the audio spectrogram; 2) brain-audio latent space alignment; and 3) semantic text generation via Whisper finetuning.
BrainECHO outperforms state-of-the-art methods under the same data split settings on two widely accepted resources.
arXiv Detail & Related papers (2024-10-19T04:29:03Z)
- Learning Speech Representation From Contrastive Token-Acoustic Pretraining [57.08426714676043]
We propose "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phoneme and speech into a joint multimodal space.
The proposed CTAP model is trained on 210k speech and phoneme pairs, achieving minimally-supervised TTS, VC, and ASR.
arXiv Detail & Related papers (2023-09-01T12:35:43Z)
- Introducing Semantics into Speech Encoders [91.37001512418111]
We propose an unsupervised way of incorporating semantic information from large language models into self-supervised speech encoders without labeled audio transcriptions.
Our approach achieves performance similar to that of supervised methods trained on over 100 hours of labeled audio transcripts.
arXiv Detail & Related papers (2022-11-15T18:44:28Z)
- Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments [21.493664174262737]
This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments.
We propose a semi-supervised adaptation method that jointly updates the mask estimator and the ASR model at run-time using clean speech signals with ground-truth transcriptions and noisy speech signals with highly-confident estimated transcriptions.
arXiv Detail & Related papers (2022-07-15T03:43:35Z)
- Streaming Multi-talker Speech Recognition with Joint Speaker Identification [77.46617674133556]
SURIT employs the recurrent neural network transducer (RNN-T) as the backbone for both speech recognition and speaker identification.
We validate our idea on a multi-talker dataset derived from LibriSpeech, and present encouraging results.
arXiv Detail & Related papers (2021-04-05T18:37:33Z)
- The "Sound of Silence" in EEG -- Cognitive voice activity detection [22.196642357767338]
"Non-speech"(NS) state of brain activity corresponding to silence regions of speech audio is studied.
Speech perception is studied to inspect the existence of such a state, followed by its identification in speech imagination.
The recognition performance and the visual distinction observed demonstrate the existence of silence signatures in EEG.
arXiv Detail & Related papers (2020-10-12T07:47:36Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
More recently, deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Understanding effect of speech perception in EEG based speech recognition systems [3.5786621294068377]
The electroencephalography (EEG) signals recorded in parallel with speech are used to perform isolated and continuous speech recognition.
We investigate whether it is possible to separate out this speech perception component from EEG signals in order to design more robust EEG based speech recognition systems.
arXiv Detail & Related papers (2020-05-29T05:56:09Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
- Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
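As referenced in the CTAP entry above, the sketch below illustrates the general dual-encoder contrastive idea of pulling paired phoneme and speech representations together in a joint space with a symmetric InfoNCE objective. The encoders, feature dimensions, and data here are placeholders chosen for illustration, not details taken from that paper.

```python
# Generic dual-encoder contrastive alignment sketch (CTAP-style idea only).
# Everything here is an illustrative assumption: a "phoneme" encoder and a
# "speech" encoder map paired inputs into a shared embedding space, and a
# symmetric InfoNCE loss pulls matching pairs together while pushing apart
# the other pairs in the batch. Dimensions and data are synthetic.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)   # unit-length embeddings

def info_nce(a, b, temperature=0.07):
    # a, b: (batch, emb_dim); matching rows are positives, all other rows negatives
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

phone_enc, speech_enc = Encoder(in_dim=64), Encoder(in_dim=80)
opt = torch.optim.Adam(list(phone_enc.parameters()) + list(speech_enc.parameters()), lr=1e-3)

phone_feats = torch.randn(32, 64)    # stand-ins for paired phoneme / acoustic features
speech_feats = torch.randn(32, 80)
for step in range(10):
    opt.zero_grad()
    loss = info_nce(phone_enc(phone_feats), speech_enc(speech_feats))
    loss.backward()
    opt.step()
print("final contrastive loss:", loss.item())
```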