Inner speech recognition through electroencephalographic signals
- URL: http://arxiv.org/abs/2210.06472v1
- Date: Tue, 11 Oct 2022 08:29:12 GMT
- Title: Inner speech recognition through electroencephalographic signals
- Authors: Francesca Gasparini, Elisa Cazzaniga, Aurora Saibene
- Abstract summary: This work focuses on inner speech recognition starting from EEG signals.
The decoding of the EEG into text should be understood as the classification of a limited number of words (commands).
Speech-related BCIs provide effective vocal communication strategies for controlling devices through speech commands interpreted from brain signals.
- Score: 2.578242050187029
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work focuses on inner speech recognition starting from EEG signals.
Inner speech recognition is defined as the internalized process in which the
person thinks in pure meanings, generally associated with an auditory imagery
of one's own inner "voice". The decoding of the EEG into text should be understood as
the classification of a limited number of words (commands) or the presence of
phonemes (units of sound that make up words). Speech-related BCIs provide
effective vocal communication strategies for controlling devices through speech
commands interpreted from brain signals, improving the quality of life of
people who have lost the capability to speak, by restoring communication with
their environment. Two public inner speech datasets are analysed. Using this
data, several classification models are studied and implemented, ranging from
basic methods such as Support Vector Machines, through ensemble methods such as
the eXtreme Gradient Boosting classifier, up to neural networks such as Long
Short Term Memory (LSTM) and Bidirectional Long Short Term Memory (BiLSTM)
networks. With the LSTM and BiLSTM models, which are generally not used in the
inner speech recognition literature, results in line with or superior to the
state of the art are obtained.
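As a concrete illustration of the kind of pipeline the abstract describes, the sketch below classifies fixed-length EEG epochs into a small set of command words with an SVM baseline and a BiLSTM network. It is a minimal sketch under stated assumptions, not the authors' implementation: the epoch shape, hyperparameters, and the synthetic data standing in for the two public datasets are all illustrative.

```python
# Minimal, illustrative sketch of inner-speech command classification from EEG.
# Assumptions (not from the paper): epochs are already filtered and segmented into
# fixed-length windows of shape (n_trials, n_samples, n_channels), with one integer
# label per trial drawn from a small command vocabulary. Synthetic random data
# stands in for the real recordings.

import numpy as np
import torch
import torch.nn as nn
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

n_trials, n_samples, n_channels, n_classes = 200, 256, 128, 4
rng = np.random.default_rng(0)
X = rng.standard_normal((n_trials, n_samples, n_channels)).astype(np.float32)
y = rng.integers(0, n_classes, size=n_trials)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# --- Baseline: SVM on flattened, standardised epochs ---
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
svm.fit(X_tr.reshape(len(X_tr), -1), y_tr)
print("SVM accuracy:", svm.score(X_te.reshape(len(X_te), -1), y_te))

# --- BiLSTM: each epoch is a sequence of per-sample channel vectors ---
class BiLSTMClassifier(nn.Module):
    def __init__(self, n_channels, n_classes, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                  # x: (batch, time, channels)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])    # classify from the last time step

model = BiLSTMClassifier(n_channels, n_classes)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
Xt, yt = torch.from_numpy(X_tr), torch.from_numpy(y_tr).long()
for epoch in range(5):                     # a handful of full-batch epochs, for illustration only
    opt.zero_grad()
    loss = loss_fn(model(Xt), yt)
    loss.backward()
    opt.step()
with torch.no_grad():
    preds = model(torch.from_numpy(X_te)).argmax(dim=1)
    print("BiLSTM accuracy:", (preds == torch.from_numpy(y_te).long()).float().mean().item())
```

In practice, the flattened-epoch SVM baseline is usually fed hand-crafted spectral or statistical features rather than raw samples; the BiLSTM instead consumes the epoch directly as a multichannel time series, which is part of what makes recurrent models attractive here.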
Related papers
- Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals [1.33134751838052]
This research investigated the effectiveness of deep learning models for non-invasive neural signal decoding.
It focused on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech.
arXiv Detail & Related papers (2024-11-14T07:20:08Z)
- BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation [29.78480739360263]
We propose a new multi-stage strategy for semantic brain signal decoding via vector-quantized spectrogram reconstruction.
BrainECHO successively conducts: 1) autoencoding of the audio spectrogram; 2) brain-audio latent space alignment; and 3) semantic text generation via Whisper finetuning.
BrainECHO outperforms state-of-the-art methods under the same data split settings on two widely accepted resources.
arXiv Detail & Related papers (2024-10-19T04:29:03Z)
- Learning Speech Representation From Contrastive Token-Acoustic Pretraining [57.08426714676043]
We propose "Contrastive Token-Acoustic Pretraining (CTAP)", which uses two encoders to bring phoneme and speech into a joint multimodal space.
The proposed CTAP model is trained on 210k speech and phoneme pairs, achieving minimally-supervised TTS, VC, and ASR.
arXiv Detail & Related papers (2023-09-01T12:35:43Z)
- Introducing Semantics into Speech Encoders [91.37001512418111]
We propose an unsupervised way of incorporating semantic information from large language models into self-supervised speech encoders without labeled audio transcriptions.
Our approach achieves performance similar to that of supervised methods trained on over 100 hours of labeled audio transcripts.
arXiv Detail & Related papers (2022-11-15T18:44:28Z)
- Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments [21.493664174262737]
This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments.
We propose a semi-supervised adaptation method that jointly updates the mask estimator and the ASR model at run-time using clean speech signals with ground-truth transcriptions and noisy speech signals with highly-confident estimated transcriptions.
arXiv Detail & Related papers (2022-07-15T03:43:35Z)
- Streaming Multi-talker Speech Recognition with Joint Speaker Identification [77.46617674133556]
SURIT employs the recurrent neural network transducer (RNN-T) as the backbone for both speech recognition and speaker identification.
We validate our idea on a multi-talker dataset derived from LibriSpeech, and present encouraging results.
arXiv Detail & Related papers (2021-04-05T18:37:33Z)
- The "Sound of Silence" in EEG -- Cognitive voice activity detection [22.196642357767338]
"Non-speech"(NS) state of brain activity corresponding to silence regions of speech audio is studied.
Speech perception is studied to inspect the existence of such a state, followed by its identification in speech imagination.
The recognition performance and the visual distinction observed demonstrate the existence of silence signatures in EEG.
arXiv Detail & Related papers (2020-10-12T07:47:36Z)
- An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation [57.68765353264689]
Speech enhancement and speech separation are two related tasks.
Traditionally, these tasks have been tackled using signal processing and machine learning techniques.
More recently, deep learning has been exploited to achieve strong performance.
arXiv Detail & Related papers (2020-08-21T17:24:09Z)
- Understanding effect of speech perception in EEG based speech recognition systems [3.5786621294068377]
The electroencephalography (EEG) signals recorded in parallel with speech are used to perform isolated and continuous speech recognition.
We investigate whether it is possible to separate out this speech perception component from EEG signals in order to design more robust EEG based speech recognition systems.
arXiv Detail & Related papers (2020-05-29T05:56:09Z)
- Speech Enhancement using Self-Adaptation and Multi-Head Self-Attention [70.82604384963679]
This paper investigates a self-adaptation method for speech enhancement using auxiliary speaker-aware features.
We extract a speaker representation used for adaptation directly from the test utterance.
arXiv Detail & Related papers (2020-02-14T05:05:36Z)
- Continuous speech separation: dataset and analysis [52.10378896407332]
In natural conversations, a speech signal is continuous, containing both overlapped and overlap-free components.
This paper describes a dataset and protocols for evaluating continuous speech separation algorithms.
arXiv Detail & Related papers (2020-01-30T18:01:31Z)
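As referenced in the CTAP entry above, the sketch below illustrates the general dual-encoder contrastive idea of pulling paired phoneme and speech representations together in a joint space with a symmetric InfoNCE objective. The encoders, feature dimensions, and data here are placeholders chosen for illustration, not details taken from that paper.

```python
# Generic dual-encoder contrastive alignment sketch (CTAP-style idea only).
# Everything here is an illustrative assumption: a "phoneme" encoder and a
# "speech" encoder map paired inputs into a shared embedding space, and a
# symmetric InfoNCE loss pulls matching pairs together while pushing apart
# the other pairs in the batch. Dimensions and data are synthetic.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)   # unit-length embeddings

def info_nce(a, b, temperature=0.07):
    # a, b: (batch, emb_dim); matching rows are positives, all other rows negatives
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

phone_enc, speech_enc = Encoder(in_dim=64), Encoder(in_dim=80)
opt = torch.optim.Adam(list(phone_enc.parameters()) + list(speech_enc.parameters()), lr=1e-3)

phone_feats = torch.randn(32, 64)    # stand-ins for paired phoneme / acoustic features
speech_feats = torch.randn(32, 80)
for step in range(10):
    opt.zero_grad()
    loss = info_nce(phone_enc(phone_feats), speech_enc(speech_feats))
    loss.backward()
    opt.step()
print("final contrastive loss:", loss.item())
```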