Predicting Different Acoustic Features from EEG and towards direct
synthesis of Audio Waveform from EEG
- URL: http://arxiv.org/abs/2006.01262v1
- Date: Fri, 29 May 2020 05:50:03 GMT
- Title: Predicting Different Acoustic Features from EEG and towards direct
synthesis of Audio Waveform from EEG
- Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik
- Abstract summary: In prior work [1,2], the authors provided preliminary results for synthesizing speech from electroencephalography (EEG) features.
This paper introduces a deep learning model that takes raw EEG waveform signals as input and directly produces an audio waveform as output.
The results presented in this paper show how different acoustic features are related to non-invasive neural EEG signals recorded during speech perception and production.
- Score: 3.5786621294068377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In [1,2] the authors provided preliminary results for synthesizing
speech from electroencephalography (EEG) features, where they first predict
acoustic features from EEG features and then reconstruct the speech from the
predicted acoustic features using the Griffin-Lim reconstruction algorithm. In
this paper we first introduce a deep learning model that takes raw EEG waveform
signals as input and directly produces an audio waveform as output. We then
demonstrate predicting 16 different acoustic features from EEG features. We
report results for both the spoken and listen conditions in this paper. The
results presented in this paper show how different acoustic features are
related to non-invasive neural EEG signals recorded during speech perception
and production.
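
To make the abstract's two routes concrete, here is a minimal sketch in PyTorch: `EEGToSpectrogram` illustrates the earlier two-step route of [1,2] (regress an acoustic representation from EEG features, then reconstruct audio with Griffin-Lim, here via librosa), and `EEGToWaveform` illustrates the direct raw-EEG-to-waveform route this paper introduces. All layer choices, dimensions, sampling rates, and the use of librosa are assumptions for illustration only, not the authors' implementation.

```python
# Hedged sketch only: layer choices, feature dimensions, and sampling rates
# are assumptions; this is NOT the authors' architecture.
import torch
import torch.nn as nn
import librosa

class EEGToSpectrogram(nn.Module):
    """Two-step route (as in [1,2]): regress a magnitude spectrogram from
    per-frame EEG features, then invert it with Griffin-Lim."""
    def __init__(self, eeg_dim=30, n_fft=512):
        super().__init__()
        self.gru = nn.GRU(eeg_dim, 128, num_layers=2, batch_first=True)
        self.out = nn.Linear(128, n_fft // 2 + 1)  # one magnitude bin set per frame

    def forward(self, eeg):                  # eeg: (batch, frames, eeg_dim)
        h, _ = self.gru(eeg)
        return torch.relu(self.out(h))       # non-negative magnitudes

class EEGToWaveform(nn.Module):
    """Direct route: map a raw EEG window to audio samples by upsampling.
    The 16x upsampling factor assumes ~1 kHz EEG and 16 kHz audio."""
    def __init__(self, eeg_channels=31):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv1d(eeg_channels, 64, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=7, padding=3), nn.ReLU(),
        )
        # two transposed convolutions, each upsampling time by 4 (4 * 4 = 16)
        self.decode = nn.Sequential(
            nn.ConvTranspose1d(64, 32, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(32, 1, kernel_size=8, stride=4, padding=2), nn.Tanh(),
        )

    def forward(self, eeg):                  # eeg: (batch, channels, samples)
        return self.decode(self.encode(eeg)).squeeze(1)  # (batch, audio samples)

# Two-step route: predict magnitudes from dummy EEG features, then Griffin-Lim.
two_step = EEGToSpectrogram()
eeg_feats = torch.randn(1, 200, 30)                       # dummy EEG feature frames
mag = two_step(eeg_feats)[0].detach().numpy().T           # (freq bins, frames)
audio = librosa.griffinlim(mag, n_iter=32, hop_length=128)

# Direct route: dummy raw EEG window to audio samples.
direct = EEGToWaveform()
raw_eeg = torch.randn(1, 31, 1000)                        # ~1 s of dummy raw EEG
waveform = direct(raw_eeg)                                # (1, 16000) audio samples
```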
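The listing does not enumerate the 16 acoustic features, so the snippet below only illustrates how a few common acoustic regression targets (MFCCs, RMS energy, zero-crossing rate, fundamental frequency) could be extracted with librosa to serve as prediction targets; the authors' actual feature set may differ.

```python
# Hedged illustration of possible acoustic regression targets; the specific
# 16 features used in the paper are not listed here and may differ.
import librosa
import numpy as np

def acoustic_targets(wav_path, sr=16000, hop=128):
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=hop)
    rms = librosa.feature.rms(y=y, hop_length=hop)
    zcr = librosa.feature.zero_crossing_rate(y, hop_length=hop)
    f0 = librosa.yin(y, fmin=65, fmax=400, sr=sr, hop_length=hop)[np.newaxis, :]
    # Stack into a (frames, features) matrix to serve as regression targets.
    return np.vstack([mfcc, rms, zcr, f0]).T
```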
Related papers
- NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention [47.8479647938849]
We present a neuro-guided speaker extraction model, i.e. NeuroSpex, using the EEG response of the listener as the sole auxiliary reference cue.
We propose a novel EEG signal encoder that captures the attention information. Additionally, we propose a cross-attention (CA) mechanism to enhance the speech feature representations.
arXiv Detail & Related papers (2024-09-04T07:33:01Z)
- DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection [49.196182908826565]
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment.
Current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data such as images.
This paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input.
arXiv Detail & Related papers (2023-09-07T13:43:46Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a wav2vec pre-trained model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech.
This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training.
The model is then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features.
arXiv Detail & Related papers (2022-03-19T08:47:18Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks [54.43697805589634]
We propose a new end-to-end video-to-speech model based on Generative Adversarial Networks (GANs).
Our model consists of an encoder-decoder architecture that receives raw video as input and generates speech.
We show that this model is able to reconstruct speech with remarkable realism for constrained datasets such as GRID.
arXiv Detail & Related papers (2021-04-27T17:12:30Z)
- Constrained Variational Autoencoder for improving EEG based Speech Recognition Systems [3.5786621294068377]
We introduce a recurrent neural network (RNN) based variational autoencoder (VAE) model with a new constrained loss function.
We demonstrate that both continuous and isolated speech recognition systems can be trained and tested using the EEG features generated by our model from raw EEG features.
arXiv Detail & Related papers (2020-06-01T06:03:50Z)
- Understanding effect of speech perception in EEG based speech recognition systems [3.5786621294068377]
The electroencephalography (EEG) signals recorded in parallel with speech are used to perform isolated and continuous speech recognition.
We investigate whether it is possible to separate out this speech perception component from EEG signals in order to design more robust EEG based speech recognition systems.
arXiv Detail & Related papers (2020-05-29T05:56:09Z)
- Advancing Speech Synthesis using EEG [3.5786621294068377]
We introduce an attention-regression model to demonstrate predicting acoustic features from electroencephalography (EEG) features recorded in parallel with spoken sentences.
First we demonstrate predicting acoustic features directly from EEG features using our attention model and then we demonstrate predicting acoustic features from EEG features using a two-step approach.
arXiv Detail & Related papers (2020-04-09T23:58:40Z)
- Generating EEG features from Acoustic features [13.089515271477824]
We use a recurrent neural network (RNN) based regression model and a generative adversarial network (GAN) to predict EEG features from acoustic features.
We compare our results with the previously studied problem on speech synthesis using EEG.
arXiv Detail & Related papers (2020-02-29T16:44:08Z)
- Speech Synthesis using EEG [4.312746668772343]
We make use of a recurrent neural network (RNN) regression model to predict acoustic features directly from EEG features.
We provide EEG based speech synthesis results for four subjects in this paper.
arXiv Detail & Related papers (2020-02-22T03:53:45Z)