Advancing Speech Synthesis using EEG
- URL: http://arxiv.org/abs/2004.04731v2
- Date: Sun, 3 May 2020 20:33:36 GMT
- Title: Advancing Speech Synthesis using EEG
- Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik
- Abstract summary: We introduce an attention-regression model for predicting acoustic features from electroencephalography (EEG) features recorded in parallel with spoken sentences.
First we demonstrate predicting acoustic features directly from EEG features using our attention model, and then we demonstrate predicting them using a two-step approach.
- Score: 3.5786621294068377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we introduce an attention-regression model to demonstrate predicting acoustic features from electroencephalography (EEG) features recorded in parallel with spoken sentences. First we demonstrate predicting acoustic features directly from EEG features using our attention model, and then we demonstrate predicting acoustic features from EEG features using a two-step approach: in the first step our attention model predicts articulatory features from EEG features, and in the second step another attention-regression model is trained to transform the predicted articulatory features into acoustic features. Our proposed attention-regression model demonstrates superior performance compared to the regression model introduced by the authors in [1] when tested on their data set, for the majority of the subjects at test time. The results presented in this paper further advance the work described by the authors in [1].
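As a rough illustration of the approach described above, the sketch below pairs a recurrent encoder with a simple additive attention layer and a frame-wise regression head, and instantiates it once for the direct EEG-to-acoustic path and twice for the two-step EEG-to-articulatory-to-acoustic path. This is a minimal sketch, not the authors' released code: the feature dimensions, layer sizes, and exact attention formulation are illustrative assumptions.

```python
# Minimal sketch of an attention-regression model for mapping EEG feature
# frames to acoustic feature frames. Architecture details (GRU encoder,
# additive attention, hidden sizes, feature dimensions) are assumptions for
# illustration, not the configuration used in the paper.
import torch
import torch.nn as nn


class AttentionRegression(nn.Module):
    """Bidirectional GRU encoder + additive attention + per-frame regression head."""

    def __init__(self, in_dim: int, out_dim: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hidden, batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * hidden, 1)   # scalar attention score per frame
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, out_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, in_dim) EEG feature frames aligned with speech frames
        h, _ = self.encoder(x)                        # (batch, time, 2*hidden)
        w = torch.softmax(self.attn_score(h), dim=1)  # attention weights over time
        context = (w * h).sum(dim=1, keepdim=True)    # (batch, 1, 2*hidden) utterance summary
        h = h + context                               # add global context to every frame
        return self.head(h)                           # (batch, time, out_dim)


# Hypothetical feature dimensions for EEG, articulatory, and acoustic frames.
eeg_dim, artic_dim, acoustic_dim = 30, 6, 13

# Direct path: EEG features -> acoustic features.
direct = AttentionRegression(eeg_dim, acoustic_dim)

# Two-step path: EEG -> articulatory features, then articulatory -> acoustic,
# with each stage trained against its own ground-truth frames (MSE loss).
stage1 = AttentionRegression(eeg_dim, artic_dim)
stage2 = AttentionRegression(artic_dim, acoustic_dim)

eeg = torch.randn(4, 200, eeg_dim)                    # toy batch: 4 utterances, 200 frames
acoustic_pred = stage2(stage1(eeg))                   # (4, 200, acoustic_dim)
loss = nn.functional.mse_loss(acoustic_pred, torch.randn(4, 200, acoustic_dim))
```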
Related papers
- NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention [47.8479647938849]
We present a neuro-guided speaker extraction model, i.e. NeuroSpex, using the EEG response of the listener as the sole auxiliary reference cue.
We propose a novel EEG signal encoder that captures the attention information. Additionally, we propose a cross-attention (CA) mechanism to enhance the speech feature representations.
arXiv Detail & Related papers (2024-09-04T07:33:01Z)
- DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection [49.196182908826565]
Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment.
Current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data such as images.
This paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input.
arXiv Detail & Related papers (2023-09-07T13:43:46Z)
- Pathologies of Pre-trained Language Models in Few-shot Fine-tuning [50.3686606679048]
We show that pre-trained language models with few examples exhibit strong prediction bias across labels.
Although few-shot fine-tuning can mitigate the prediction bias, our analysis shows models gain performance improvement by capturing non-task-related features.
These observations warn that pursuing model performance with fewer examples may incur pathological prediction behavior.
arXiv Detail & Related papers (2022-04-17T15:55:18Z)
- DriPP: Driven Point Processes to Model Stimuli Induced Patterns in M/EEG Signals [62.997667081978825]
We develop a novel statistical point process model called driven temporal point processes (DriPP).
We derive a fast and principled expectation-maximization (EM) algorithm to estimate the parameters of this model.
Results on standard MEG datasets demonstrate that our methodology reveals event-related neural responses.
arXiv Detail & Related papers (2021-12-08T13:07:21Z)
- Improving End-To-End Modeling for Mispronunciation Detection with Effective Augmentation Mechanisms [17.317583079824423]
We propose two strategies to enhance the discrimination capability of E2E MD models.
One is input augmentation, which aims to distill knowledge about phonetic discrimination from a DNN-HMM acoustic model.
The other is label augmentation, which manages to capture more phonological patterns from the transcripts of training data.
arXiv Detail & Related papers (2021-10-17T06:11:15Z)
- Energy-based View of Retrosynthesis [70.66156081030766]
We propose a framework that unifies sequence- and graph-based methods as energy-based models.
We present a novel dual variant within the framework that performs consistent training over Bayesian forward- and backward-prediction.
This model improves state-of-the-art performance by 9.6% for template-free approaches where the reaction type is unknown.
arXiv Detail & Related papers (2020-07-14T18:51:06Z)
- Predicting Different Acoustic Features from EEG and towards direct synthesis of Audio Waveform from EEG [3.5786621294068377]
The authors provide preliminary results for synthesizing speech from electroencephalography (EEG) features.
A deep learning model takes raw EEG waveform signals as input and directly produces an audio waveform as output.
The results presented in this paper show how different acoustic features are related to non-invasive neural EEG signals recorded during speech perception and production.
arXiv Detail & Related papers (2020-05-29T05:50:03Z)
- Generating EEG features from Acoustic features [13.089515271477824]
We use a recurrent neural network (RNN) based regression model and a generative adversarial network (GAN) to predict EEG features from acoustic features.
We compare our results with the previously studied problem on speech synthesis using EEG.
arXiv Detail & Related papers (2020-02-29T16:44:08Z)
- Speech Synthesis using EEG [4.312746668772343]
We make use of a recurrent neural network (RNN) regression model to predict acoustic features directly from EEG features.
We provide EEG based speech synthesis results for four subjects in this paper.
arXiv Detail & Related papers (2020-02-22T03:53:45Z)
- Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior [53.69310441063162]
This paper proposes a sequential prior in a discrete latent space which can generate more natural-sounding samples.
We evaluate the approach using listening tests, objective metrics of automatic speech recognition (ASR) performance, and measurements of prosody attributes.
arXiv Detail & Related papers (2020-02-06T12:35:50Z)