Speech Synthesis using EEG
- URL: http://arxiv.org/abs/2002.12756v2
- Date: Sun, 3 May 2020 20:30:33 GMT
- Title: Speech Synthesis using EEG
- Authors: Gautam Krishna, Co Tran, Yan Han, Mason Carnahan
- Abstract summary: We make use of a recurrent neural network (RNN) regression model to predict acoustic features directly from EEG features.
We provide EEG based speech synthesis results for four subjects in this paper.
- Score: 4.312746668772343
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we demonstrate speech synthesis using different
electroencephalography (EEG) feature sets recently introduced in [1]. We make
use of a recurrent neural network (RNN) regression model to predict acoustic
features directly from EEG features. We demonstrate our results using EEG
features recorded in parallel with spoken speech, as well as EEG recorded
while subjects listened to utterances. We provide EEG based speech synthesis
results for four subjects, and these results demonstrate the feasibility of
synthesizing speech directly from EEG features.
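As a concrete illustration of the approach the abstract describes, the following is a minimal sketch of a GRU regression model mapping EEG feature frames to acoustic feature frames. The dimensions, layer count, and MFCC-style targets are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class EEGToAcoustic(nn.Module):
    """GRU regression: EEG feature frames -> acoustic feature frames."""
    def __init__(self, eeg_dim=30, acoustic_dim=13, hidden=128, layers=2):
        super().__init__()
        self.rnn = nn.GRU(eeg_dim, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hidden, acoustic_dim)  # frame-wise regression head

    def forward(self, eeg):        # eeg: (batch, time, eeg_dim)
        out, _ = self.rnn(eeg)     # (batch, time, hidden)
        return self.proj(out)      # (batch, time, acoustic_dim)

model = EEGToAcoustic()
eeg = torch.randn(4, 200, 30)                      # 4 utterances, 200 EEG frames
pred = model(eeg)                                  # predicted acoustic features
loss = nn.functional.mse_loss(pred, torch.randn_like(pred))  # dummy targets
```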
Related papers
- NeuroSpex: Neuro-Guided Speaker Extraction with Cross-Modal Attention [47.8479647938849]
We present a neuro-guided speaker extraction model, i.e. NeuroSpex, using the EEG response of the listener as the sole auxiliary reference cue.
We propose a novel EEG signal encoder that captures the attention information. Additionally, we propose a cross-attention (CA) mechanism to enhance the speech feature representations.
arXiv Detail & Related papers (2024-09-04T07:33:01Z) - Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder [69.7813498468116]
- Enhancing EEG-to-Text Decoding through Transferable Representations from Pre-trained Contrastive EEG-Text Masked Autoencoder [69.7813498468116]
We propose Contrastive EEG-Text Masked Autoencoder (CET-MAE), a novel model that orchestrates compound self-supervised learning across and within EEG and text.
We also develop a framework called E2T-PTR (EEG-to-Text decoding using Pretrained Transferable Representations) to decode text from EEG sequences.
arXiv Detail & Related papers (2024-02-27T11:45:21Z) - Discretization and Re-synthesis: an alternative method to solve the
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
After the discrete symbol sequence is predicted, each target speech signal can be re-synthesized by feeding the symbols to the synthesis model.
arXiv Detail & Related papers (2021-12-17T08:35:40Z) - EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional
- EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model [56.75775793011719]
We introduce and publicly release a Mandarin emotion speech dataset comprising 9,724 samples with audio files and human-labeled emotion annotations.
Unlike those models which need additional reference audio as input, our model could predict emotion labels just from the input text and generate more expressive speech conditioned on the emotion embedding.
In the experiment phase, we first validate the effectiveness of our dataset by an emotion classification task. Then we train our model on the proposed dataset and conduct a series of subjective evaluations.
arXiv Detail & Related papers (2021-06-17T08:34:21Z) - Reinforcement Learning for Emotional Text-to-Speech Synthesis with
- Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability [82.39099867188547]
Emotional text-to-speech synthesis (ETTS) has seen much progress in recent years.
We propose a new interactive training paradigm for ETTS, denoted as i-ETTS.
We formulate an iterative training strategy with reinforcement learning to ensure the quality of i-ETTS optimization.
arXiv Detail & Related papers (2021-04-03T13:52:47Z) - Constrained Variational Autoencoder for improving EEG based Speech
- Constrained Variational Autoencoder for improving EEG based Speech Recognition Systems [3.5786621294068377]
We introduce a recurrent neural network (RNN) based variational autoencoder (VAE) model with a new constrained loss function.
We demonstrate that both continuous and isolated speech recognition systems can be trained and tested using the EEG features generated by this model from raw EEG features.
arXiv Detail & Related papers (2020-06-01T06:03:50Z) - Understanding effect of speech perception in EEG based speech
- Understanding effect of speech perception in EEG based speech recognition systems [3.5786621294068377]
The electroencephalography (EEG) signals recorded in parallel with speech are used to perform isolated and continuous speech recognition.
We investigate whether it is possible to separate out this speech perception component from EEG signals in order to design more robust EEG based speech recognition systems.
arXiv Detail & Related papers (2020-05-29T05:56:09Z) - Predicting Different Acoustic Features from EEG and towards direct
synthesis of Audio Waveform from EEG [3.5786621294068377]
The authors provide preliminary results for synthesizing speech from electroencephalography (EEG) features.
A deep learning model takes raw EEG waveform signals as input and directly produces the audio waveform as output.
The results presented in this paper show how different acoustic features are related to non-invasive neural EEG signals recorded during speech perception and production.
arXiv Detail & Related papers (2020-05-29T05:50:03Z) - End-to-end Named Entity Recognition from English Speech [51.22888702264816]
- End-to-end Named Entity Recognition from English Speech [51.22888702264816]
We introduce the first publicly available NER-annotated dataset for English speech and present an E2E approach that jointly optimizes the ASR and NER tagger components.
We also discuss how NER from speech can be used to handle out of vocabulary (OOV) words in an ASR system.
arXiv Detail & Related papers (2020-05-22T13:39:14Z) - Advancing Speech Synthesis using EEG [3.5786621294068377]
- Advancing Speech Synthesis using EEG [3.5786621294068377]
We introduce an attention-regression model to demonstrate predicting acoustic features from electroencephalography (EEG) features recorded in parallel with spoken sentences.
First we demonstrate predicting acoustic features directly from EEG features using our attention model and then we demonstrate predicting acoustic features from EEG features using a two-step approach.
arXiv Detail & Related papers (2020-04-09T23:58:40Z) - Generating EEG features from Acoustic features [13.089515271477824]
- Generating EEG features from Acoustic features [13.089515271477824]
We use a recurrent neural network (RNN) based regression model and a generative adversarial network (GAN) to predict EEG features from acoustic features.
We compare our results with the previously studied problem on speech synthesis using EEG.
arXiv Detail & Related papers (2020-02-29T16:44:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.