Synthesizing Speech from Intracranial Depth Electrodes using an
Encoder-Decoder Framework
- URL: http://arxiv.org/abs/2111.01457v1
- Date: Tue, 2 Nov 2021 09:43:21 GMT
- Title: Synthesizing Speech from Intracranial Depth Electrodes using an
Encoder-Decoder Framework
- Authors: Jonas Kohler, Maarten C. Ottenhoff, Sophocles Goulis, Miguel Angrick,
Albert J. Colon, Louis Wagner, Simon Tousseyn, Pieter L. Kubben, Christian
Herff
- Abstract summary: Speech Neuroprostheses have the potential to enable communication for people with dysarthria or anarthria.
Recent advances have demonstrated high-quality text decoding and speech synthesis from electrocorticographic grids placed on the cortical surface.
- Score: 1.623136488969658
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech Neuroprostheses have the potential to enable communication for people
with dysarthria or anarthria. Recent advances have demonstrated high-quality
text decoding and speech synthesis from electrocorticographic grids placed on
the cortical surface. Here, we investigate a less invasive measurement
modality, namely stereotactic EEG (sEEG) that provides sparse sampling from
multiple brain regions, including subcortical regions. To evaluate whether sEEG
can also be used to synthesize high-quality audio from neural recordings, we
employ a recurrent encoder-decoder framework based on modern deep learning
methods. We demonstrate that high-quality speech can be reconstructed from
these minimally invasive recordings, despite a limited amount of training data.
Finally, we utilize variational feature dropout to successfully identify the
most informative electrode contacts.
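As a hedged illustration of the electrode-selection idea, the sketch below applies a concrete (relaxed Bernoulli) dropout mask per channel and ranks sEEG contacts by their learned keep probabilities. The function names, the temperature, and the example probabilities are illustrative assumptions, not the paper's actual implementation; in practice the keep probabilities would be learned jointly with the encoder-decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

def concrete_dropout_mask(p_keep, temperature=0.1, rng=rng):
    """Differentiable (concrete) relaxation of a per-channel Bernoulli
    keep mask, so the keep probabilities can be trained by backprop.
    Values land close to 0 or 1 for small temperatures."""
    u = rng.uniform(1e-6, 1 - 1e-6, size=np.shape(p_keep))
    logits = (np.log(p_keep) - np.log(1 - p_keep)
              + np.log(u) - np.log(1 - u))
    return 1.0 / (1.0 + np.exp(-logits / temperature))

def rank_channels(p_keep):
    """Channels the model learned to keep most often are treated as the
    most informative electrode contacts."""
    return np.argsort(-np.asarray(p_keep))

# Hypothetical learned keep probabilities for 5 sEEG contacts:
p_keep = np.array([0.05, 0.90, 0.40, 0.99, 0.10])
print(rank_channels(p_keep))  # contacts from most to least informative
```

During training the mask would multiply each channel's features, and a sparsity penalty on the keep probabilities would push uninformative contacts toward zero.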
Related papers
- A multimodal LLM for the non-invasive decoding of spoken text from brain recordings [0.4187344935012482]
We propose an end-to-end multimodal LLM for decoding spoken text from fMRI signals.
The proposed architecture is founded on an encoder derived from a transformer, incorporating an augmented embedding layer and an attention mechanism better adjusted than that in the state of the art.
A benchmark is performed on a corpus of human-human and human-robot interactions in which fMRI and conversational signals are recorded synchronously.
arXiv Detail & Related papers (2024-09-29T14:03:39Z) - CoLM-DSR: Leveraging Neural Codec Language Modeling for Multi-Modal Dysarthric Speech Reconstruction [61.067153685104394]
Dysarthric speech reconstruction (DSR) aims to transform dysarthric speech into normal speech.
Existing DSR methods still suffer from low speaker similarity and poor prosody naturalness.
We propose a multi-modal DSR model by leveraging neural language modeling to improve the reconstruction results.
arXiv Detail & Related papers (2024-06-12T15:42:21Z) - Improving Speech Decoding from ECoG with Self-Supervised Pretraining [0.0]
We reengineer a self-supervised, fully convolutional model that learns latent representations of audio using a noise-contrastive loss.
We train this model on unlabelled electrocorticographic (ECoG) recordings.
We then use it to transform ECoG from labeled speech sessions into wav2vec's representation space, before finally training a supervised encoder-decoder to map these representations to text.
arXiv Detail & Related papers (2024-05-28T22:48:53Z) - Surrogate Gradient Spiking Neural Networks as Encoders for Large
Vocabulary Continuous Speech Recognition [91.39701446828144]
We show that spiking neural networks can be trained like standard recurrent neural networks using the surrogate gradient method.
They have shown promising results on speech command recognition tasks.
In contrast to their recurrent non-spiking counterparts, they show robustness to exploding gradient problems without the need to use gates.
arXiv Detail & Related papers (2022-12-01T12:36:26Z) - End-to-End Binaural Speech Synthesis [71.1869877389535]
We present an end-to-end speech synthesis system that combines a low-bitrate audio system with a powerful decoder.
We demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.
arXiv Detail & Related papers (2022-07-08T05:18:36Z) - Avocodo: Generative Adversarial Network for Artifact-free Vocoder [5.956832212419584]
We propose a GAN-based neural vocoder, called Avocodo, that allows the synthesis of high-fidelity speech with reduced artifacts.
Avocodo outperforms conventional GAN-based neural vocoders in both speech and singing voice synthesis tasks and can synthesize artifact-free speech.
arXiv Detail & Related papers (2022-06-27T15:54:41Z) - Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot
Sentiment Classification [78.120927891455]
State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks.
In this paper, we extend the problem to open vocabulary Electroencephalography (EEG)-to-Text Sequence-to-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks.
Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines.
arXiv Detail & Related papers (2021-12-05T21:57:22Z) - DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding [71.73405116189531]
We propose a neural vocoder that extracts F0 and timbre/aperiodicity encoding from the input speech that emulates those defined in conventional vocoders.
As the deep neural analyzer is learnable, it is expected to be more accurate for signal reconstruction and manipulation, and generalizable from speech to singing.
arXiv Detail & Related papers (2021-10-13T01:39:57Z) - Diffusion-Weighted Magnetic Resonance Brain Images Generation with
Generative Adversarial Networks and Variational Autoencoders: A Comparison
Study [55.78588835407174]
We show that high quality, diverse and realistic-looking diffusion-weighted magnetic resonance images can be synthesized using deep generative models.
We present two networks, the Introspective Variational Autoencoder and the Style-Based GAN, that qualify for data augmentation in the medical field.
arXiv Detail & Related papers (2020-06-24T18:00:01Z) - A Novel Deep Learning Architecture for Decoding Imagined Speech from EEG [2.4063592468412267]
We present a novel architecture that employs a deep neural network (DNN) for classifying the words "in" and "cooperate".
Nine EEG channels, which best capture the underlying cortical activity, are chosen using common spatial pattern.
We have achieved accuracies comparable to the state-of-the-art results.
arXiv Detail & Related papers (2020-03-19T00:57:40Z)
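The surrogate-gradient trick from the spiking-network entry above can be sketched as follows. The sigmoid surrogate and its steepness `beta` are common choices assumed here for illustration, not necessarily the formulation used in that paper.

```python
import numpy as np

def spike(v, threshold=1.0):
    """Forward pass: hard threshold (Heaviside) spike emission,
    whose true derivative is zero almost everywhere."""
    return (v >= threshold).astype(float)

def surrogate_grad(v, threshold=1.0, beta=10.0):
    """Backward pass: replace the Heaviside derivative with the
    derivative of a steep sigmoid, so the spiking network can be
    trained by backprop like a standard recurrent network."""
    s = 1.0 / (1.0 + np.exp(-beta * (v - threshold)))
    return beta * s * (1.0 - s)

v = np.array([0.2, 1.0, 1.8])
print(spike(v))           # hard 0/1 spikes
print(surrogate_grad(v))  # nonzero pseudo-gradient near the threshold
```

The forward pass keeps the binary spike behavior, while gradients flow through the smooth surrogate only during the backward pass.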
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.