Neural Speech Embeddings for Speech Synthesis Based on Deep Generative
Networks
- URL: http://arxiv.org/abs/2312.05814v2
- Date: Tue, 27 Feb 2024 02:25:28 GMT
- Title: Neural Speech Embeddings for Speech Synthesis Based on Deep Generative
Networks
- Authors: Seo-Hyun Lee, Young-Eun Lee, Soowon Kim, Byung-Kwan Ko, Jun-Young Kim,
Seong-Whan Lee
- Abstract summary: We introduce current brain-to-speech technology and the possibility of synthesizing speech from brain signals.
We also perform a comprehensive analysis of the neural features and neural speech embeddings underlying neurophysiological activation during speech.
- Score: 27.64740032872726
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Brain-to-speech technology represents a fusion of interdisciplinary
applications encompassing fields of artificial intelligence, brain-computer
interfaces, and speech synthesis. Neural representation learning based
intention decoding and speech synthesis directly connects the neural activity
to the means of human linguistic communication, which may greatly enhance the
naturalness of communication. With the current discoveries on representation
learning and the development of the speech synthesis technologies, direct
translation of brain signals into speech has shown great promise. In particular,
the processed input features and neural speech embeddings given to the neural
network play a significant role in overall performance when using deep
generative models for speech generation from brain signals. In this
paper, we introduce the current brain-to-speech technology with the possibility
of speech synthesis from brain signals, which may ultimately facilitate
innovation in non-verbal communication. We also perform a comprehensive analysis
of the neural features and neural speech embeddings underlying
neurophysiological activation during speech, which may play a significant role
in speech synthesis work.
Related papers
- Towards Unified Neural Decoding of Perceived, Spoken and Imagined Speech from EEG Signals [1.33134751838052]
This research investigated the effectiveness of deep learning models for non-invasive neural signal decoding.
It focused on distinguishing between different speech paradigms, including perceived, overt, whispered, and imagined speech.
(2024-11-14T07:20:08Z)
- Dynamic Neural Communication: Convergence of Computer Vision and Brain-Computer Interface [25.555303640695577]
We introduce a dynamic neural communication method that leverages computer vision and brain-computer interface technologies.
Our approach captures the user's intentions from neural signals and decodes visemes in short time steps to produce dynamic visual outputs.
Results demonstrate the potential to rapidly capture and reconstruct lip movements during natural speech attempts from human neural signals.
(2024-11-14T06:15:05Z)
- Brain-like Functional Organization within Large Language Models [58.93629121400745]
The human brain has long inspired the pursuit of artificial intelligence (AI).
Recent neuroimaging studies provide compelling evidence of alignment between the computational representation of artificial neural networks (ANNs) and the neural responses of the human brain to stimuli.
In this study, we bridge this gap by directly coupling sub-groups of artificial neurons with functional brain networks (FBNs).
This framework links the AN sub-groups to FBNs, enabling the delineation of brain-like functional organization within large language models (LLMs).
(2024-10-25T13:15:17Z)
- SIFToM: Robust Spoken Instruction Following through Theory of Mind [51.326266354164716]
We present a cognitively inspired model, Speech Instruction Following through Theory of Mind (SIFToM), to enable robots to pragmatically follow human instructions under diverse speech conditions.
Results show that the SIFToM model outperforms state-of-the-art speech and language models, approaching human-level accuracy on challenging speech instruction following tasks.
(2024-09-17T02:36:10Z)
- Enhancing learning in spiking neural networks through neuronal heterogeneity and neuromodulatory signaling [52.06722364186432]
We propose a biologically-informed framework for enhancing artificial neural networks (ANNs).
Our proposed dual-framework approach highlights the potential of spiking neural networks (SNNs) for emulating diverse spiking behaviors.
We outline how the proposed approach integrates brain-inspired compartmental models and task-driven SNNs, bioinspiration and complexity.
(2024-07-05T14:11:28Z)
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
(2024-04-22T09:40:07Z)
- Towards Decoding Brain Activity During Passive Listening of Speech [0.0]
We attempt to decode heard speech from intracranial electroencephalographic (iEEG) data using deep learning methods.
This approach diverges from the conventional focus on speech production and instead chooses to investigate neural representations of perceived speech.
Despite the approach not having achieved a breakthrough yet, the research sheds light on the potential of decoding neural activity during speech perception.
(2024-02-26T20:04:01Z)
- BrainBERT: Self-supervised representation learning for intracranial recordings [18.52962864519609]
We create a reusable Transformer, BrainBERT, for intracranial recordings bringing modern representation learning approaches to neuroscience.
Much like in NLP and speech recognition, this Transformer enables classifying complex concepts, with higher accuracy and with much less data.
In the future, far more concepts will be decodable from neural recordings by using representation learning, potentially unlocking the brain like language models unlocked language.
(2023-02-28T07:40:37Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
(2022-06-03T17:01:46Z)
- Towards efficient end-to-end speech recognition with biologically-inspired neural networks [10.457580011403289]
We introduce neural connectivity concepts emulating the axo-somatic and the axo-axonic synapses.
We demonstrate, for the first time, that a biologically realistic implementation of a large-scale ASR model can yield competitive performance levels.
(2021-10-04T21:24:10Z)
- SpeechBrain: A General-Purpose Speech Toolkit [73.0404642815335]
SpeechBrain is an open-source and all-in-one speech toolkit.
It is designed to facilitate the research and development of neural speech processing technologies.
It achieves competitive or state-of-the-art performance in a wide range of speech benchmarks.
(2021-06-08T18:22:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.