End-to-end translation of human neural activity to speech with a
dual-dual generative adversarial network
- URL: http://arxiv.org/abs/2110.06634v1
- Date: Wed, 13 Oct 2021 10:54:41 GMT
- Title: End-to-end translation of human neural activity to speech with a
dual-dual generative adversarial network
- Authors: Yina Guo, Xiaofei Zhang, Zhenying Gong, Anhong Wang and Wenwu Wang
- Abstract summary: We propose an end-to-end model to translate human neural activity to speech directly.
We create a new electroencephalogram (EEG) dataset for participants with good attention.
The proposed method can translate word-length and sentence-length sequences of neural activity to speech.
- Score: 39.014888541156296
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In a recent study of auditory evoked potential (AEP) based brain-computer
interface (BCI), it was shown that, with an encoder-decoder framework, it is
possible to translate human neural activity to speech (T-CAS). However, current
encoder-decoder-based methods often achieve T-CAS with a two-step approach in
which information is passed between the encoder and decoder through a shared
dimension-reduction vector, which may result in a loss of information. A
potential solution is an end-to-end method based on a dual generative
adversarial network (DualGAN) that passes information without dimension
reduction, but DualGAN cannot realize one-to-one signal-to-signal
translation (see Fig. 1 (a) and (b)). In this paper, we propose an end-to-end
model to translate human neural activity to speech directly, create a new
electroencephalogram (EEG) dataset for participants with good attention by
designing a device to detect participants' attention, and introduce a dual-dual
generative adversarial network (Dual-DualGAN) (see Fig. 1 (c) and (d)) to
address the end-to-end translation of human neural activity to speech (ET-CAS)
problem by group-labelling EEG and speech signals and inserting a transition
domain to realize cross-domain mapping. In the transition domain, the
transition signals are formed by cascading the corresponding EEG and speech
signals in a certain proportion, which builds a bridge between EEG and speech
signals that share no corresponding features and realizes one-to-one
cross-domain EEG-to-speech translation. The proposed method can translate word-length and
sentence-length sequences of neural activity to speech. Experimental evaluation
has been conducted to show that the proposed method significantly outperforms
state-of-the-art methods on both words and sentences of auditory stimulus.
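The transition-domain idea above can be pictured with a minimal sketch: a transition signal is formed by cascading an EEG segment and a speech segment in a given proportion. The function name, the resampling step, and the mixing proportion below are illustrative assumptions, not the authors' exact construction in Dual-DualGAN.

```python
import numpy as np

def make_transition_signal(eeg, speech, alpha=0.5):
    """Cascade an EEG segment and a speech segment into one transition
    signal: the first `alpha` fraction of samples comes from the EEG
    (resampled to the speech length), the remainder from the speech.
    This only illustrates the bridging idea between the two domains."""
    n = len(speech)
    # Resample the EEG to the speech length by linear interpolation
    eeg_rs = np.interp(np.linspace(0, len(eeg) - 1, n),
                       np.arange(len(eeg)), eeg)
    split = int(alpha * n)
    return np.concatenate([eeg_rs[:split], speech[split:]])

eeg = np.random.randn(256)      # toy EEG segment
speech = np.random.randn(1024)  # toy speech segment
t = make_transition_signal(eeg, speech, alpha=0.5)
```

A GAN pair can then learn EEG-to-transition and transition-to-speech mappings separately, so EEG and speech never need directly corresponding features.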
Related papers
- Enhancing Listened Speech Decoding from EEG via Parallel Phoneme Sequence Prediction [36.38186261968484]
We propose a novel approach to enhance listened speech decoding from electroencephalography (EEG) signals.
We use an auxiliary phoneme predictor that simultaneously decodes textual phoneme sequences.
arXiv Detail & Related papers (2025-01-08T21:11:35Z)
- Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings [27.418738450536047]
We propose a two-step pipeline for converting EEG signals into sentences.
We first confirm that word-level semantic information can be learned from EEG data recorded during natural reading.
We employ a training-free retrieval method to retrieve sentences based on the predictions from the EEG encoder.
arXiv Detail & Related papers (2024-08-08T03:40:25Z)
- EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network [11.491355463353731]
We introduce the Retnet from natural language processing to EEG denoising.
Direct application of Retnet to EEG denoising is infeasible due to the one-dimensional nature of EEG signals.
We propose the signal embedding method, transforming one-dimensional EEG signals into two dimensions for use as network inputs.
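The signal-embedding step described above can be sketched as slicing the 1-D EEG series into fixed-length patches that form a 2-D input. The patch length and the non-overlapping layout here are assumed values for illustration, not the exact embedding used in EEGDiR.

```python
import numpy as np

def embed_signal(x, patch_len=16):
    """Reshape a 1-D EEG signal into a 2-D array of non-overlapping
    patches (n_patches x patch_len), a common way to present a 1-D
    series to a sequence model. Any trailing remainder is truncated."""
    n = (len(x) // patch_len) * patch_len
    return x[:n].reshape(-1, patch_len)

x = np.arange(100, dtype=float)
patches = embed_signal(x, patch_len=16)  # shape (6, 16)
```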
arXiv Detail & Related papers (2024-03-20T15:04:21Z)
- Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG [17.96977778655143]
We propose a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional autoencoder named Diff-E.
Results indicate that Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models.
arXiv Detail & Related papers (2023-07-26T07:12:39Z)
- LAMASSU: Streaming Language-Agnostic Multilingual Speech Recognition and Translation Using Neural Transducers [71.76680102779765]
Automatic speech recognition (ASR) and speech translation (ST) can both use neural transducers as the model structure.
We propose LAMASSU, a streaming language-agnostic multilingual speech recognition and translation model using neural transducers.
arXiv Detail & Related papers (2022-11-05T04:03:55Z)
- Wav2Seq: Pre-training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages [58.43299730989809]
We introduce Wav2Seq, the first self-supervised approach to pre-train both parts of encoder-decoder models for speech data.
We induce a pseudo language as a compact discrete representation, and formulate a self-supervised pseudo speech recognition task.
This process stands on its own, or can be applied as low-cost second-stage pre-training.
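The pseudo-language induction above can be illustrated with a toy quantizer: cluster frame-level features and read off each frame's cluster index as a discrete token. The k-means routine and all parameters below are simplifying assumptions; Wav2Seq's actual pipeline is more involved (e.g. deduplication and subword modelling).

```python
import numpy as np

def induce_pseudo_tokens(features, n_clusters=4, iters=10, seed=0):
    """Toy k-means quantizer: map each feature frame to the index of its
    nearest centroid, yielding a discrete 'pseudo language' token
    sequence. Only shows the quantization idea behind pseudo languages."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from randomly chosen frames
    centroids = features[rng.choice(len(features), n_clusters, replace=False)]
    for _ in range(iters):
        # Distance of every frame to every centroid, then hard assignment
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for k in range(n_clusters):
            if np.any(assign == k):
                centroids[k] = features[assign == k].mean(axis=0)
    # Final assignment with the converged centroids
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

feats = np.random.default_rng(1).normal(size=(50, 8))
tokens = induce_pseudo_tokens(feats)  # one discrete token per frame
```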
arXiv Detail & Related papers (2022-05-02T17:59:02Z)
- Speaker Embedding-aware Neural Diarization: a Novel Framework for Overlapped Speech Diarization in the Meeting Scenario [51.5031673695118]
We reformulate overlapped speech diarization as a single-label prediction problem.
We propose the speaker embedding-aware neural diarization (SEND) system.
arXiv Detail & Related papers (2022-03-18T06:40:39Z)
- Cross-Modality Brain Tumor Segmentation via Bidirectional Global-to-Local Unsupervised Domain Adaptation [61.01704175938995]
In this paper, we propose a novel Bidirectional Global-to-Local (BiGL) adaptation framework under a UDA scheme.
Specifically, a bidirectional image synthesis and segmentation module is proposed to segment the brain tumor.
The proposed method outperforms several state-of-the-art unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2021-05-17T10:11:45Z)
- Correlation based Multi-phasal models for improved imagined speech EEG recognition [22.196642357767338]
This work aims to profit from the parallel information contained in multi-phasal EEG data recorded while speaking, imagining and performing articulatory movements corresponding to specific speech units.
A bi-phase common representation learning module using neural networks is designed to model the correlation between an analysis phase and a support phase.
The proposed approach further handles the non-availability of multi-phasal data during decoding.
arXiv Detail & Related papers (2020-11-04T09:39:53Z)
- Class-Conditional Defense GAN Against End-to-End Speech Attacks [82.21746840893658]
We propose a novel approach against end-to-end adversarial attacks developed to fool advanced speech-to-text systems such as DeepSpeech and Lingvo.
Unlike conventional defense approaches, the proposed approach does not directly employ low-level transformations such as autoencoding a given input signal.
Our defense-GAN considerably outperforms conventional defense algorithms in terms of word error rate and sentence level recognition accuracy.
arXiv Detail & Related papers (2020-10-22T00:02:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.