Constrained Variational Autoencoder for improving EEG based Speech Recognition Systems
- URL: http://arxiv.org/abs/2006.02902v1
- Date: Mon, 1 Jun 2020 06:03:50 GMT
- Title: Constrained Variational Autoencoder for improving EEG based Speech Recognition Systems
- Authors: Gautam Krishna, Co Tran, Mason Carnahan, Ahmed Tewfik
- Abstract summary: We introduce a recurrent neural network (RNN) based variational autoencoder (VAE) model with a new constrained loss function.
We demonstrate that both continuous and isolated speech recognition systems trained and tested on EEG features generated from raw EEG features by our VAE model achieve improved performance.
- Score: 3.5786621294068377
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we introduce a recurrent neural network (RNN) based variational
autoencoder (VAE) model with a new constrained loss function that can generate
more meaningful electroencephalography (EEG) features from raw EEG features to
improve the performance of EEG based speech recognition systems. We demonstrate
that both continuous and isolated speech recognition systems trained and tested
on EEG features generated from raw EEG features by our VAE model achieve
improved performance, and we report results for a limited English vocabulary
consisting of 30 unique sentences for continuous speech recognition and for an
English vocabulary consisting of 2 unique sentences for isolated speech
recognition. We compare our method with a recently introduced method described
in [1] for improving the performance of EEG based continuous speech recognition
systems, and we demonstrate that our method outperforms theirs as vocabulary
size increases when both are trained and tested on the same data set. Although
we report results only for automatic speech recognition (ASR) experiments in
this paper, the proposed VAE model with constrained loss function can be
extended to a variety of other EEG based brain-computer interface (BCI)
applications.
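The abstract does not spell out the constraint term itself, so the following PyTorch sketch only illustrates the general recipe under stated assumptions: a GRU encoder maps a raw EEG feature sequence to a latent distribution, a GRU decoder reconstructs the sequence, and the standard VAE objective (reconstruction plus KL divergence) is augmented with a placeholder constraint term. The temporal-smoothness penalty used here is an assumption for illustration, not the authors' actual constraint.

```python
# Hypothetical sketch of an RNN-based VAE with an added constraint term.
# The paper's exact constraint is not given in the abstract; the smoothness
# penalty below is a placeholder, not the authors' loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNNVAE(nn.Module):
    def __init__(self, feat_dim=30, hidden_dim=128, latent_dim=32):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)
        self.latent_to_hidden = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, x):                       # x: (batch, time, feat_dim)
        _, h = self.encoder(x)                  # h: (1, batch, hidden_dim)
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterize
        h0 = torch.tanh(self.latent_to_hidden(z)).unsqueeze(0)
        dec_out, _ = self.decoder(x, h0)        # teacher-forced reconstruction
        return self.out(dec_out), mu, logvar

def constrained_vae_loss(x, x_hat, mu, logvar, lam=0.1):
    recon = F.mse_loss(x_hat, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    # Placeholder constraint: encourage temporally smooth generated features.
    smooth = (x_hat[:, 1:] - x_hat[:, :-1]).abs().mean()
    return recon + kl + lam * smooth

model = RNNVAE()
x = torch.randn(4, 50, 30)                      # (batch, time, raw EEG feature dim)
x_hat, mu, logvar = model(x)
loss = constrained_vae_loss(x, x_hat, mu, logvar)
loss.backward()
```

In such a pipeline, the generated features (the decoder output) would replace the raw EEG features fed to the downstream ASR model.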
Related papers
- Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition [110.8431434620642]
We introduce the generative speech transcription error correction (GenSEC) challenge.
This challenge comprises three post-ASR language modeling tasks: (i) post-ASR transcription correction, (ii) speaker tagging, and (iii) emotion recognition.
We discuss insights from baseline evaluations, as well as lessons learned for designing future evaluations.
arXiv Detail & Related papers (2024-09-15T16:32:49Z)
- Improved Contextual Recognition In Automatic Speech Recognition Systems By Semantic Lattice Rescoring [4.819085609772069]
We propose a novel approach for enhancing contextual recognition within ASR systems via semantic lattice processing.
Our solution uses Hidden Markov Model and Gaussian Mixture Model (HMM-GMM) systems along with deep neural network (DNN) models for better accuracy.
We demonstrate the effectiveness of our proposed framework on the LibriSpeech dataset with empirical analyses.
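As a rough intuition for how rescoring injects context, here is a toy, self-contained sketch; the Hypothesis class, the keyword-overlap semantic score, and the interpolation weight are hypothetical stand-ins, not the paper's HMM-GMM/DNN lattice pipeline.

```python
# Toy sketch of rescoring ASR hypotheses with an external semantic score.
# This is a generic illustration of n-best/lattice rescoring, not the
# paper's actual system.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    acoustic_score: float   # log-likelihood from the acoustic model

def semantic_score(text: str, context_words: set[str]) -> float:
    # Placeholder semantic model: reward overlap with context keywords.
    words = set(text.lower().split())
    return float(len(words & context_words))

def rescore(nbest: list[Hypothesis], context_words: set[str], lam=2.0) -> str:
    # Interpolate acoustic and semantic scores; pick the best hypothesis.
    return max(nbest, key=lambda h: h.acoustic_score
               + lam * semantic_score(h.text, context_words)).text

nbest = [Hypothesis("recognize speech", -4.1),
         Hypothesis("wreck a nice beach", -3.9)]
print(rescore(nbest, context_words={"speech", "recognize"}))  # "recognize speech"
```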
arXiv Detail & Related papers (2023-10-14T23:16:05Z) - Contextual-Utterance Training for Automatic Speech Recognition [65.4571135368178]
We propose a contextual-utterance training technique which makes use of the previous and future contextual utterances.
Also, we propose a dual-mode contextual-utterance training technique for streaming automatic speech recognition (ASR) systems.
The proposed technique reduces the WER by more than 6% relative and the average last token emission latency by more than 40 ms.
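As a minimal sketch of the contextual-utterance idea, assumed here to mean concatenating neighboring utterance features with the current one, which the summary does not confirm:

```python
# Minimal sketch of forming a training example from contextual utterances.
# The paper's exact mechanism is not specified in the summary; this only
# shows the idea of conditioning on previous and future utterance features.
import torch

def with_context(utterances: list[torch.Tensor], i: int) -> torch.Tensor:
    """Concatenate previous, current, and future utterance feature sequences."""
    dim = utterances[i].shape[1]
    prev = utterances[i - 1] if i > 0 else torch.zeros(0, dim)
    futr = utterances[i + 1] if i + 1 < len(utterances) else torch.zeros(0, dim)
    return torch.cat([prev, utterances[i], futr], dim=0)  # (T_prev+T+T_futr, D)

feats = [torch.randn(t, 80) for t in (40, 55, 30)]  # three utterances, 80-dim features
x = with_context(feats, 1)
print(x.shape)  # torch.Size([125, 80])
```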
arXiv Detail & Related papers (2022-10-27T08:10:44Z)
- Audio-visual multi-channel speech separation, dereverberation and recognition [70.34433820322323]
This paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach.
The advantage of the additional visual modality over using audio only is demonstrated on two neural dereverberation approaches.
Experiments conducted on the LRS2 dataset suggest that the proposed audio-visual multi-channel speech separation, dereverberation and recognition system outperforms the baseline.
arXiv Detail & Related papers (2022-04-05T04:16:03Z)
- Exploiting Cross Domain Acoustic-to-articulatory Inverted Features For Disordered Speech Recognition [57.15942628305797]
Articulatory features are invariant to acoustic signal distortion and have been successfully incorporated into automatic speech recognition systems for normal speech.
This paper presents a cross-domain acoustic-to-articulatory (A2A) inversion approach that utilizes the parallel acoustic-articulatory data of the 15-hour TORGO corpus in model training.
The A2A inversion model is then cross-domain adapted to the 102.7-hour UASpeech corpus to produce articulatory features for disordered speech.
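A minimal sketch of the A2A inversion step, assuming a simple bidirectional LSTM regressor trained on parallel acoustic-articulatory data; dimensions and architecture are illustrative, and the cross-domain adaptation procedure itself is not reproduced here.

```python
# Hedged sketch of acoustic-to-articulatory (A2A) inversion: a small network
# regresses articulatory trajectories from acoustic frames. All sizes are
# placeholders, not the paper's actual model.
import torch
import torch.nn as nn

class A2AInverter(nn.Module):
    def __init__(self, acoustic_dim=40, artic_dim=12, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(acoustic_dim, hidden, batch_first=True,
                           bidirectional=True)
        self.head = nn.Linear(2 * hidden, artic_dim)

    def forward(self, acoustic):                 # (B, T, acoustic_dim)
        out, _ = self.rnn(acoustic)
        return self.head(out)                    # (B, T, artic_dim)

model = A2AInverter()
acoustic = torch.randn(2, 100, 40)               # e.g., filterbank frames
artic_target = torch.randn(2, 100, 12)           # parallel articulatory data (as in TORGO)
loss = nn.functional.mse_loss(model(acoustic), artic_target)
loss.backward()
```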
arXiv Detail & Related papers (2022-03-19T08:47:18Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
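A hedged sketch of late fusion, with linear layers as stand-ins for the transfer-learned speech and text branches; the actual system fine-tunes speaker-recognition and BERT-based models.

```python
# Illustrative late-fusion sketch: combine class scores from independent
# speech and text emotion branches. The branch internals and embedding sizes
# are placeholders, not the paper's models.
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    def __init__(self, speech_model: nn.Module, text_model: nn.Module, alpha=0.5):
        super().__init__()
        self.speech_model = speech_model
        self.text_model = text_model
        self.alpha = alpha

    def forward(self, speech_feats, text_feats):
        # Each branch runs independently; fuse at the logit level.
        s = self.speech_model(speech_feats)       # (B, num_emotions)
        t = self.text_model(text_feats)           # (B, num_emotions)
        return self.alpha * s + (1 - self.alpha) * t

num_emotions = 4
fusion = LateFusion(nn.Linear(192, num_emotions),   # speaker-embedding sized input
                    nn.Linear(768, num_emotions))   # BERT-embedding sized input
logits = fusion(torch.randn(2, 192), torch.randn(2, 768))
print(logits.argmax(dim=-1))
```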
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Factorized Neural Transducer for Efficient Language Model Adaptation [51.81097243306204]
We propose a novel model, factorized neural Transducer, by factorizing the blank and vocabulary prediction.
It is expected that this factorization can transfer the improvement of the standalone language model to the Transducer for speech recognition.
We demonstrate that the proposed factorized neural Transducer yields 15% to 20% WER improvements when out-of-domain text data is used for language model adaptation.
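A hedged sketch of the factorization idea: separate predictors for the blank token and for vocabulary tokens, so that the vocabulary branch behaves like a standalone language model that can be adapted on text alone. Shapes and the score-combination rule are simplified assumptions, not the paper's exact architecture.

```python
# Simplified sketch of a factorized prediction network for a transducer.
import torch
import torch.nn as nn

class FactorizedPredictor(nn.Module):
    def __init__(self, vocab_size, embed=128, hidden=256, enc_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed)
        self.blank_rnn = nn.GRU(embed, hidden, batch_first=True)
        self.vocab_rnn = nn.GRU(embed, hidden, batch_first=True)  # standalone-LM branch
        self.blank_head = nn.Linear(hidden + enc_dim, 1)
        self.vocab_head = nn.Linear(hidden, vocab_size)

    def forward(self, labels, enc):              # labels: (B, U), enc: (B, enc_dim)
        e = self.embed(labels)
        hb, _ = self.blank_rnn(e)                # (B, U, hidden)
        hv, _ = self.vocab_rnn(e)
        enc_exp = enc.unsqueeze(1).expand(-1, hb.size(1), -1)
        blank_logit = self.blank_head(torch.cat([hb, enc_exp], dim=-1))  # (B, U, 1)
        vocab_logits = self.vocab_head(hv)                               # (B, U, V)
        return torch.cat([blank_logit, vocab_logits], dim=-1)

pred = FactorizedPredictor(vocab_size=100)
out = pred(torch.randint(0, 100, (2, 5)), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 5, 101]) -> blank + vocabulary scores
```

Because only `vocab_rnn` and `vocab_head` model the word distribution, that branch can in principle be updated on out-of-domain text without touching the rest of the transducer, which is the point of the factorization.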
arXiv Detail & Related papers (2021-09-27T15:04:00Z)
- Brain Signals to Rescue Aphasia, Apraxia and Dysarthria Speech Recognition [14.544989316741091]
We propose a deep learning-based algorithm to improve the performance of automatic speech recognition systems for aphasia, apraxia, and dysarthria speech.
We demonstrate a significant decoding performance improvement of more than 50% at test time for the isolated speech recognition task.
These results are a first step towards demonstrating that non-invasive neural signals can drive a real-time, robust speech prosthetic for stroke survivors recovering from aphasia, apraxia, and dysarthria.
arXiv Detail & Related papers (2021-02-28T03:27:02Z)
- Improving EEG based continuous speech recognition using GAN [3.5786621294068377]
We demonstrate that it is possible to generate more meaningful electroencephalography (EEG) features from raw EEG features using generative adversarial networks (GAN).
Our proposed approach can be implemented without using any additional sensor information, whereas in [1] authors used additional features like acoustic or articulatory information to improve the performance of EEG based continuous speech recognition systems.
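A toy sketch of this GAN setup, in which a generator maps raw EEG features toward a target feature space and a discriminator judges them; network sizes and the target features are placeholders, not the cited paper's configuration.

```python
# Toy GAN sketch for mapping raw EEG features to "cleaner" features.
import torch
import torch.nn as nn

feat_dim = 30
G = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
D = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

raw_eeg = torch.randn(16, feat_dim)       # raw EEG feature frames
target = torch.randn(16, feat_dim)        # frames from the desired feature space

# Discriminator step: real target features vs. generated features.
d_loss = (bce(D(target), torch.ones(16, 1))
          + bce(D(G(raw_eeg).detach()), torch.zeros(16, 1)))
d_loss.backward()

# Generator step: try to fool the discriminator.
g_loss = bce(D(G(raw_eeg)), torch.ones(16, 1))
g_loss.backward()
```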
arXiv Detail & Related papers (2020-05-29T06:11:33Z)
- Understanding effect of speech perception in EEG based speech recognition systems [3.5786621294068377]
The electroencephalography (EEG) signals recorded in parallel with speech are used to perform isolated and continuous speech recognition.
We investigate whether it is possible to separate out this speech perception component from EEG signals in order to design more robust EEG based speech recognition systems.
arXiv Detail & Related papers (2020-05-29T05:56:09Z)
- EEG based Continuous Speech Recognition using Transformers [13.565270550358397]
We investigate continuous speech recognition from electroencephalography (EEG) features using an end-to-end transformer based automatic speech recognition (ASR) model.
Our results demonstrate that the transformer based model trains faster than recurrent neural network (RNN) based sequence-to-sequence EEG models.
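A minimal sketch of a transformer encoder over EEG feature sequences, here paired with a CTC head purely as an assumption (the cited paper may instead use an attention decoder); layer sizes are illustrative.

```python
# Minimal transformer-encoder ASR sketch over EEG features with a CTC head.
import torch
import torch.nn as nn

class EEGTransformerASR(nn.Module):
    def __init__(self, eeg_dim=30, d_model=128, vocab_size=40):
        super().__init__()
        self.proj = nn.Linear(eeg_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size + 1)   # +1 for the CTC blank

    def forward(self, eeg):                              # (B, T, eeg_dim)
        return self.head(self.encoder(self.proj(eeg)))   # (B, T, vocab+1)

model = EEGTransformerASR()
log_probs = model(torch.randn(2, 60, 30)).log_softmax(-1).transpose(0, 1)  # (T, B, C)
targets = torch.randint(1, 41, (2, 10))                  # label 0 reserved for blank
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((2,), 60, dtype=torch.long),
                           torch.full((2,), 10, dtype=torch.long))
loss.backward()
```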
arXiv Detail & Related papers (2019-12-31T08:36:59Z)