Correlation based Multi-phasal models for improved imagined speech EEG
recognition
- URL: http://arxiv.org/abs/2011.02195v1
- Date: Wed, 4 Nov 2020 09:39:53 GMT
- Title: Correlation based Multi-phasal models for improved imagined speech EEG
recognition
- Authors: Rini A Sharon, Hema A Murthy
- Abstract summary: This work aims to profit from the parallel information contained in multi-phasal EEG data recorded while speaking, imagining and performing articulatory movements corresponding to specific speech units.
A bi-phase common representation learning module using neural networks is designed to model the correlation and reproducibility between an analysis phase and a support phase.
The proposed approach further handles the non-availability of multi-phasal data during decoding.
- Score: 22.196642357767338
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Translation of imagined speech electroencephalogram (EEG) into human
understandable commands greatly facilitates the design of naturalistic brain
computer interfaces. To achieve improved imagined speech unit classification,
this work aims to profit from the parallel information contained in
multi-phasal EEG data recorded while speaking, imagining and performing
articulatory movements corresponding to specific speech units. A bi-phase
common representation learning module using neural networks is designed to
model the correlation and reproducibility between an analysis phase and a
support phase. The trained Correlation Network is then employed to extract
discriminative features of the analysis phase. These features are further
classified into five binary phonological categories using machine learning
models such as Gaussian mixture based hidden Markov model and deep neural
networks. The proposed approach further handles the non-availability of
multi-phasal data during decoding. Topographic visualizations along with
result-based inferences suggest that the multi-phasal correlation modelling
approach proposed in the paper enhances imagined-speech EEG recognition
performance.
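The abstract does not give the training objective, so the following is only a minimal sketch of one common way to learn a correlated common representation from two views. It assumes single-layer tanh encoders, linear cross-reconstruction decoders, and a loss of the form reconstruction error minus a weighted correlation term. All names (`encode`, `correlation_term`, `bi_phase_objective`, `lam`), the array sizes, and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def encode(x, w, b):
    """Map one phase's features to the common representation (assumed tanh layer)."""
    return np.tanh(x @ w + b)


def correlation_term(h_a, h_s, eps=1e-8):
    """Sum over hidden units of the Pearson correlation between the two codes."""
    a = h_a - h_a.mean(axis=0)
    s = h_s - h_s.mean(axis=0)
    num = (a * s).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (s ** 2).sum(axis=0)) + eps
    return float(np.sum(num / den))


def bi_phase_objective(x_a, x_s, params, lam=1.0):
    """Cross-reconstruction error minus lam times the code correlation (assumed form)."""
    h_a = encode(x_a, params["w_a"], params["b_a"])
    h_s = encode(x_s, params["w_s"], params["b_s"])
    rec_s = h_a @ params["d_s"]  # predict the support phase from the analysis code
    rec_a = h_s @ params["d_a"]  # predict the analysis phase from the support code
    recon = np.mean((rec_s - x_s) ** 2) + np.mean((rec_a - x_a) ** 2)
    return float(recon - lam * correlation_term(h_a, h_s))


# Synthetic stand-ins for per-trial EEG features (hypothetical sizes).
rng = np.random.default_rng(0)
n_trials, n_feat, n_hidden = 64, 32, 8
x_imagined = rng.standard_normal((n_trials, n_feat))                    # analysis phase
x_spoken = x_imagined + 0.1 * rng.standard_normal((n_trials, n_feat))   # support phase

params = {
    "w_a": 0.1 * rng.standard_normal((n_feat, n_hidden)),
    "b_a": np.zeros(n_hidden),
    "w_s": 0.1 * rng.standard_normal((n_feat, n_hidden)),
    "b_s": np.zeros(n_hidden),
    "d_a": 0.1 * rng.standard_normal((n_hidden, n_feat)),
    "d_s": 0.1 * rng.standard_normal((n_hidden, n_feat)),
}

loss = bi_phase_objective(x_imagined, x_spoken, params)

# At decoding time only the analysis-phase encoder is needed, so the support
# phase does not have to be available.
features = encode(x_imagined, params["w_a"], params["b_a"])
```

In this sketch the resulting `features` array would be what a downstream classifier consumes. The optimisation loop is omitted, and any gradient-based optimiser could minimise `bi_phase_objective`.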
Related papers
- Investigating the Timescales of Language Processing with EEG and Language Models [0.0]
This study explores the temporal dynamics of language processing by examining the alignment between word representations from a pre-trained language model and EEG data.
Using a Temporal Response Function (TRF) model, we investigate how neural activity corresponds to model representations across different layers.
Our analysis reveals patterns in TRFs from distinct layers, highlighting varying contributions to lexical and compositional processing.
arXiv Detail & Related papers (2024-06-28T12:49:27Z)
- Brain-Driven Representation Learning Based on Diffusion Model [25.375490061512]
Denoising diffusion probabilistic models (DDPMs) are explored in our research as a means to address this issue.
Using DDPMs in conjunction with a conditional autoencoder, our new approach considerably outperforms traditional machine learning algorithms.
Our results highlight the potential of DDPMs as a sophisticated computational method for the analysis of speech-related EEG signals.
arXiv Detail & Related papers (2023-11-14T05:59:58Z)
- Mapping EEG Signals to Visual Stimuli: A Deep Learning Approach to Match vs. Mismatch Classification [28.186129896907694]
We propose a "match-vs-mismatch" deep learning model to classify whether a video clip induces excitatory responses in recorded EEG signals.
We demonstrate that the proposed model is able to achieve the highest accuracy on unseen subjects.
These results have the potential to facilitate the development of neural recording-based video reconstruction.
arXiv Detail & Related papers (2023-09-08T06:37:25Z)
- Canonical Cortical Graph Neural Networks and its Application for Speech Enhancement in Future Audio-Visual Hearing Aids [0.726437825413781]
This paper proposes a more biologically plausible self-supervised machine learning approach that combines multimodal information using intra-layer modulations together with canonical correlation analysis (CCA).
The approach outperformed recent state-of-the-art results in both clean audio reconstruction and energy efficiency, the latter described by a reduced and smoother neuron firing rate distribution.
arXiv Detail & Related papers (2022-06-06T15:20:07Z)
- Self-supervised models of audio effectively explain human cortical responses to speech [71.57870452667369]
We capitalize on the progress of self-supervised speech representation learning to create new state-of-the-art models of the human auditory system.
Our results show that self-supervised models effectively capture the hierarchy of information relevant to different stages of speech processing in human cortex.
arXiv Detail & Related papers (2022-05-27T22:04:02Z)
- Deep Neural Convolutive Matrix Factorization for Articulatory Representation Decomposition [48.56414496900755]
This work uses a neural implementation of convolutive sparse matrix factorization to decompose the articulatory data into interpretable gestures and gestural scores.
Phoneme recognition experiments were additionally performed to show that gestural scores indeed code phonological information successfully.
arXiv Detail & Related papers (2022-04-01T14:25:19Z)
- Multimodal Emotion Recognition using Transfer Learning from Speaker Recognition and BERT-based models [53.31917090073727]
We propose a neural network-based emotion recognition framework that uses a late fusion of transfer-learned and fine-tuned models from speech and text modalities.
We evaluate the effectiveness of our proposed multimodal approach on the interactive emotional dyadic motion capture dataset.
arXiv Detail & Related papers (2022-02-16T00:23:42Z)
- Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem [65.25725367771075]
This study demonstrates, for the first time, that the synthesis-based approach can also perform well on this problem.
Specifically, we propose a novel speech separation/enhancement model based on the recognition of discrete symbols.
After the discrete symbol sequence is predicted, each target speech signal can be re-synthesized by feeding those symbols to the synthesis model.
arXiv Detail & Related papers (2021-12-17T08:35:40Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Directed Acyclic Graph Network for Conversational Emotion Recognition [12.191046814462853]
We propose a novel idea of encoding utterances with a directed acyclic graph (DAG) to better model the intrinsic structure within a conversation.
DAG-ERC provides a more intuitive way to model the information flow between long-distance conversation background and nearby context.
Experiments are conducted on four ERC benchmarks with state-of-the-art models employed as baselines for comparison.
arXiv Detail & Related papers (2021-05-27T01:51:37Z)
- Multi-modal Automated Speech Scoring using Attention Fusion [46.94442359735952]
We propose a novel multi-modal end-to-end neural approach for automated assessment of non-native English speakers' spontaneous speech.
We employ Bi-directional Recurrent Convolutional Neural Networks and Bi-directional Long Short-Term Memory Neural Networks to encode acoustic and lexical cues from spectrograms and transcriptions.
We find combined attention to both lexical and acoustic cues significantly improves the overall performance of the system.
arXiv Detail & Related papers (2020-05-17T07:53:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.