Brain-Driven Representation Learning Based on Diffusion Model
- URL: http://arxiv.org/abs/2311.07925v1
- Date: Tue, 14 Nov 2023 05:59:58 GMT
- Title: Brain-Driven Representation Learning Based on Diffusion Model
- Authors: Soowon Kim, Seo-Hyun Lee, Young-Eun Lee, Ji-Won Lee, Ji-Ha Park,
Seong-Whan Lee
- Abstract summary: Denoising diffusion probabilistic models (DDPMs) are explored in our research as a means to address this issue.
Using DDPMs in conjunction with a conditional autoencoder, our new approach considerably outperforms traditional machine learning algorithms.
Our results highlight the potential of DDPMs as a sophisticated computational method for the analysis of speech-related EEG signals.
- Score: 25.375490061512
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interpreting EEG signals linked to spoken language presents a complex
challenge, given the data's intricate temporal and spatial attributes, as well
as the various noise factors. Denoising diffusion probabilistic models (DDPMs),
which have recently gained prominence in diverse areas for their capabilities
in representation learning, are explored in our research as a means to address
this issue. Using DDPMs in conjunction with a conditional autoencoder, our new
approach considerably outperforms traditional machine learning algorithms and
established baseline models in accuracy. Our results highlight the potential of
DDPMs as a sophisticated computational method for the analysis of
speech-related EEG signals. This could lead to significant advances in
brain-computer interfaces tailored for spoken communication.
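The DDPM component rests on the closed-form forward (noising) process, q(x_t | x_0) = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps. The sketch below illustrates that process on a synthetic multichannel signal standing in for EEG; the linear beta schedule, timesteps, and toy data are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def make_noise_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule, as in the standard DDPM formulation."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative product \bar{alpha}_t
    return betas, alpha_bars

def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t ~ q(x_t | x_0) in closed form, returning (x_t, eps)."""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bars[t]
    xt = np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps
    return xt, eps

# Toy "EEG" segment: 8 channels x 256 samples of phase-shifted 10 Hz sinusoids.
rng = np.random.default_rng(0)
t_axis = np.arange(256) / 128.0
x0 = np.stack([np.sin(2 * np.pi * 10 * t_axis + c) for c in range(8)])

_, alpha_bars = make_noise_schedule()
x_early, _ = forward_diffuse(x0, 10, alpha_bars, rng)   # lightly noised
x_late, _ = forward_diffuse(x0, 900, alpha_bars, rng)   # close to pure noise
```

A denoising network trained to predict `eps` from `x_t` (conditioned, in this paper's setting, on an autoencoder latent) is what turns this noising process into a representation learner; that training loop is omitted here.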
Related papers
- Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings [27.418738450536047]
We propose a two-step pipeline for converting EEG signals into sentences.
We first confirm that word-level semantic information can be learned from EEG data recorded during natural reading.
We employ a training-free retrieval method to retrieve sentences based on the predictions from the EEG encoder.
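A training-free retrieval step of this kind can be sketched as cosine-similarity ranking over a bank of sentence embeddings; the embedding dimension and synthetic vectors below are hypothetical stand-ins for the outputs of an actual EEG encoder:

```python
import numpy as np

def retrieve(query_emb, sentence_embs, k=3):
    """Rank candidate sentences by cosine similarity to the
    EEG-derived query embedding; no training involved."""
    q = query_emb / np.linalg.norm(query_emb)
    s = sentence_embs / np.linalg.norm(sentence_embs, axis=1, keepdims=True)
    scores = s @ q
    top_k = np.argsort(scores)[::-1][:k]
    return top_k, scores

rng = np.random.default_rng(1)
bank = rng.standard_normal((100, 64))             # hypothetical sentence bank
query = bank[42] + 0.1 * rng.standard_normal(64)  # noisy "EEG prediction"
top_k, scores = retrieve(query, bank)
print(top_k)  # the matching sentence should rank first
```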
arXiv Detail & Related papers (2024-08-08T03:40:25Z)
- Du-IN: Discrete units-guided mask modeling for decoding speech from Intracranial Neural signals [5.283718601431859]
Invasive brain-computer interfaces with Electrocorticography (ECoG) have shown promise for high-performance speech decoding in medical applications.
We developed the Du-IN model, which extracts contextual embeddings based on region-level tokens through discrete codex-guided mask modeling.
Our model achieves state-of-the-art performance on the 61-word classification task, surpassing all baselines.
arXiv Detail & Related papers (2024-05-19T06:00:36Z)
- EEG decoding with conditional identification information [7.873458431535408]
Decoding EEG signals is crucial for unraveling the workings of the human brain and advancing brain-computer interfaces.
Traditional machine learning algorithms have been hindered by the high noise levels and inherent inter-person variations in EEG signals.
Recent advances in deep neural networks (DNNs) have shown promise, owing to their advanced nonlinear modeling capabilities.
arXiv Detail & Related papers (2024-03-21T13:38:59Z)
- Diffusion Model as Representation Learner [86.09969334071478]
Diffusion Probabilistic Models (DPMs) have recently demonstrated impressive results on various generative tasks.
We propose a novel knowledge transfer method that leverages the knowledge acquired by DPMs for recognition tasks.
arXiv Detail & Related papers (2023-08-21T00:38:39Z)
- Diff-E: Diffusion-based Learning for Decoding Imagined Speech EEG [17.96977778655143]
We propose a novel method for decoding EEG signals for imagined speech using DDPMs and a conditional autoencoder named Diff-E.
Results indicate that Diff-E significantly improves the accuracy of decoding EEG signals for imagined speech compared to traditional machine learning techniques and baseline models.
arXiv Detail & Related papers (2023-07-26T07:12:39Z)
- Dynamic Latent Separation for Deep Learning [67.62190501599176]
A core problem in machine learning is to learn expressive latent variables for model prediction on complex data.
Here, we develop an approach that improves expressiveness, provides partial interpretation, and is not restricted to specific applications.
arXiv Detail & Related papers (2022-10-07T17:56:53Z)
- End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce intertemporal graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
arXiv Detail & Related papers (2022-03-27T08:55:28Z)
- EEGminer: Discovering Interpretable Features of Brain Activity with Learnable Filters [72.19032452642728]
We propose a novel differentiable EEG decoding pipeline consisting of learnable filters and a pre-determined feature extraction module.
We demonstrate the utility of our model towards emotion recognition from EEG signals on the SEED dataset and on a new EEG dataset of unprecedented size.
The discovered features align with previous neuroscience studies and offer new insights, such as marked differences in the functional connectivity profile between left and right temporal areas during music listening.
arXiv Detail & Related papers (2021-10-19T14:22:04Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- Correlation based Multi-phasal models for improved imagined speech EEG recognition [22.196642357767338]
This work aims to profit from the parallel information contained in multi-phasal EEG data recorded while speaking, imagining and performing articulatory movements corresponding to specific speech units.
A bi-phase common representation learning module using neural networks is designed to model the correlation between an analysis phase and a support phase.
The proposed approach further handles the non-availability of multi-phasal data during decoding.
arXiv Detail & Related papers (2020-11-04T09:39:53Z)
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models [113.56329266325902]
Training of spoken language understanding (SLU) models often faces the problem of data scarcity.
We put forward a data augmentation method using pretrained language models to boost the variability and accuracy of generated utterances.
arXiv Detail & Related papers (2020-04-29T04:07:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.