EEG2Mel: Reconstructing Sound from Brain Responses to Music
- URL: http://arxiv.org/abs/2207.13845v1
- Date: Thu, 28 Jul 2022 01:06:51 GMT
- Title: EEG2Mel: Reconstructing Sound from Brain Responses to Music
- Authors: Adolfo G. Ramirez-Aristizabal, Chris Kello
- Abstract summary: We improve on previous methods by reconstructing music stimuli well enough to be perceived and identified independently.
Deep learning models were trained on time-aligned music stimulus spectra for each corresponding one-second window of EEG recording.
Reconstructions of auditory music stimuli were discriminated by listeners at an 85% success rate (50% chance) in a two-alternative match-to-sample task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Information retrieval from brain responses to auditory and visual stimuli has shown success through classification of song names and image classes presented to participants while recording EEG signals. Information retrieval in the form of reconstructing auditory stimuli has also shown some success, but here we improve on previous methods by reconstructing music stimuli well enough to be perceived and identified independently. Furthermore, deep learning models were trained on time-aligned music stimulus spectra for each corresponding one-second window of EEG recording, which greatly reduces the feature extraction steps needed compared to prior studies. The NMED-Tempo and NMED-Hindi datasets of participants passively listening to full-length songs were used to train and validate Convolutional Neural Network (CNN) regressors. The efficacy of raw voltage versus power spectrum inputs and of linear versus mel spectrogram outputs was tested, and all inputs and outputs were converted into 2D images. The quality of reconstructed spectrograms was assessed by training classifiers, which showed 81% accuracy for mel spectrograms and 72% for linear spectrograms (10% chance accuracy). Lastly, reconstructions of auditory music stimuli were discriminated by listeners at an 85% success rate (50% chance) in a two-alternative match-to-sample task.
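The abstract describes pairing each one-second EEG window with the time-aligned spectrogram of the music stimulus, with both inputs and outputs treated as 2D images. The snippet below is a minimal sketch of how such mel-spectrogram targets could be prepared; the use of librosa, the sample rate, the number of mel bands, and the scaling are illustrative assumptions, not settings taken from the paper.

```python
# Hypothetical preparation of mel-spectrogram "image" targets for one-second
# audio windows. Parameters (22.05 kHz, 128 mel bands, min-max scaling) are
# illustrative assumptions, not the paper's actual settings.
import numpy as np
import librosa

def mel_image(window, sr=22050, n_mels=128):
    """Convert a one-second waveform into a 2D mel-spectrogram image."""
    mel = librosa.feature.melspectrogram(y=window, sr=sr, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)              # log-compress
    scaled = (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
    return scaled.astype(np.float32)                           # values in [0, 1]

# Slice a song into non-overlapping one-second windows and build targets.
y, sr = librosa.load("song.wav", sr=22050)
windows = [y[i:i + sr] for i in range(0, len(y) - sr + 1, sr)]
targets = np.stack([mel_image(w, sr) for w in windows])        # (n_windows, 128, T)
```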
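A CNN regressor mapping an EEG window image to the corresponding spectrogram image could then look roughly like the sketch below. The layer sizes, input resolution, and output shape are hypothetical, since the abstract does not describe the architecture; a pixel-wise loss against the time-aligned targets mirrors the image-to-image framing.

```python
# Minimal PyTorch sketch of a CNN regressor from a 2D EEG "image" to a
# mel-spectrogram image. All dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class EEG2MelRegressor(nn.Module):
    def __init__(self, out_shape=(128, 44)):     # (mel bands, time frames), assumed
        super().__init__()
        self.out_shape = out_shape
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d((8, 8)),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, out_shape[0] * out_shape[1]),
            nn.Sigmoid(),                         # targets scaled to [0, 1]
        )

    def forward(self, x):                         # x: (batch, 1, H, W) EEG image
        z = self.head(self.encoder(x))
        return z.view(-1, 1, *self.out_shape)     # predicted spectrogram image

model = EEG2MelRegressor()
eeg = torch.randn(4, 1, 125, 125)                 # hypothetical EEG window images
pred = model(eeg)                                 # (4, 1, 128, 44)
loss = nn.MSELoss()(pred, torch.rand(4, 1, 128, 44))
```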
Related papers
- Recurrent and Convolutional Neural Networks in Classification of EEG Signal for Guided Imagery and Mental Workload Detection [0.9895793818721335]
This paper presents the results of investigations of a cohort of 26 students exposed to the Guided Imagery relaxation technique and mental task workloads, conducted with a dense-array electroencephalographic amplifier.
arXiv Detail & Related papers (2024-05-27T07:49:30Z)
- Anomalous Sound Detection using Audio Representation with Machine ID based Contrastive Learning Pretraining [52.191658157204856]
This paper uses contrastive learning to refine audio representations for each machine ID, rather than for each audio sample.
The proposed two-stage method uses contrastive learning to pretrain the audio representation model.
Experiments show that our method outperforms the state-of-the-art methods using contrastive learning or self-supervised classification.
arXiv Detail & Related papers (2023-04-07T11:08:31Z)
- Jointly Learning Visual and Auditory Speech Representations from Raw Data [108.68531445641769]
RAVEn is a self-supervised multi-modal approach to jointly learn visual and auditory speech representations.
Our design is asymmetric with respect to the two modalities' pretext tasks, driven by the inherent differences between video and audio.
RAVEn surpasses all self-supervised methods on visual speech recognition.
arXiv Detail & Related papers (2022-12-12T21:04:06Z)
- Audio-visual multi-channel speech separation, dereverberation and recognition [70.34433820322323]
This paper proposes an audio-visual multi-channel speech separation, dereverberation and recognition approach.
The advantage of the additional visual modality over using audio only is demonstrated on two neural dereverberation approaches.
Experiments conducted on the LRS2 dataset suggest that the proposed audio-visual multi-channel speech separation, dereverberation and recognition system outperforms the baseline.
arXiv Detail & Related papers (2022-04-05T04:16:03Z)
- Enhancing Affective Representations of Music-Induced EEG through Multimodal Supervision and latent Domain Adaptation [34.726185927120355]
We employ music signals as a supervisory modality to EEG, aiming to project their semantic correspondence onto a common representation space.
We utilize a bi-modal framework by combining an LSTM-based attention model to process EEG and a pre-trained model for music tagging, along with a reverse domain discriminator to align the distributions of the two modalities.
The resulting framework can be utilized for emotion recognition both directly, by performing supervised predictions from either modality, and indirectly, by providing relevant music samples to EEG input queries.
arXiv Detail & Related papers (2022-02-20T07:32:12Z)
- EEG-based Classification of Drivers Attention using Convolutional Neural Network [0.0]
This study compares the performance of several attention classifiers trained on participants' brain activity.
A CNN model trained on raw EEG data obtained under kinesthetic feedback achieved the highest accuracy (89%).
Our findings show that a CNN and raw EEG signals can be employed for effective training of a passive BCI for real-time attention classification.
arXiv Detail & Related papers (2021-08-23T10:55:52Z)
- Improved Speech Emotion Recognition using Transfer Learning and Spectrogram Augmentation [56.264157127549446]
Speech emotion recognition (SER) is a challenging task that plays a crucial role in natural human-computer interaction.
One of the main challenges in SER is data scarcity.
We propose a transfer learning strategy combined with spectrogram augmentation.
arXiv Detail & Related papers (2021-08-05T10:39:39Z)
- A Novel mapping for visual to auditory sensory substitution [0.0]
Visual information can be converted into an audio stream via sensory substitution devices.
An average of 88.05 was achieved in blind object recognition of real objects.
arXiv Detail & Related papers (2021-06-14T14:14:50Z)
- Audiovisual transfer learning for audio tagging and sound event detection [21.574781022415372]
We study the merit of transfer learning for two sound recognition problems, i.e., audio tagging and sound event detection.
We adapt a baseline system utilizing only spectral acoustic inputs to make use of pretrained auditory and visual features.
We perform experiments with these modified models on an audiovisual multi-label data set.
arXiv Detail & Related papers (2021-06-09T21:55:05Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy to estimate the separation performances of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)