Deep Neural Networks on EEG Signals to Predict Auditory Attention Score
Using Gramian Angular Difference Field
- URL: http://arxiv.org/abs/2110.12503v1
- Date: Sun, 24 Oct 2021 17:58:14 GMT
- Title: Deep Neural Networks on EEG Signals to Predict Auditory Attention Score
Using Gramian Angular Difference Field
- Authors: Mahak Kothari, Shreyansh Joshi, Adarsh Nandanwar, Aadetya Jaiswal,
Veeky Baths
- Abstract summary: In some sense, an individual's auditory attention score indicates how well the person can focus in auditory tasks.
Recent advances in deep learning and in non-invasive technologies for recording neural activity raise the question: can deep learning, together with technologies such as electroencephalography (EEG), be used to predict an individual's auditory attention score?
In this paper, we focus on this problem of estimating a person's auditory attention level from the brain's electrical activity captured with 14-channel EEG signals.
- Score: 1.9899603776429056
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Auditory attention is a selective form of hearing in which people
intentionally focus on a specific sound source or on spoken words while
ignoring or inhibiting other auditory stimuli. In some sense, an individual's
auditory attention score indicates how well that person can focus in auditory
tasks. Recent advances in deep learning and in non-invasive technologies for
recording neural activity raise the question: can deep learning, together with
technologies such as electroencephalography (EEG), be used to predict an
individual's auditory attention score? In this paper, we focus on exactly this
problem of estimating a person's auditory attention level from the brain's
electrical activity captured with 14-channel EEG signals. More specifically,
we treat attention estimation as a regression problem. The work was performed
on the publicly available PhyAAt dataset. The Gramian Angular Difference Field
(GADF) is used to convert the time-series EEG data into an image with 14
channels, enabling us to train various deep learning models such as a 2D CNN,
a 3D CNN, and convolutional autoencoders. Their performances are compared
among themselves as well as with previous work. Among the models we tried, the
2D CNN performed best, outperforming existing methods by a margin of 0.22 in
mean absolute error (MAE).
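To make the pipeline concrete, below is a minimal sketch of the GADF transform and a 2D-CNN regressor in Python (NumPy and PyTorch). The paper's exact preprocessing and architecture are not given in this summary, so the window length, layer sizes, and the names gadf, window_to_image, and AttentionScoreCNN are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import torch
import torch.nn as nn

def gadf(x: np.ndarray) -> np.ndarray:
    """Gramian Angular Difference Field of a 1-D series x of length T.

    Returns a (T, T) image with GADF[i, j] = sin(phi_i - phi_j),
    where phi = arccos of the series rescaled to [-1, 1].
    """
    x_min, x_max = x.min(), x.max()
    x_scaled = (2.0 * x - x_max - x_min) / (x_max - x_min)
    phi = np.arccos(np.clip(x_scaled, -1.0, 1.0))
    # sin(a - b) = sin(a)cos(b) - cos(a)sin(b), built from outer products
    return np.outer(np.sin(phi), np.cos(phi)) - np.outer(np.cos(phi), np.sin(phi))

def window_to_image(window: np.ndarray) -> np.ndarray:
    """Stack per-channel GADFs of a (14, T) EEG window into a (14, T, T) image."""
    return np.stack([gadf(ch) for ch in window])

class AttentionScoreCNN(nn.Module):
    """Illustrative 2D-CNN regressor over 14-channel GADF images (layer sizes assumed)."""

    def __init__(self, in_channels: int = 14):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling, independent of image size
        )
        self.head = nn.Linear(64, 1)  # single scalar: the attention score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x).flatten(1))

# Example: one hypothetical 2-second window at 128 Hz -> a (14, 256, 256) image.
window = np.random.randn(14, 256)
image = torch.from_numpy(window_to_image(window)).float().unsqueeze(0)
score = AttentionScoreCNN()(image)  # shape (1, 1)
```

Training such a model against an L1 objective (torch.nn.L1Loss) corresponds directly to the MAE metric reported above.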
Related papers
- Single-word Auditory Attention Decoding Using Deep Learning Model [9.698931956476692]
Identifying auditory attention by comparing auditory stimuli with the corresponding brain responses is known as auditory attention decoding (AAD).
This paper presents a deep learning approach, based on EEGNet, to address this challenge.
arXiv Detail & Related papers (2024-10-15T21:57:19Z)
- EGNN-C+: Interpretable Evolving Granular Neural Network and Application in Classification of Weakly-Supervised EEG Data Streams [0.0]
We introduce a modified incremental learning algorithm for evolving Granular Neural Networks (eGNN-C+).
We focus on the classification of emotion-related patterns within electroencephalogram (EEG) signals.
arXiv Detail & Related papers (2024-02-26T15:11:41Z)
- Do You Remember? Overcoming Catastrophic Forgetting for Fake Audio Detection [54.20974251478516]
We propose a continual learning algorithm for fake audio detection to overcome catastrophic forgetting.
When fine-tuning a detection network, our approach adaptively computes the direction of weight modification according to the ratio of genuine to fake utterances.
Our method can easily be generalized to related fields, like speech emotion recognition.
arXiv Detail & Related papers (2023-08-07T05:05:49Z)
- Corticomorphic Hybrid CNN-SNN Architecture for EEG-based Low-footprint Low-latency Auditory Attention Detection [8.549433398954738]
In a multi-speaker "cocktail party" scenario, a listener can selectively attend to a speaker of interest.
Current trends in EEG-based auditory attention detection using artificial neural networks (ANN) are not practical for edge-computing platforms.
We propose a hybrid convolutional neural network-spiking neural network (CNN-SNN) architecture, inspired by the auditory cortex.
arXiv Detail & Related papers (2023-07-13T20:33:39Z)
- EchoVest: Real-Time Sound Classification and Depth Perception Expressed through Transcutaneous Electrical Nerve Stimulation [0.0]
We have developed a new assistive device, EchoVest, for blind/deaf people to intuitively become more aware of their environment.
EchoVest transmits vibrations to the user's body by utilizing transcutaneous electric nerve stimulation (TENS) based on the source of the sounds.
We aimed to outperform CNN-based models, the most commonly used machine-learning approach for classification tasks, in both accuracy and computational cost.
arXiv Detail & Related papers (2023-07-10T14:43:32Z)
- Jointly Learning Visual and Auditory Speech Representations from Raw Data [108.68531445641769]
RAVEn is a self-supervised multi-modal approach to jointly learn visual and auditory speech representations.
Our design is asymmetric with respect to the two modalities, driven by the inherent differences between video and audio.
RAVEn surpasses all self-supervised methods on visual speech recognition.
arXiv Detail & Related papers (2022-12-12T21:04:06Z)
- Toward a realistic model of speech processing in the brain with self-supervised learning [67.7130239674153]
Self-supervised algorithms trained on the raw waveform constitute a promising candidate.
We show that Wav2Vec 2.0 learns brain-like representations with as little as 600 hours of unlabelled speech.
arXiv Detail & Related papers (2022-06-03T17:01:46Z)
- Audio-visual Representation Learning for Anomaly Events Detection in Crowds [119.72951028190586]
This paper attempts to exploit multi-modal learning for modeling the audio and visual signals simultaneously.
We conduct the experiments on SHADE dataset, a synthetic audio-visual dataset in surveillance scenes.
We find that introducing audio signals effectively improves anomaly event detection, outperforming other state-of-the-art methods.
arXiv Detail & Related papers (2021-10-28T02:42:48Z)
- Noisy Agents: Self-supervised Exploration by Predicting Auditory Events [127.82594819117753]
We propose a novel type of intrinsic motivation for Reinforcement Learning (RL) that encourages the agent to understand the causal effect of its actions.
We train a neural network to predict the auditory events and use the prediction errors as intrinsic rewards to guide RL exploration.
Experimental results on Atari games show that our new intrinsic motivation significantly outperforms several state-of-the-art baselines.
arXiv Detail & Related papers (2020-07-27T17:59:08Z)
- Human brain activity for machine attention [8.673635963837532]
We are the first to exploit neuroscientific data, namely electroencephalography (EEG), to inform a neural attention model about language processing in the human brain.
We devise a method for finding such EEG features to supervise machine attention through combining theoretically motivated cropping with random forest tree splits.
We apply these features to regularise attention on relation classification and show that EEG is more informative than strong baselines.
arXiv Detail & Related papers (2020-06-09T08:39:07Z)
- An End-to-End Visual-Audio Attention Network for Emotion Recognition in User-Generated Videos [64.91614454412257]
We propose to recognize video emotions in an end-to-end manner based on convolutional neural networks (CNNs).
Specifically, we develop a deep Visual-Audio Attention Network (VAANet), a novel architecture that integrates spatial, channel-wise, and temporal attentions into a visual 3D CNN and temporal attentions into an audio 2D CNN.
arXiv Detail & Related papers (2020-02-12T15:33:59Z)