Canonical Cortical Graph Neural Networks and its Application for Speech
Enhancement in Future Audio-Visual Hearing Aids
- URL: http://arxiv.org/abs/2206.02671v1
- Date: Mon, 6 Jun 2022 15:20:07 GMT
- Title: Canonical Cortical Graph Neural Networks and its Application for Speech
Enhancement in Future Audio-Visual Hearing Aids
- Authors: Leandro A. Passos, João Paulo Papa, Ahsan Adeel
- Abstract summary: This paper proposes a more biologically plausible self-supervised machine learning approach that combines multimodal information using intra-layer modulations together with canonical correlation analysis (CCA), plus a memory mechanism for temporal data.
The approach outperformed recent state-of-the-art results in both clean audio reconstruction and energy efficiency, the latter reflected in a reduced and smoother neuron firing rate distribution.
- Score: 0.726437825413781
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite the recent success of machine learning algorithms, most of these
models still face several drawbacks when considering more complex tasks
requiring interaction between different sources, such as multimodal input data
and logical time sequences. The biological brain, by contrast, excels at such
tasks, having been shaped by millions of years of evolution to manage and
integrate such streams of information automatically. In this context,
this paper finds inspiration from recent discoveries on cortical circuits in
the brain to propose a more biologically plausible self-supervised machine
learning approach that combines multimodal information using intra-layer
modulations together with canonical correlation analysis (CCA), as well as a
memory mechanism to keep track of temporal data, the so-called Canonical
Cortical Graph Neural Networks. The approach outperformed recent
state-of-the-art results in both clean audio reconstruction and
energy efficiency, the latter reflected in a reduced and smoother neuron firing
rate distribution, suggesting the model is a suitable approach for speech
enhancement in future audio-visual hearing aid devices.
Related papers
- Meta-Dynamical State Space Models for Integrative Neural Data Analysis [8.625491800829224]
Learning shared structure across environments facilitates rapid learning and adaptive behavior in neural systems.
There has been limited work exploiting the shared structure in neural activity during similar tasks for learning latent dynamics from neural recordings.
We propose a novel approach for meta-learning this solution space from task-related neural activity of trained animals.
arXiv Detail & Related papers (2024-10-07T19:35:49Z)
- Understanding Auditory Evoked Brain Signal via Physics-informed Embedding Network with Multi-Task Transformer [3.261870217889503]
We propose an innovative multi-task learning model, Physics-informed Embedding Network with Multi-Task Transformer (PEMT-Net)
PEMT-Net enhances decoding performance through physics-informed embedding and deep learning techniques.
Experiments on a specific dataset demonstrate PEMT-Net's significant performance in multi-task auditory signal decoding.
arXiv Detail & Related papers (2024-06-04T06:53:32Z)
- DSAM: A Deep Learning Framework for Analyzing Temporal and Spatial Dynamics in Brain Networks [4.041732967881764]
Most rs-fMRI studies compute a single static functional connectivity matrix across brain regions of interest.
These approaches are at risk of oversimplifying brain dynamics and lack proper consideration of the goal at hand.
We propose a novel interpretable deep learning framework that learns goal-specific functional connectivity matrix directly from time series.
arXiv Detail & Related papers (2024-05-19T23:35:06Z)
- Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks [59.38765771221084]
We present a physiologically inspired speech recognition architecture compatible and scalable with deep learning frameworks.
We show end-to-end gradient descent training leads to the emergence of neural oscillations in the central spiking neural network.
Our findings highlight the crucial inhibitory role of feedback mechanisms, such as spike frequency adaptation and recurrent connections, in regulating and synchronising neural activity to improve recognition performance.
arXiv Detail & Related papers (2024-04-22T09:40:07Z)
- The Expressive Leaky Memory Neuron: an Efficient and Expressive Phenomenological Neuron Model Can Solve Long-Horizon Tasks [64.08042492426992]
We introduce the Expressive Leaky Memory (ELM) neuron model, a biologically inspired model of a cortical neuron.
Our ELM neuron can accurately match the aforementioned input-output relationship with under ten thousand trainable parameters.
We evaluate it on various tasks with demanding temporal structures, including the Long Range Arena (LRA) datasets.
arXiv Detail & Related papers (2023-06-14T13:34:13Z)
- Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead [88.17413955380262]
We introduce a novel architecture for early exiting based on the vision transformer architecture.
We show that our method works for both classification and regression problems.
We also introduce a novel method for integrating audio and visual modalities within early exits in audiovisual data analysis.
arXiv Detail & Related papers (2021-05-19T13:30:34Z)
- Correlation based Multi-phasal models for improved imagined speech EEG recognition [22.196642357767338]
This work aims to profit from the parallel information contained in multi-phasal EEG data recorded while speaking, imagining and performing articulatory movements corresponding to specific speech units.
A bi-phase common representation learning module using neural networks is designed to model the correlation between an analysis phase and a support phase.
The proposed approach further handles the non-availability of multi-phasal data during decoding.
arXiv Detail & Related papers (2020-11-04T09:39:53Z)
- Reservoir Memory Machines as Neural Computers [70.5993855765376]
Differentiable neural computers extend artificial neural networks with an explicit memory without interference.
We achieve some of the computational capabilities of differentiable neural computers with a model that can be trained very efficiently.
arXiv Detail & Related papers (2020-09-14T12:01:30Z)
- Incremental Training of a Recurrent Neural Network Exploiting a Multi-Scale Dynamic Memory [79.42778415729475]
We propose a novel incrementally trained recurrent architecture targeting explicitly multi-scale learning.
We show how to extend the architecture of a simple RNN by separating its hidden state into different modules.
We discuss a training algorithm where new modules are iteratively added to the model to learn progressively longer dependencies.
arXiv Detail & Related papers (2020-06-29T08:35:49Z)
- Multi-modal Automated Speech Scoring using Attention Fusion [46.94442359735952]
We propose a novel multi-modal end-to-end neural approach for automated assessment of non-native English speakers' spontaneous speech.
We employ Bi-directional Recurrent Convolutional Neural Networks and Bi-directional Long Short-Term Memory Neural Networks to encode acoustic and lexical cues from spectrograms and transcriptions.
We find combined attention to both lexical and acoustic cues significantly improves the overall performance of the system.
arXiv Detail & Related papers (2020-05-17T07:53:15Z)
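The attention-fusion step described in the entry above can be sketched as a toy computation: pooled acoustic and lexical encodings are weighted by softmax attention scores before being combined. This is a hypothetical illustration, not the paper's implementation; the scoring vector and all dimensions are stand-ins.

```python
# Toy attention fusion of two modality encodings (illustrative only).
import numpy as np

rng = np.random.default_rng(1)
acoustic = rng.normal(size=(64,))  # stand-in for a pooled spectrogram encoding
lexical = rng.normal(size=(64,))   # stand-in for a pooled transcription encoding

# Scalar attention score per modality from a (random, stand-in) scoring vector.
w = rng.normal(size=(64,))
scores = np.array([acoustic @ w, lexical @ w])
alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over the two modalities

# Fused representation: attention-weighted sum of the modality encodings.
fused = alpha[0] * acoustic + alpha[1] * lexical
print(fused.shape)
```

In a trained system, the scoring vector would be learned jointly with the encoders so the model can emphasize whichever cue is more informative for a given utterance.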
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.