Multimodal Speech Enhancement Using Burst Propagation
- URL: http://arxiv.org/abs/2209.03275v2
- Date: Mon, 5 Feb 2024 17:54:04 GMT
- Title: Multimodal Speech Enhancement Using Burst Propagation
- Authors: Mohsin Raza, Leandro A. Passos, Ahmed Khubaib, Ahsan Adeel
- Abstract summary: This paper proposes MBURST, a novel multimodal solution for audio-visual speech enhancement.
It implements several criteria to address the credit assignment problem in a more biologically plausible manner.
Experiments conducted over a Grid Corpus and CHiME3-based dataset show that MBURST can reproduce similar mask reconstructions to the multimodal backpropagation-based baseline.
- Score: 2.03742455046876
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper proposes MBURST, a novel multimodal solution for audio-visual
speech enhancement that considers the most recent neurological discoveries
regarding pyramidal cells of the prefrontal cortex and other brain regions. The
so-called burst propagation implements several criteria to address the credit
assignment problem in a more biologically plausible manner: steering the sign
and magnitude of plasticity through feedback, multiplexing the feedback and
feedforward information across layers through different weight connections,
approximating feedback and feedforward connections, and linearizing the
feedback signals. MBURST benefits from such capabilities to learn correlations
between the noisy signal and the visual stimuli, thus attributing meaning to
the speech by amplifying relevant information and suppressing noise.
Experiments conducted over a Grid Corpus and CHiME3-based dataset show that
MBURST can reproduce similar mask reconstructions to the multimodal
backpropagation-based baseline while demonstrating outstanding energy
efficiency, reducing the neuron firing rates by up to
70%. Such a feature implies more sustainable implementations,
suitable and desirable for hearing aids or any other similar embedded systems.
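The abstract refers to mask reconstructions and to a relative drop in firing rates without spelling either out. The sketch below is only an illustration of those two ideas, not the MBURST model: it assumes an ideal binary mask over a magnitude spectrogram (the abstract does not say which mask type is used), and the function names and toy numbers are hypothetical.

```python
# Illustrative sketch only. This is NOT the MBURST model; the mask type
# (ideal binary mask) and all names and numbers here are assumptions.


def ideal_binary_mask(clean_mag, noise_mag):
    """Return 1.0 where clean speech exceeds the noise in a time-frequency bin, else 0.0."""
    return [
        [1.0 if c > n else 0.0 for c, n in zip(clean_row, noise_row)]
        for clean_row, noise_row in zip(clean_mag, noise_mag)
    ]


def apply_mask(noisy_mag, mask):
    """Element-wise product of a noisy magnitude spectrogram and a mask."""
    return [
        [x * m for x, m in zip(noisy_row, mask_row)]
        for noisy_row, mask_row in zip(noisy_mag, mask)
    ]


def relative_reduction(baseline_rate, new_rate):
    """Fractional reduction of new_rate relative to baseline_rate."""
    return (baseline_rate - new_rate) / baseline_rate


# Toy magnitudes: 2 time frames x 3 frequency bins (made-up values).
clean = [[0.9, 0.1, 0.5], [0.2, 0.8, 0.0]]
noise = [[0.1, 0.4, 0.5], [0.3, 0.1, 0.6]]
noisy = [[c + n for c, n in zip(cr, nr)] for cr, nr in zip(clean, noise)]

mask = ideal_binary_mask(clean, noise)   # target a model would learn to predict
enhanced = apply_mask(noisy, mask)       # noise-dominated bins are zeroed out

# A hypothetical firing rate falling from 10.0 to 3.0 is a 70% reduction.
reduction = relative_reduction(10.0, 3.0)
```

In a learned system the mask would be predicted from the noisy audio and the visual stream rather than computed from the clean signal, which is unavailable at inference time.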
Related papers
- Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI [20.432212333539628]
We introduce a novel coarse-to-fine audio reconstruction method based on functional Magnetic Resonance Imaging (fMRI) data.
We validate our method on three public fMRI datasets: Brain2Sound, Brain2Music, and Brain2Speech.
By employing semantic prompts during decoding, we enhance the quality of reconstructed audio when semantic features are suboptimal.
arXiv Detail & Related papers (2024-05-29T03:16:14Z)
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for the optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
A non-autoregressive framework enhances controllability, and a duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z)
- Speech enhancement with frequency domain auto-regressive modeling [34.55703785405481]
Speech applications in far-field real world settings often deal with signals that are corrupted by reverberation.
We propose a unified framework of speech dereverberation for improving the speech quality and the automatic speech recognition (ASR) performance.
arXiv Detail & Related papers (2023-09-24T03:25:51Z)
- Improving the Robustness of Summarization Systems with Dual Augmentation [68.53139002203118]
A robust summarization system should be able to capture the gist of the document, regardless of the specific word choices or noise in the input.
We first explore the summarization models' robustness against perturbations including word-level synonym substitution and noise.
We propose SummAttacker, an efficient approach to generating adversarial samples based on language models.
arXiv Detail & Related papers (2023-06-01T19:04:17Z)
- Brain Imaging-to-Graph Generation using Adversarial Hierarchical Diffusion Models for MCI Causality Analysis [44.45598796591008]
A brain imaging-to-graph generation (BIGG) framework is proposed to map functional magnetic resonance imaging (fMRI) into effective connectivity for mild cognitive impairment analysis.
The hierarchical transformers in the generator are designed to estimate the noise at multiple scales.
Evaluations on the ADNI dataset demonstrate the feasibility and efficacy of the proposed model.
arXiv Detail & Related papers (2023-05-18T06:54:56Z)
- On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures.
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
- Canonical Cortical Graph Neural Networks and its Application for Speech Enhancement in Future Audio-Visual Hearing Aids [0.726437825413781]
This paper proposes a more biologically plausible self-supervised machine learning approach that combines multimodal information using intra-layer modulations together with canonical correlation analysis (CCA).
The approach outperformed recent state-of-the-art results considering both better clean audio reconstruction and energy efficiency, described by a reduced and smoother neuron firing rate distribution.
arXiv Detail & Related papers (2022-06-06T15:20:07Z)
- A Computational Framework of Cortical Microcircuits Approximates Sign-concordant Random Backpropagation [7.601127912271984]
We propose a hypothetical framework consisting of a new microcircuit architecture and its supporting Hebbian learning rules.
We employ the Hebbian rule operating in local compartments to update synaptic weights and achieve supervised learning in a biologically plausible manner.
The proposed framework is benchmarked on several datasets including MNIST and CIFAR10, demonstrating promising BP-comparable accuracy.
arXiv Detail & Related papers (2022-05-15T14:22:03Z)
- Mutual Information Maximization for Effective Lip Reading [99.11600901751673]
We propose to introduce the mutual information constraints on both the local feature's level and the global sequence's level.
By combining these two advantages together, the proposed method is expected to be both discriminative and robust for effective lip reading.
arXiv Detail & Related papers (2020-03-13T18:47:42Z)
- ADRN: Attention-based Deep Residual Network for Hyperspectral Image Denoising [52.01041506447195]
We propose an attention-based deep residual network to learn a mapping from noisy HSI to the clean one.
Experimental results demonstrate that our proposed ADRN scheme outperforms the state-of-the-art methods both in quantitative and visual evaluations.
arXiv Detail & Related papers (2020-03-04T08:36:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.