DNN-Based Distributed Multichannel Mask Estimation for Speech
Enhancement in Microphone Arrays
- URL: http://arxiv.org/abs/2002.06016v2
- Date: Mon, 16 Mar 2020 15:58:55 GMT
- Title: DNN-Based Distributed Multichannel Mask Estimation for Speech
Enhancement in Microphone Arrays
- Authors: Nicolas Furnon (LORIA, MULTISPEECH), Romain Serizel (LORIA,
MULTISPEECH), Irina Illina (LORIA, MULTISPEECH), Slim Essid (LTCI)
- Abstract summary: We propose to extend the distributed adaptive node-specific signal estimation approach to a neural network framework.
In an array of two nodes, we show that this additional signal can be efficiently taken into account to predict the masks, leading to better speech enhancement performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multichannel processing is widely used for speech enhancement, but several
limitations appear when deploying these solutions in the real world.
Distributed sensor arrays, which combine several devices each equipped with a
few microphones, are a viable alternative that exploits the many
microphone-equipped devices we use in our everyday life. In this context, we
propose to extend the distributed adaptive node-specific signal estimation
approach to a neural network framework. At each node, local filtering is
performed to send one signal to the other nodes, where a mask is estimated by a
neural network in order to compute a global multichannel Wiener filter. In an
array of two nodes, we show that this additional signal can be efficiently
taken into account to predict the masks and leads to better speech enhancement
performance than when the mask estimation relies only on the local signals.
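The mask-based multichannel Wiener filter at the core of this pipeline can be sketched as follows. This is a minimal NumPy illustration for a single frequency bin, not the authors' implementation: the function name, the rank-unconstrained covariance estimates, and the simple (R_ss + R_nn)^-1 R_ss e_ref formulation are all assumptions for illustration.

```python
import numpy as np

def mask_based_mwf(Y, mask, ref_ch=0):
    """Mask-based multichannel Wiener filter for one frequency bin (illustrative sketch).

    Y      : (channels, frames) complex STFT observations at this bin
    mask   : (frames,) speech-presence mask in [0, 1], e.g. predicted by a DNN
    ref_ch : index of the reference channel whose speech image is estimated
    """
    # Mask-weighted spatial covariance estimates of speech and noise
    Rss = (mask * Y) @ Y.conj().T / np.maximum(mask.sum(), 1e-8)
    Rnn = ((1.0 - mask) * Y) @ Y.conj().T / np.maximum((1.0 - mask).sum(), 1e-8)
    # MWF for the reference channel: w = (Rss + Rnn)^-1 Rss e_ref
    e_ref = np.zeros(Y.shape[0])
    e_ref[ref_ch] = 1.0
    w = np.linalg.solve(Rss + Rnn, Rss @ e_ref)
    # Apply the filter to obtain a single-channel enhanced output, (frames,)
    return w.conj() @ Y
```

In a distributed setting, the columns of `Y` at each node would stack the local microphone signals with the compressed signals received from the other nodes, so the mask estimator sees both local and remote information.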
Related papers
- Joint Channel Estimation and Feedback with Masked Token Transformers in
Massive MIMO Systems [74.52117784544758]
This paper proposes an encoder-decoder based network that unveils the intrinsic frequency-domain correlation within the CSI matrix.
The entire encoder-decoder network is utilized for channel compression.
Our method outperforms state-of-the-art channel estimation and feedback techniques in joint tasks.
arXiv Detail & Related papers (2023-06-08T06:15:17Z)
- MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware
Beamforming Network for Speech Separation [55.533789120204055]
We propose an end-to-end beamforming network for direction guided speech separation given merely the mixture signal.
Specifically, we design a multi-channel input and multiple outputs architecture to predict the direction-of-arrival based embeddings and beamforming weights for each source.
arXiv Detail & Related papers (2022-12-07T01:52:40Z)
- Bandwidth-efficient distributed neural network architectures with
application to body sensor networks [73.02174868813475]
This paper describes a conceptual design methodology to design distributed neural network architectures.
We show that the proposed framework enables up to a factor 20 in bandwidth reduction with minimal loss.
While the application focus of this paper is on wearable brain-computer interfaces, the proposed methodology can be applied in other sensor network-like applications as well.
arXiv Detail & Related papers (2022-10-14T12:35:32Z)
- MFA: TDNN with Multi-scale Frequency-channel Attention for
Text-independent Speaker Verification with Short Utterances [94.70787497137854]
We propose a multi-scale frequency-channel attention (MFA) to characterize speakers at different scales through a novel dual-path design which consists of a convolutional neural network and TDNN.
We evaluate the proposed MFA on the VoxCeleb database and observe that the proposed framework with MFA can achieve state-of-the-art performance while reducing parameters and complexity.
arXiv Detail & Related papers (2022-02-03T14:57:05Z)
- Multi-Channel End-to-End Neural Diarization with Distributed Microphones [53.99406868339701]
We replace Transformer encoders in EEND with two types of encoders that process a multi-channel input.
We also propose a model adaptation method using only single-channel recordings.
arXiv Detail & Related papers (2021-10-10T03:24:03Z)
- Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM
Neural Networks [3.730592618611028]
We use LSTMs to enhance spatial clustering based time-frequency masks.
We achieve both the signal modeling performance of multiple single-channel LSTM-DNN speech enhancers and the signal separation performance of multichannel spatial clustering.
We evaluate the intelligibility of the output of each system using word error rate from a Kaldi automatic speech recognizer.
arXiv Detail & Related papers (2020-12-02T22:29:29Z)
- Resource-Efficient Speech Mask Estimation for Multi-Channel Speech
Enhancement [15.361841669377776]
We provide a resource-efficient approach for multi-channel speech enhancement based on Deep Neural Networks (DNNs).
In particular, we use reduced-precision DNNs for estimating a speech mask from noisy, multi-channel microphone observations.
In the extreme case of binary weights and reduced precision activations, a significant reduction of execution time and memory footprint is possible.
arXiv Detail & Related papers (2020-07-22T14:58:29Z)
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by
Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experiment results show a mean error azimuth of 13 degrees, which surpasses the accuracy of the other biologically plausible neuromorphic approach for sound source localization.
arXiv Detail & Related papers (2020-07-07T08:22:56Z)
- Neural Speech Separation Using Spatially Distributed Microphones [19.242927805448154]
This paper proposes a neural network based speech separation method using spatially distributed microphones.
Unlike in traditional microphone array settings, neither the number of microphones nor their spatial arrangement is known in advance.
Speech recognition experimental results show that the proposed method significantly outperforms baseline multi-channel speech separation systems.
arXiv Detail & Related papers (2020-04-28T17:16:31Z)
- Channel-Attention Dense U-Net for Multichannel Speech Enhancement [21.94418736688929]
We introduce a channel-attention mechanism inside the deep architecture to mimic beamforming.
We demonstrate the superior performance of the network against the state-of-the-art approaches on the CHiME-3 dataset.
arXiv Detail & Related papers (2020-01-30T19:56:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.