DBNET: DOA-driven beamforming network for end-to-end farfield sound source separation
- URL: http://arxiv.org/abs/2010.11566v1
- Date: Thu, 22 Oct 2020 09:52:05 GMT
- Title: DBNET: DOA-driven beamforming network for end-to-end farfield sound source separation
- Authors: Ali Aroudi and Sebastian Braun
- Abstract summary: We propose a direction-of-arrival-driven beamforming network (DBnet) for end-to-end source separation.
We also propose end-to-end extensions of DBnet which incorporate post masking networks.
The experimental results show that the proposed extended DBnet using a convolutional-recurrent post masking network outperforms state-of-the-art source separation methods.
- Score: 20.200763595732912
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many deep learning techniques are available to perform source separation and
reduce background noise. However, designing an end-to-end multi-channel source
separation method using deep learning and conventional acoustic signal
processing techniques remains challenging. In this paper we propose a
direction-of-arrival-driven beamforming network (DBnet) consisting of
direction-of-arrival (DOA) estimation and beamforming layers for end-to-end
source separation. We propose to train DBnet using loss functions that are
solely based on the distances between the separated speech signals and the
target speech signals, without a need for the ground-truth DOAs of speakers. To
improve the source separation performance, we also propose end-to-end
extensions of DBnet which incorporate post masking networks. We evaluate the
proposed DBnet and its extensions on a very challenging dataset, targeting
realistic far-field sound source separation in reverberant and noisy
environments. The experimental results show that the proposed extended DBnet
using a convolutional-recurrent post masking network outperforms
state-of-the-art source separation methods.
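Since DBnet is trained only from distances between the separated and target signals (no ground-truth DOAs), one standard instantiation of such a signal-level objective is a permutation-invariant, scale-invariant SDR loss. The NumPy sketch below is illustrative only; the paper does not specify this exact distance, and the function names are hypothetical.
```python
import itertools
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    # Project the estimate onto the target to remove scale differences.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = alpha * target
    e_noise = estimate - s_target
    return 10 * np.log10((np.sum(s_target**2) + eps) / (np.sum(e_noise**2) + eps))

def pit_si_sdr_loss(estimates, targets):
    """Permutation-invariant negative SI-SDR over all speaker orderings.

    estimates, targets: arrays of shape (n_sources, n_samples).
    """
    n = len(targets)
    best = -np.inf
    for perm in itertools.permutations(range(n)):
        score = np.mean([si_sdr(estimates[p], targets[i]) for i, p in enumerate(perm)])
        best = max(best, score)
    return -best  # minimize negative SI-SDR

# Toy usage: two 1-second sources at 16 kHz, estimates in swapped order.
rng = np.random.default_rng(0)
targets = rng.standard_normal((2, 16000))
estimates = targets[::-1] + 0.1 * rng.standard_normal((2, 16000))
print(pit_si_sdr_loss(estimates, targets))  # permutation search undoes the swap
```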
Related papers
- A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model [14.795953417531907]
We propose a unified multichannel far-field speech recognition system that combines neural beamforming with a transformer-based Listen, Attend and Spell (LAS) speech recognition system.
The proposed method achieves a 19.26% improvement when compared with a strong baseline.
arXiv Detail & Related papers (2024-01-05T07:11:13Z)
- On Neural Architectures for Deep Learning-based Source Separation of Co-Channel OFDM Signals [104.11663769306566]
We study the single-channel source separation problem involving orthogonal frequency-division multiplexing (OFDM) signals.
We propose critical domain-informed modifications to the network parameterization, based on insights from OFDM structures, as sketched below.
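One common way to encode OFDM structure, assumed here purely for illustration (the paper's actual parameterization may differ), is a front end that segments the waveform into symbols, strips the cyclic prefix, and maps each symbol to the subcarrier domain with an FFT:
```python
import numpy as np

def ofdm_symbol_features(x, fft_size=64, cp_len=16):
    """Segment a complex baseband signal into OFDM symbols and move to the
    subcarrier domain: a structure-aware front end a separation network
    could operate on instead of raw samples."""
    sym_len = fft_size + cp_len
    n_syms = len(x) // sym_len
    x = x[:n_syms * sym_len].reshape(n_syms, sym_len)
    x = x[:, cp_len:]                  # discard the cyclic prefix
    return np.fft.fft(x, axis=1)       # (n_syms, fft_size) subcarrier grid

# Toy mixture with the assumed frame structure.
rng = np.random.default_rng(1)
mix = rng.standard_normal(8000) + 1j * rng.standard_normal(8000)
print(ofdm_symbol_features(mix).shape)  # (100, 64)
```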
arXiv Detail & Related papers (2023-03-11T16:29:13Z)
- MIMO-DBnet: Multi-channel Input and Multiple Outputs DOA-aware Beamforming Network for Speech Separation [55.533789120204055]
We propose an end-to-end beamforming network for direction-guided speech separation given merely the mixture signal.
Specifically, we design a multi-channel input and multiple outputs architecture to predict the direction-of-arrival based embeddings and beamforming weights for each source; a toy sketch of this idea follows.
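A minimal PyTorch sketch of that idea: predict per-source complex beamforming weights from multi-channel spectra, then apply filter-and-sum. The GRU backbone and layer sizes are stand-ins, not the published MIMO-DBnet architecture.
```python
import torch
import torch.nn as nn

class ToyDOABeamformer(nn.Module):
    """Map multi-channel STFT features to per-source beamforming weights,
    then apply filter-and-sum per time-frequency bin."""
    def __init__(self, n_mics=4, n_freq=257, n_src=2, hidden=128):
        super().__init__()
        self.n_src = n_src
        # Real/imag parts of all channels, flattened over mics and freqs.
        self.rnn = nn.GRU(2 * n_mics * n_freq, hidden, batch_first=True)
        # Predict real+imag weights for every (source, mic, freq).
        self.head = nn.Linear(hidden, 2 * n_src * n_mics * n_freq)

    def forward(self, stft):  # stft: (batch, mics, time, freq), complex
        b, m, t, f = stft.shape
        feats = torch.view_as_real(stft).permute(0, 2, 1, 3, 4).reshape(b, t, -1)
        h, _ = self.rnn(feats)
        w = self.head(h).reshape(b, t, self.n_src, m, f, 2)
        w = torch.view_as_complex(w.contiguous())  # (b, t, src, mics, freq)
        # Filter-and-sum: combine channels with the predicted weights.
        return torch.einsum('btsmf,bmtf->bstf', w.conj(), stft)

net = ToyDOABeamformer()
mix = torch.randn(1, 4, 50, 257, dtype=torch.complex64)
print(net(mix).shape)  # torch.Size([1, 2, 50, 257])
```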
arXiv Detail & Related papers (2022-12-07T01:52:40Z)
- Data-Driven Blind Synchronization and Interference Rejection for Digital Communication Signals [98.95383921866096]
We study the potential of data-driven deep learning methods for separation of two communication signals from an observation of their mixture.
We show that capturing high-resolution temporal structures (nonstationarities) leads to substantial performance gains.
We propose a domain-informed neural network (NN) design that is able to improve upon both "off-the-shelf" NNs and classical detection and interference rejection methods.
arXiv Detail & Related papers (2022-09-11T14:10:37Z)
- Implicit Neural Spatial Filtering for Multichannel Source Separation in the Waveform Domain [131.74762114632404]
The model is trained end-to-end and performs spatial processing implicitly.
We evaluate the proposed model on a real-world dataset and show that it matches the performance of an oracle beamformer; a sketch of one such oracle baseline follows.
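For context, one common oracle baseline is a delay-and-sum beamformer steered with the true source direction. Below is a minimal free-field, far-field NumPy sketch; the paper's oracle may be a stronger beamformer (e.g. MVDR), so this is only an assumed example.
```python
import numpy as np

def oracle_delay_and_sum(mics, mic_pos, doa_deg, fs=16000, c=343.0):
    """Align each channel using the known (oracle) far-field DOA via a
    frequency-domain fractional delay, then average the channels."""
    delays = mic_pos * np.cos(np.deg2rad(doa_deg)) / c   # seconds per mic
    n = mics.shape[1]
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    spec = np.fft.rfft(mics, axis=1)
    spec *= np.exp(2j * np.pi * freqs[None, :] * delays[:, None])  # undo delays
    return np.fft.irfft(spec, n=n, axis=1).mean(axis=0)

# Toy usage: 4-mic linear array with 5 cm spacing, source at 60 degrees.
mics = np.random.default_rng(2).standard_normal((4, 16000))
out = oracle_delay_and_sum(mics, np.array([0.0, 0.05, 0.10, 0.15]), doa_deg=60)
print(out.shape)  # (16000,)
```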
arXiv Detail & Related papers (2022-06-30T17:13:01Z)
- End-to-End Active Speaker Detection [58.7097258722291]
We propose an end-to-end training network where feature learning and contextual predictions are jointly learned.
We also introduce interleaved graph neural network (iGNN) blocks, which split the message passing according to the main sources of context in the ASD problem; a toy sketch of this splitting follows.
Experiments show that the aggregated features from the iGNN blocks are more suitable for ASD, resulting in state-of-the-art performance.
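The following PyTorch sketch illustrates the splitting idea only, with separate transforms for temporal (same speaker across frames) and cross-speaker (same frame) edges; it is not the published iGNN block, and all sizes are hypothetical.
```python
import torch
import torch.nn as nn

class SplitMessagePassing(nn.Module):
    """Aggregate temporal and cross-speaker edges with separate transforms,
    then fuse both message streams into the node update."""
    def __init__(self, dim):
        super().__init__()
        self.temporal = nn.Linear(dim, dim)   # messages along temporal edges
        self.spatial = nn.Linear(dim, dim)    # messages along same-frame edges
        self.update = nn.Linear(3 * dim, dim)

    def forward(self, x, adj_temporal, adj_spatial):
        # x: (nodes, dim); adjacencies: (nodes, nodes) weight matrices.
        m_t = self.temporal(adj_temporal @ x)
        m_s = self.spatial(adj_spatial @ x)
        return torch.relu(self.update(torch.cat([x, m_t, m_s], dim=-1)))

# Toy usage: 6 nodes (2 speakers x 3 frames) with 16-dim features and
# random (unnormalized) adjacencies, just to check shapes.
block = SplitMessagePassing(dim=16)
x = torch.randn(6, 16)
print(block(x, torch.rand(6, 6), torch.rand(6, 6)).shape)  # torch.Size([6, 16])
```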
arXiv Detail & Related papers (2022-03-27T08:55:28Z)
- Unsupervised Audio Source Separation using Generative Priors [43.35195236159189]
We propose a novel approach for audio source separation based on generative priors trained on individual sources.
Our approach simultaneously searches in the source-specific latent spaces to effectively recover the constituent sources; the sketch below illustrates such a latent search.
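A minimal PyTorch sketch of the latent search: two frozen toy "generators" stand in for pretrained source-specific deep priors, and gradient descent over their latents fits the observed mixture. Everything here is a stand-in for the paper's actual priors and optimizer settings.
```python
import torch

torch.manual_seed(0)
# Frozen linear "generators" standing in for pretrained source priors.
G1, G2 = torch.randn(16000, 32), torch.randn(16000, 32)

# Observed mixture of two sources drawn from those priors.
z1_true, z2_true = torch.randn(32), torch.randn(32)
mix = G1 @ z1_true + G2 @ z2_true

# Jointly search both latent spaces so the generated sources explain the mix.
z1 = torch.zeros(32, requires_grad=True)
z2 = torch.zeros(32, requires_grad=True)
opt = torch.optim.Adam([z1, z2], lr=0.05)
for _ in range(1000):
    opt.zero_grad()
    loss = torch.mean((mix - G1 @ z1 - G2 @ z2) ** 2)
    loss.backward()
    opt.step()

# Separated estimates are the generator outputs at the optimized latents.
print(f"residual: {loss.item():.5f}")  # should approach zero
est1, est2 = G1 @ z1, G2 @ z2
```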
arXiv Detail & Related papers (2020-05-28T03:57:16Z)
- Spatial and spectral deep attention fusion for multi-channel speech separation using deep embedding features [60.20150317299749]
Multi-channel deep clustering (MDC) has achieved good performance for speech separation.
We propose a deep attention fusion method to dynamically control the weights of the spectral and spatial features and combine them deeply; a toy sketch of such fusion follows.
Experimental results show that the proposed method outperforms the MDC baseline and even surpasses the ideal binary mask (IBM).
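A minimal PyTorch sketch of attention-weighted fusion of two feature streams; the scoring layer and feature dimensions are assumptions, not the paper's architecture.
```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Toy attention fusion: learn per-frame weights for spectral and
    spatial feature streams and combine them as a weighted sum."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, spectral, spatial):
        # spectral, spatial: (batch, time, dim)
        stacked = torch.stack([spectral, spatial], dim=2)  # (b, t, 2, dim)
        att = torch.softmax(self.score(stacked), dim=2)    # (b, t, 2, 1)
        return (att * stacked).sum(dim=2)                  # (b, t, dim)

fuse = AttentionFusion(dim=40)
spec = torch.randn(1, 100, 40)   # e.g. log-magnitude spectral features
spat = torch.randn(1, 100, 40)   # e.g. inter-channel phase-difference features
print(fuse(spec, spat).shape)    # torch.Size([1, 100, 40])
```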
arXiv Detail & Related papers (2020-02-05T03:49:39Z)
- Time-Domain Audio Source Separation Based on Wave-U-Net Combined with Discrete Wavelet Transform [34.05660769694652]
We propose a time-domain audio source separation method based on a discrete wavelet transform (DWT).
The proposed method builds on Wave-U-Net, one of the state-of-the-art deep neural networks; a minimal DWT sketch follows.
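For intuition, a DWT gives an invertible, anti-aliased alternative to the plain decimation used for downsampling in Wave-U-Net. The single-level Haar sketch below (NumPy) illustrates the transform itself; the paper may use different wavelets and integration points.
```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT: returns (approximation, detail), each at half
    the input rate. Unlike plain decimation, no information is lost."""
    x = x[: len(x) // 2 * 2]
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2), (even - odd) / np.sqrt(2)

def haar_idwt(approx, detail):
    """Inverse single-level Haar DWT (perfect reconstruction)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

x = np.random.default_rng(3).standard_normal(16000)
a, d = haar_dwt(x)
print(np.allclose(haar_idwt(a, d), x))  # True: perfect reconstruction
```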
arXiv Detail & Related papers (2020-01-28T06:43:21Z)