MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with
Unknown Number of Sound Sources
- URL: http://arxiv.org/abs/2207.07307v1
- Date: Fri, 15 Jul 2022 06:18:00 GMT
- Title: MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with
Unknown Number of Sound Sources
- Authors: Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei
Zhang, Lin Qiu and Jianwu Dang
- Abstract summary: Recent neural network based Direction of Arrival (DoA) estimation algorithms have performed well in scenarios with an unknown number of sound sources.
These algorithms usually map the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), an approach called MISO.
We propose a novel multi-channel input and multiple outputs DoA network called MIMO-DoAnet to address the limitations of such MISO algorithms.
- Score: 56.41687729076406
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent neural network based Direction of Arrival (DoA) estimation algorithms
have performed well in scenarios with an unknown number of sound sources. These
algorithms usually map the multi-channel audio input to a
single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), an
approach called MISO. However, such MISO algorithms strongly depend on empirical
threshold setting and the angle assumption that the angles between the sound
sources are greater than a fixed angle. To address these limitations, we
propose a novel multi-channel input and multiple outputs DoA network called
MIMO-DoAnet. Unlike the general MISO algorithms, MIMO-DoAnet predicts the SPS
coding of each sound source with the help of the informative spatial covariance
matrix. By doing so, the threshold task of detecting the number of sound
sources becomes an easier task of detecting whether there is a sound source in
each output, and the serious interaction between sound sources disappears
during the inference stage. Experimental results show that MIMO-DoAnet improves the
F1 score over the MISO baseline system by 18.6% relative (13.3% absolute) in 3-source
scenes and by 34.4% relative (20.2% absolute) in 4-source scenes. The
results also demonstrate MIMO-DoAnet alleviates the threshold setting problem
and solves the angle assumption problem effectively.
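The contrast between the two decoding styles described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the Gaussian-shaped SPS coding, the 360 one-degree bins, the element-wise maximum used to build the overall MISO spectrum, and the names `gaussian_sps`, `decode_miso`, `decode_mimo`, `threshold`, `min_sep` and `presence` are all assumptions made for illustration.

```python
import math


def gaussian_sps(doa_deg, n_bins=360, sigma=8.0):
    """Toy SPS coding of one source: a Gaussian-like bump centred on its DoA (assumed form)."""
    sps = []
    for b in range(n_bins):
        d = abs(b - doa_deg) % n_bins
        d = min(d, n_bins - d)  # circular distance in degrees
        sps.append(math.exp(-d * d / (sigma * sigma)))
    return sps


def decode_miso(sps, threshold=0.5, min_sep=10):
    """MISO-style decoding: pick peaks above an empirical threshold and
    enforce a minimum angular separation (the 'angle assumption')."""
    n = len(sps)
    peaks = [
        i for i in range(n)
        if sps[i] >= threshold and sps[i] > sps[i - 1] and sps[i] >= sps[(i + 1) % n]
    ]
    peaks.sort(key=lambda i: -sps[i])  # strongest peaks first
    kept = []
    for p in peaks:
        if all(min(abs(p - q), n - abs(p - q)) >= min_sep for q in kept):
            kept.append(p)
    return sorted(kept)


def decode_mimo(outputs, presence=0.5):
    """MIMO-style decoding: each output carries at most one source, so the only
    decision per output is whether a source is present at all."""
    doas = []
    for sps in outputs:
        peak = max(range(len(sps)), key=sps.__getitem__)
        if sps[peak] >= presence:
            doas.append(peak)
    return sorted(doas)


# Two sources only 6 degrees apart.
src_a, src_b = gaussian_sps(100), gaussian_sps(106)
overall = [max(x, y) for x, y in zip(src_a, src_b)]  # single MISO output
empty = [0.0] * 360                                  # unused MIMO output

print(decode_miso(overall))                # [100]: the close pair is merged
print(decode_mimo([src_a, src_b, empty]))  # [100, 106]
```

In this toy setting the MISO decoder loses one of the two close sources because of the separation constraint, while the per-output decoder only has to decide whether each output is occupied. How well a trained network actually separates sources into distinct outputs is a matter for the paper's experiments, not this sketch.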
Related papers
- MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition [62.89464258519723]
We propose a multi-layer cross-attention fusion based AVSR approach that promotes representation of each modality by fusing them at different levels of audio/visual encoders.
Our proposed approach surpasses the first-place system, establishing a new SOTA cpCER of 29.13% on this dataset.
arXiv Detail & Related papers (2024-01-07T08:59:32Z)
- Towards Better Out-of-Distribution Generalization of Neural Algorithmic Reasoning Tasks [51.8723187709964]
We study the OOD generalization of neural algorithmic reasoning tasks.
The goal is to learn an algorithm from input-output pairs using deep neural networks.
arXiv Detail & Related papers (2022-11-01T18:33:20Z)
- Iterative Sound Source Localization for Unknown Number of Sources [57.006589498243336]
We propose an iterative sound source localization approach called ISSL, which can iteratively extract each source's DOA without threshold until the termination criterion is met.
Our ISSL achieves significant performance improvements in both DOA estimation and source number detection compared with the existing threshold-based algorithms.
arXiv Detail & Related papers (2022-06-24T13:19:44Z)
- BDIS: Bayesian Dense Inverse Searching Method for Real-Time Stereo Surgical Image Matching [2.990820994368054]
This paper proposes the first CPU-level real-time prior-free stereo matching algorithm for general MIS tasks.
We achieve an average 17 Hz on 640*480 images with a single-core CPU (i5-9400) for surgical images.
It has similar or higher accuracy and fewer outliers than the baseline ELAS in MIS, while it is 4-5 times faster.
arXiv Detail & Related papers (2022-05-06T10:50:49Z)
- Machine Learning Methods for Spectral Efficiency Prediction in Massive MIMO Systems [0.0]
We study several machine learning approaches to solve the problem of estimating the spectral efficiency (SE) value for a certain precoding scheme, preferably in the shortest possible time.
The best results in terms of mean absolute percentage error (MAPE) are obtained with gradient boosting over sorted features, while linear models demonstrate worse prediction quality.
We investigate the practical applicability of the proposed algorithms in a wide range of scenarios generated by the Quadriga simulator.
arXiv Detail & Related papers (2021-12-29T07:03:10Z)
- DeepAoANet: Learning Angle of Arrival from Software Defined Radios with Deep Neural Networks [39.65462454049291]
Existing algorithms perform poorly in resolving Angle of Arrival (AoA) in the presence of multipath or when operating in a weak signal regime.
We propose a Deep Learning approach to deriving AoA from a single snapshot of the SDR multichannel data.
Our proposed method demonstrates excellent reliability in determining the number of impinging signals and achieves mean absolute AoA errors of less than $2^\circ$.
arXiv Detail & Related papers (2021-12-01T18:16:13Z)
- Learning based signal detection for MIMO systems with unknown noise statistics [84.02122699723536]
This paper aims to devise a generalized maximum likelihood (ML) estimator to robustly detect signals with unknown noise statistics.
In practice, there is little or even no statistical knowledge on the system noise, which in many cases is non-Gaussian, impulsive and not analyzable.
Our framework is driven by an unsupervised learning approach, where only the noise samples are required.
arXiv Detail & Related papers (2021-01-21T04:48:15Z)
- Solving Sparse Linear Inverse Problems in Communication Systems: A Deep Learning Approach With Adaptive Depth [51.40441097625201]
We propose an end-to-end trainable deep learning architecture for sparse signal recovery problems.
The proposed method learns how many layers to execute to emit an output, and the network depth is dynamically adjusted for each task in the inference phase.
arXiv Detail & Related papers (2020-10-29T06:32:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.