Dual input neural networks for positional sound source localization
- URL: http://arxiv.org/abs/2308.04169v1
- Date: Tue, 8 Aug 2023 09:59:56 GMT
- Title: Dual input neural networks for positional sound source localization
- Authors: Eric Grinstein, Vincent W. Neo and Patrick A. Naylor
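- Abstract summary: We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model two data types in a neural network: high-dimensional multichannel audio signals and metadata such as the microphones' coordinates.
We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method, and a Convolutional Recurrent Neural Network (CRNN).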
Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.
- Score: 19.07039703121673
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In many signal processing applications, metadata may be advantageously used
in conjunction with a high-dimensional signal to produce a desired output. In
the case of classical Sound Source Localization (SSL) algorithms, information
from high-dimensional multichannel audio signals received by many
distributed microphones is combined with information describing acoustic
properties of the scene, such as the microphones' coordinates in space, to
estimate the position of a sound source. We introduce Dual Input Neural
Networks (DI-NNs) as a simple and effective way to model these two data types
in a neural network. We train and evaluate our proposed DI-NN on scenarios of
varying difficulty and realism and compare it against an alternative
architecture, a classical Least-Squares (LS) method as well as a classical
Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN
significantly outperforms the baselines, achieving a five times lower
localization error than the LS method and two times lower than the CRNN in a
test dataset of real recordings.
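The abstract describes the DI-NN only at a high level. The following is a minimal, untrained NumPy sketch of the general dual-input idea under stated assumptions, not the authors' architecture: a single dense layer stands in for whatever feature extractor (for example, a CRNN) processes the multichannel audio features, and the metadata is assumed to be the 2D coordinates of each microphone. The class name `DualInputNet`, the helper `dense`, and all layer sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(n_in, n_out):
    """Hypothetical helper: random weights and zero bias for one fully connected layer."""
    return rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out)


def relu(x):
    return np.maximum(x, 0.0)


class DualInputNet:
    """Untrained sketch (assumption, not the paper's model): a signal branch embeds
    the audio features, the metadata (microphone coordinates) is concatenated with
    that embedding, and a small regression head predicts the source position."""

    def __init__(self, n_mics, n_feats, n_hidden=64, n_dims=2):
        # n_dims is used both for microphone coordinates and the output position
        # (assumes a 2D room in this sketch).
        self.w1, self.b1 = dense(n_mics * n_feats, n_hidden)            # signal branch
        self.w2, self.b2 = dense(n_hidden + n_mics * n_dims, n_hidden)  # after fusion
        self.w3, self.b3 = dense(n_hidden, n_dims)                      # position head

    def forward(self, audio_feats, mic_coords):
        # audio_feats: (batch, n_mics, n_feats); mic_coords: (batch, n_mics, n_dims)
        batch = audio_feats.shape[0]
        h = relu(audio_feats.reshape(batch, -1) @ self.w1 + self.b1)
        # Second input: append the flattened microphone coordinates to the embedding.
        fused = np.concatenate([h, mic_coords.reshape(batch, -1)], axis=1)
        h = relu(fused @ self.w2 + self.b2)
        return h @ self.w3 + self.b3  # (batch, n_dims) estimated source coordinates


net = DualInputNet(n_mics=4, n_feats=128)
audio = rng.standard_normal((8, 4, 128))       # stand-in audio features
mics = rng.uniform(0.0, 5.0, size=(8, 4, 2))   # stand-in microphone positions (m)
print(net.forward(audio, mics).shape)
```

Fusing the coordinates after the signal branch lets the regression head condition its estimate on the array geometry without pushing coordinates through layers designed for audio. In practice the weights would be trained, for example by minimizing the mean squared error between predicted and true source positions.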
Related papers
- Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
In neuromorphic computing, spiking neural networks (SNNs) perform inference tasks, offering significant efficiency gains for workloads involving sequential data.
Recent advances in hardware and software have demonstrated that embedding a few bits of payload in each spike exchanged between the spiking neurons can further enhance inference accuracy.
This paper investigates a wireless neuromorphic split computing architecture employing multi-level SNNs.
Date: 2024-11-07T14:08:35Z
- Spiking Neural Network Decision Feedback Equalization [70.3497683558609]
We propose an SNN-based equalizer with a feedback structure akin to the decision feedback equalizer (DFE).
We show that our approach clearly outperforms conventional linear equalizers for three different exemplary channels.
The proposed SNN with a decision feedback structure enables the path to competitive energy-efficient transceivers.
Date: 2022-11-09T09:19:15Z
- Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex.
In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
Date: 2022-08-28T17:50:19Z
- Acoustic-Net: A Novel Neural Network for Sound Localization and Quantification [28.670240455952317]
A novel neural network, termed the Acoustic-Net, is proposed to locate and quantify the sound source using only the raw signals.
The experiments demonstrate that the proposed method significantly improves the accuracy of sound source prediction and the computing speed.
Date: 2022-03-31T12:20:09Z
- A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate designing a compact audio-visual wake word spotting (WWS) system by utilizing visual information.
We introduce a neural network pruning strategy via the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF).
The proposed audio-visual system achieves significant performance improvements over the single-modality (audio-only or video-only) system under different noisy conditions.
Date: 2022-02-17T08:26:25Z
- Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained from multi-channel data of the true array manifold matrix.
We train a CNN in the low-SNR regime to predict DoAs across all SNRs.
Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
Date: 2020-11-17T12:52:18Z
- Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment.
We implement this algorithm in a real-time robotic system with a microphone array.
The experimental results show a mean azimuth error of 13 degrees, which surpasses the accuracy of the other biologically plausible neuromorphic approach for sound source localization.
Date: 2020-07-07T08:22:56Z
- Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation [17.358040670413505]
We focus on singing voice separation, employing an RNN architecture, and we replace the RNNs with DWS convolutions (DWS-CNNs).
We conduct an ablation study and examine the effect of the number of channels and layers of DWS-CNNs on the source separation performance.
Our results show that replacing RNNs with DWS-CNNs yields improvements of 1.20, 0.06, and 0.37 dB, respectively, while using only 20.57% of the parameters of the RNN architecture.
Date: 2020-07-06T12:32:34Z
- SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement [21.346342164530967]
We analyze the generalization with respect to (1) the size and diversity of the training data, (2) different network architectures, and (3) the chosen features.
We show via experimental results and an analysis using t-distributed stochastic neighbor embedding (t-SNE) that the proposed SNR-NAT features yield robust and level-independent results in unseen noise.
Date: 2020-04-07T16:09:54Z
- Centimeter-Level Indoor Localization using Channel State Information with Recurrent Neural Networks [12.193558591962754]
This paper proposes a neural network method for centimeter-level indoor positioning using real CSI data collected from linear antennas.
It uses the amplitude of the channel response or a correlation matrix as the input, which greatly reduces the data size and suppresses noise.
It also exploits the consistency of the user's motion trajectory via a Recurrent Neural Network (RNN), together with signal-to-noise ratio (SNR) information, to further improve the estimation accuracy.
Date: 2020-02-04T17:10:18Z
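The parameter savings reported in the depthwise separable (DWS) convolution entry above come from factorizing each convolution into a per-channel spatial filter and a 1 x 1 channel mixer. The sketch below only counts weights for a single layer (biases omitted) and does not reproduce that paper's 20.57% figure, which compares whole architectures against an RNN; the function names and layer sizes are illustrative assumptions.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard 2D convolution with a k x k kernel (bias omitted)."""
    return c_in * c_out * k * k


def dws_conv_params(c_in, c_out, k):
    """Weights in a depthwise separable convolution (bias omitted):
    one k x k filter per input channel (depthwise step)
    followed by a 1 x 1 convolution mixing channels (pointwise step)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


# Illustrative layer sizes (assumptions, not taken from the paper).
for c_in, c_out, k in [(64, 64, 3), (256, 256, 5)]:
    std = standard_conv_params(c_in, c_out, k)
    dws = dws_conv_params(c_in, c_out, k)
    print(f"{c_in}->{c_out}, k={k}: standard={std}, DWS={dws}, ratio={dws / std:.3f}")
```

The ratio simplifies to 1/c_out + 1/k², so the relative savings grow with the number of output channels and the kernel size.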
This list is automatically generated from the titles and abstracts of the papers on this site.