Related papers: Dual input neural networks for positional sound source localization

URL: http://arxiv.org/abs/2308.04169v1
Date: Tue, 8 Aug 2023 09:59:56 GMT
Title: Dual input neural networks for positional sound source localization
Authors: Eric Grinstein, Vincent W. Neo and Patrick A. Naylor
Abstract summary: We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model two data types, high-dimensional multichannel audio signals and scene metadata such as microphone coordinates, in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method, and a Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.
Score: 19.07039703121673
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In many signal processing applications, metadata may be advantageously used in conjunction with a high-dimensional signal to produce a desired output. In the case of classical Sound Source Localization (SSL) algorithms, information from high-dimensional, multichannel audio signals received by many distributed microphones is combined with information describing acoustic properties of the scene, such as the microphones' coordinates in space, to estimate the position of a sound source. We introduce Dual Input Neural Networks (DI-NNs) as a simple and effective way to model these two data types in a neural network. We train and evaluate our proposed DI-NN on scenarios of varying difficulty and realism and compare it against an alternative architecture, a classical Least-Squares (LS) method, and a classical Convolutional Recurrent Neural Network (CRNN). Our results show that the DI-NN significantly outperforms the baselines, achieving a five times lower localization error than the LS method and two times lower than the CRNN in a test dataset of real recordings.
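The abstract describes the core idea only at a high level. A high-dimensional audio input and a low-dimensional metadata input (such as microphone coordinates) are combined inside one network that regresses a source position. The sketch below illustrates that dual-input pattern in plain Python under stated assumptions and is not the authors' architecture. The class `DualInputNet`, the helper functions, the layer sizes, and the use of a single dense layer as the audio encoder are all hypothetical, because the abstract does not specify the encoder. The weights are random rather than trained. The localization error is assumed to be the Euclidean distance between the predicted and true positions.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Layer = Tuple[List[Vector], Vector]  # (weights: out_dim rows of in_dim values, biases)


def dense(x: Sequence[float], w: List[Vector], b: Vector) -> Vector:
    """Fully connected layer y = W x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Sequence[float]) -> Vector:
    return [max(0.0, v) for v in x]


def init_layer(rng: random.Random, in_dim: int, out_dim: int) -> Layer:
    """Random, untrained weights (a real model would learn these by training)."""
    scale = 1.0 / math.sqrt(in_dim)
    w = [[rng.uniform(-scale, scale) for _ in range(in_dim)] for _ in range(out_dim)]
    return w, [0.0] * out_dim


class DualInputNet:
    """Hypothetical dual-input regressor: audio branch + metadata, fused by concatenation."""

    def __init__(self, audio_dim: int, meta_dim: int, hidden: int = 16,
                 out_dim: int = 2, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.audio_dim, self.meta_dim = audio_dim, meta_dim
        # Stand-in audio encoder: one dense layer (assumption, not from the paper).
        self.audio_layer = init_layer(rng, audio_dim, hidden)
        # Regression head sees [audio embedding, metadata].
        self.head_hidden = init_layer(rng, hidden + meta_dim, hidden)
        self.head_out = init_layer(rng, hidden, out_dim)

    def forward(self, audio_features: Sequence[float], metadata: Sequence[float]) -> Vector:
        if len(audio_features) != self.audio_dim or len(metadata) != self.meta_dim:
            raise ValueError("input sizes do not match the network dimensions")
        embedding = relu(dense(audio_features, *self.audio_layer))
        fused = embedding + list(metadata)  # concatenate the two inputs
        h = relu(dense(fused, *self.head_hidden))
        return dense(h, *self.head_out)  # predicted (x, y) source position


def localization_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Euclidean distance between predicted and true source positions."""
    return math.dist(pred, target)


# Toy example: 4 microphones at the corners of a 4 m x 3 m room -> 8 metadata values.
mic_coords = [0.0, 0.0, 4.0, 0.0, 4.0, 3.0, 0.0, 3.0]
feat_rng = random.Random(1)
audio_features = [feat_rng.gauss(0.0, 1.0) for _ in range(64)]  # placeholder features

net = DualInputNet(audio_dim=64, meta_dim=len(mic_coords))
pred = net.forward(audio_features, mic_coords)
print(len(pred), localization_error(pred, [1.5, 2.0]))
```

The sketch fuses the inputs by concatenating them after the audio branch. This lets the small metadata vector bypass the heavy signal encoder and enter the regression head directly. Other fusion points are possible, and this sketch does not claim to reflect the one used in the paper.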

Related papers

Neuromorphic Wireless Split Computing with Multi-Level Spikes [69.73249913506042]
In neuromorphic computing, spiking neural networks (SNNs) perform inference tasks, offering significant efficiency gains for workloads involving sequential data. Recent advances in hardware and software have demonstrated that embedding a few bits of payload in each spike exchanged between the spiking neurons can further enhance inference accuracy. This paper investigates a wireless neuromorphic split computing architecture employing multi-level SNNs.
arXiv (2024-11-07T14:08:35Z)
Low-power SNN-based audio source localisation using a Hilbert Transform spike encoding scheme [4.49657690895714]
Sound source localisation is used in many consumer devices to isolate audio from individual speakers and reject noise. Dense band-pass filters are often needed to obtain narrowband signal components from wideband audio. We demonstrate a novel method for sound source localisation on arbitrary microphone arrays, designed for efficient implementation in ultra-low-power spiking neural networks (SNNs). Our approach achieves state-of-the-art accuracy for SNN methods, comparable with traditional non-SNN super-resolution beamforming.
arXiv (2024-02-19T00:21:13Z)
Spiking Neural Network Decision Feedback Equalization [70.3497683558609]
We propose an SNN-based equalizer with a feedback structure akin to the decision feedback equalizer (DFE). We show that our approach clearly outperforms conventional linear equalizers for three different exemplary channels. The proposed SNN with a decision feedback structure opens a path toward competitive, energy-efficient transceivers.
arXiv (2022-11-09T09:19:15Z)
Bayesian Neural Network Language Modeling for Speech Recognition [59.681758762712754]
State-of-the-art neural network language models (NNLMs), represented by long short-term memory recurrent neural networks (LSTM-RNNs) and Transformers, are becoming highly complex. In this paper, an overarching full Bayesian learning framework is proposed to account for the underlying uncertainty in LSTM-RNN and Transformer LMs.
arXiv (2022-08-28T17:50:19Z)
Acoustic-Net: A Novel Neural Network for Sound Localization and Quantification [28.670240455952317]
A novel neural network, termed Acoustic-Net, is proposed to locate and quantify sound sources using only the original signals. The experiments demonstrate that the proposed method significantly improves both the accuracy of sound source prediction and the computing speed.
arXiv (2022-03-31T12:20:09Z)
A Study of Designing Compact Audio-Visual Wake Word Spotting System Based on Iterative Fine-Tuning in Neural Network Pruning [57.28467469709369]
We investigate the design of a compact audio-visual wake word spotting (WWS) system that utilizes visual information. We introduce a neural network pruning strategy based on the lottery ticket hypothesis in an iterative fine-tuning manner (LTH-IF). The proposed audio-visual system achieves significant performance improvements over single-modality (audio-only or video-only) systems under different noisy conditions.
arXiv (2022-02-17T08:26:25Z)
Deep Networks for Direction-of-Arrival Estimation in Low SNR [89.45026632977456]
We introduce a Convolutional Neural Network (CNN) that is trained on multi-channel data of the true array manifold matrix. We train a CNN in the low-SNR regime to predict DoAs across all SNRs. Our robust solution can be applied in several fields, ranging from wireless array sensors to acoustic microphones or sonars.
arXiv (2020-11-17T12:52:18Z)
Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network [68.43026108936029]
We propose a pure spiking neural network (SNN) based computational model for precise sound localization in noisy real-world environments. We implement this algorithm in a real-time robotic system with a microphone array. The experimental results show a mean azimuth error of 13 degrees, which surpasses the accuracy of the other biologically plausible neuromorphic approach for sound source localization.
arXiv (2020-07-07T08:22:56Z)
Depthwise Separable Convolutions Versus Recurrent Neural Networks for Monaural Singing Voice Separation [17.358040670413505]
We focus on singing voice separation, employing an RNN architecture, and we replace the RNNs with depthwise separable (DWS) convolutions (DWS-CNNs). We conduct an ablation study and examine the effect of the number of channels and layers of DWS-CNNs on the source separation performance. Our results show that replacing RNNs with DWS-CNNs yields improvements of 1.20, 0.06, and 0.37 dB, respectively, while using only 20.57% of the parameters of the RNN architecture.
arXiv (2020-07-06T12:32:34Z)
AutoSpeech: Neural Architecture Search for Speaker Recognition [108.69505815793028]
We propose the first neural architecture search approach for speaker recognition tasks, named AutoSpeech. Our algorithm first identifies the optimal operation combination in a neural cell and then derives a CNN model by stacking the neural cell multiple times. Results demonstrate that the derived CNN architectures significantly outperform current speaker recognition systems based on VGG-M, ResNet-18, and ResNet-34 backbones, while enjoying lower model complexity.
arXiv (2020-05-07T02:53:47Z)
SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement [21.346342164530967]
We analyze generalization with respect to (1) the size and diversity of the training data, (2) different network architectures, and (3) the chosen features. We show via experimental results and an analysis using t-distributed stochastic neighbor embedding (t-SNE) that the proposed SNR-NAT features yield robust, level-independent results in unseen noise.
arXiv (2020-04-07T16:09:54Z)
Centimeter-Level Indoor Localization using Channel State Information with Recurrent Neural Networks [12.193558591962754]
This paper proposes a neural network method for centimeter-level indoor positioning using real CSI data collected from linear antennas. It uses the amplitude of the channel response or a correlation matrix as input, which can greatly reduce the data size and suppress noise. It also exploits the consistency of the user's motion trajectory via a Recurrent Neural Network (RNN), together with signal-to-noise ratio (SNR) information, which can further improve estimation accuracy.
arXiv (2020-02-04T17:10:18Z)

This list is automatically generated from the titles and abstracts of the papers on this site.