Extending GCC-PHAT using Shift Equivariant Neural Networks
- URL: http://arxiv.org/abs/2208.04654v1
- Date: Tue, 9 Aug 2022 10:31:10 GMT
- Title: Extending GCC-PHAT using Shift Equivariant Neural Networks
- Authors: Axel Berg, Mark O'Connor, Kalle Åström, Magnus Oskarsson
- Abstract summary: Methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for speaker localization.
We propose a novel approach to extending the GCC-PHAT, where the received signals are filtered using a shift equivariant neural network.
We show that our model consistently reduces the error of the GCC-PHAT in adverse environments, with guarantees of exact time delay recovery.
- Score: 17.70159660438739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speaker localization using microphone arrays depends on accurate time delay
estimation techniques. For decades, methods based on the generalized cross
correlation with phase transform (GCC-PHAT) have been widely adopted for this
purpose. Recently, the GCC-PHAT has also been used to provide input features to
neural networks in order to remove the effects of noise and reverberation, but
at the cost of losing theoretical guarantees in noise-free conditions. We
propose a novel approach to extending the GCC-PHAT, where the received signals
are filtered using a shift equivariant neural network that preserves the timing
information contained in the signals. By extensive experiments we show that our
model consistently reduces the error of the GCC-PHAT in adverse environments,
with guarantees of exact time delay recovery in ideal conditions.
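For orientation, the classical GCC-PHAT baseline that the abstract refers to can be sketched as follows. This is a minimal textbook version, not the paper's neural network: it assumes a circular, integer-sample delay and noise-free signals, and the names `dft`, `gcc_phat` and `estimate_delay` are illustrative only. A naive DFT is used so the sketch depends on nothing beyond the Python standard library.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short demo signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat(x1, x2, eps=1e-12):
    """Circular GCC-PHAT of two equal-length real signals.

    The index of the peak estimates by how many samples x2 lags x1.
    """
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # Phase transform: discard the magnitude, keep only the phase.
    whitened = [c / max(abs(c), eps) for c in cross]
    return [v.real for v in dft(whitened, inverse=True)]


def estimate_delay(x1, x2):
    """Signed delay of x2 relative to x1, in samples."""
    r = gcc_phat(x1, x2)
    n = len(r)
    peak = max(range(n), key=lambda i: r[i])
    # Indices above n/2 correspond to negative lags.
    return peak if peak <= n // 2 else peak - n


# Demo: a white-noise signal and a copy delayed (circularly) by 5 samples.
random.seed(0)
n = 64
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [x1[(t - 5) % n] for t in range(n)]
print(estimate_delay(x1, x2))  # 5
```

In this idealized setting the whitened cross-power spectrum is a pure complex exponential, so its inverse transform is a single peak at the true delay. That is the exact-recovery property the abstract says is preserved when the signals are first passed through a shift equivariant filter.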
Related papers
- Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features [10.480691005356967]
We propose a unified framework that blindly estimates reverberation time (T60), direct-to-reverberant ratio (DRR) and clarity (C50) across 10 frequency bands.
The proposed framework uses a novel feature, the Spectro-Spatial Covariance Vector (SSCV), which efficiently represents the temporal, spectral, and spatial information of the FOA signal.
arXiv Detail & Related papers (2024-11-05T15:20:23Z) - Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines [46.2770645198924]
We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN).
The proposed approach involves the implementation of a differentiable FDN with trainable delay lines.
We show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics.
arXiv Detail & Related papers (2024-03-29T10:48:32Z) - Deep Learning-Based Frequency Offset Estimation [7.143765507026541]
We show the utilization of deep learning for CFO estimation by employing a residual network (ResNet) to learn and extract signal features.
In comparison to the commonly used traditional CFO estimation methods, our proposed IQ-ResNet method exhibits superior performance across various scenarios.
arXiv Detail & Related papers (2023-11-08T13:56:22Z) - Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, avoiding the arbitrary tuning from a mini-batch of samples used in previous work.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - NAF: Neural Attenuation Fields for Sparse-View CBCT Reconstruction [79.13750275141139]
This paper proposes a novel and fast self-supervised solution for sparse-view CBCT reconstruction.
The desired attenuation coefficients are represented as a continuous function of 3D spatial coordinates, parameterized by a fully-connected deep neural network.
A learning-based encoder entailing hash coding is adopted to help the network capture high-frequency details.
arXiv Detail & Related papers (2022-09-29T04:06:00Z) - Deep Learning-Based Synchronization for Uplink NB-IoT [72.86843435313048]
We propose a neural network (NN)-based algorithm for device detection and time of arrival (ToA) estimation for the narrowband physical random-access channel (NPRACH) of narrowband internet of things (NB-IoT).
The introduced NN architecture leverages residual convolutional networks as well as knowledge of the preamble structure of the 5G New Radio (5G NR) specifications.
arXiv Detail & Related papers (2022-05-22T12:16:43Z) - SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z) - Blind Coherent Preamble Detection via Neural Networks [2.2063018784238984]
We propose a neural network (NN) sequence detector and timing advance estimator.
We do not replace the whole process of preamble detection with an NN.
Instead, we propose to use the NN only for blind coherent combining of the signals in the detector, to compensate for the channel effect.
arXiv Detail & Related papers (2021-09-30T09:53:49Z) - Real-time gravitational-wave science with neural posterior estimation [64.67121167063696]
We demonstrate unprecedented accuracy for rapid gravitational-wave parameter estimation with deep learning.
We analyze eight gravitational-wave events from the first LIGO-Virgo Gravitational-Wave Transient Catalog.
We find very close quantitative agreement with standard inference codes, but with inference times reduced from O(day) to a minute per event.
arXiv Detail & Related papers (2021-06-23T18:00:05Z) - Frequency Gating: Improved Convolutional Neural Networks for Speech Enhancement in the Time-Frequency Domain [37.722450363816144]
We introduce a method, which we call Frequency Gating, to compute multiplicative weights for the kernels of the CNN.
Experiments with an autoencoder neural network with skip connections show that both local and frequency-wise gating outperform the baseline.
A loss function based on the extended short-time objective intelligibility score (ESTOI) is introduced, which we show to outperform the standard mean squared error (MSE) loss function.
arXiv Detail & Related papers (2020-11-08T22:04:00Z) - Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)