HRTF Estimation using a Score-based Prior
- URL: http://arxiv.org/abs/2410.01562v1
- Date: Wed, 2 Oct 2024 14:00:41 GMT
- Title: HRTF Estimation using a Score-based Prior
- Authors: Etienne Thuillier, Jean-Marie Lemercier, Eloi Moliner, Timo Gerkmann, Vesa Välimäki
- Abstract summary: We present a head-related transfer function estimation method based on a score-based diffusion model.
The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech.
We show that the diffusion prior can account for the large variability of high-frequency content in HRTFs.
- Score: 20.62078965099636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a head-related transfer function (HRTF) estimation method which relies on a data-driven prior given by a score-based diffusion model. The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech. The impulse response of the room is estimated along with the HRTF by optimizing a parametric model of reverberation based on the statistical behaviour of room acoustics. The posterior distribution of HRTF given the reverberant measurement and excitation signal is modelled using the score-based HRTF prior and a log-likelihood approximation. We show that the resulting method outperforms several baselines, including an oracle recommender system that assigns the optimal HRTF in our training set based on the smallest distance to the true HRTF at the given direction of arrival. In particular, we show that the diffusion prior can account for the large variability of high-frequency content in HRTFs.
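The posterior modelling mentioned in the abstract can be sketched with the standard Bayes decomposition used in score-based inverse problems. The notation here is an assumption for illustration and is not taken from the paper: $h$ stands for the HRTF, $x$ for the excitation signal, $y$ for the reverberant measurement, and the prior on $h$ is assumed not to depend on $x$.

```latex
% Bayes' rule for the HRTF posterior (illustrative notation, not the paper's)
p(h \mid y, x) \;\propto\; p(y \mid h, x)\, p(h)

% Taking the gradient of the logarithm with respect to h removes the
% normalising constant and splits the posterior score into two terms:
\nabla_h \log p(h \mid y, x)
  \;=\; \underbrace{\nabla_h \log p(h)}_{\text{prior score}}
  \;+\; \underbrace{\nabla_h \log p(y \mid h, x)}_{\text{likelihood score}}
```

Under this reading, the first term is the quantity a score-based diffusion model is trained to approximate, and the second is where a log-likelihood approximation would enter. How the paper parameterises either term is not stated in the abstract.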
Related papers
- DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval [49.076590578101985]
We present a diffusion-based ATR framework (DiffATR) that generates the joint distribution from noise.
Experiments on the AudioCaps and Clotho datasets show superior performance, verifying the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-16T06:33:26Z)
- Deep adaptative spectral zoom for improved remote heart rate estimation [10.220888127527152]
The Chirp-Z Transform (CZT) can refine the spectrum to the narrow-band range of interest for heart rate, providing improved frequency resolution and, consequently, more accurate estimation.
This paper presents the advantages of employing the CZT for remote HR estimation and introduces a novel data-driven adaptive CZT estimator.
arXiv Detail & Related papers (2024-03-11T16:55:19Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Supervised Contrastive Learning based Dual-Mixer Model for Remaining Useful Life Prediction [3.081898819471624]
The Remaining Useful Life (RUL) prediction aims at providing an accurate estimate of the remaining time from the current predicting moment to the complete failure of the device.
To overcome the shortcomings of rigid combination for temporal and spatial features in most existing RUL prediction approaches, a spatial-temporal homogeneous feature extractor, named Dual-Mixer model, is proposed.
The effectiveness of the proposed method is validated through comparisons with other recent research works on the C-MAPSS dataset.
arXiv Detail & Related papers (2024-01-29T14:38:44Z)
- HRTF Interpolation using a Spherical Neural Process Meta-Learner [1.3505077405741583]
We introduce a Convolutional Neural Process meta-learner specialized in HRTF error correction.
A generic population-mean HRTF forms the initial estimates prior to corrections.
The trained model achieves up to 3 dB relative error reduction compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-10-20T11:41:54Z)
- Score-based Source Separation with Applications to Digital Communication Signals [72.6570125649502]
We propose a new method for separating superimposed sources using diffusion-based generative models.
Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature.
Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
arXiv Detail & Related papers (2023-06-26T04:12:40Z)
- HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection [3.921666645870036]
This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling.
We propose a novel approach that transforms the HRTF data for direct use with a convolutional super-resolution generative adversarial network (SRGAN).
Experimental results show that the proposed method outperforms all three baselines in terms of log-spectral distortion (LSD) and localisation performance.
arXiv Detail & Related papers (2023-06-09T11:05:09Z)
- How to Estimate Model Transferability of Pre-Trained Speech Models? [84.11085139766108]
We propose a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs).
We leverage two representation theories, Bayesian likelihood estimation and optimal transport, to generate rank scores for the PSM candidates.
Our framework efficiently computes transferability scores without actual fine-tuning of candidate models or layers.
arXiv Detail & Related papers (2023-06-01T04:52:26Z)
- Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming [129.4950757742912]
We introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR).
NMR aims at re-purposing a pre-trained model from a source domain to a target domain by modifying the input of a frozen pre-trained model.
Experimental results suggest that a neural model pre-trained on large-scale datasets can successfully perform music genre classification by using this reprogramming method.
arXiv Detail & Related papers (2022-11-02T17:38:33Z)
- DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals [11.939409227407769]
We propose a novel pitch estimation technique called DeepF0.
It leverages the available annotated data to directly learn from the raw audio in a data-driven manner.
arXiv Detail & Related papers (2021-02-11T23:11:22Z)
- Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z)