Related papers: HRTF Estimation using a Score-based Prior

URL: http://arxiv.org/abs/2410.01562v1
Date: Wed, 2 Oct 2024 14:00:41 GMT
Title: HRTF Estimation using a Score-based Prior
Authors: Etienne Thuillier, Jean-Marie Lemercier, Eloi Moliner, Timo Gerkmann, Vesa Välimäki
Abstract summary: We present a head-related transfer function estimation method based on a score-based diffusion model. The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech. We show that the diffusion prior can account for the large variability of high-frequency content in HRTFs.
Score: 20.62078965099636
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We present a head-related transfer function (HRTF) estimation method which relies on a data-driven prior given by a score-based diffusion model. The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech. The impulse response of the room is estimated along with the HRTF by optimizing a parametric model of reverberation based on the statistical behaviour of room acoustics. The posterior distribution of HRTF given the reverberant measurement and excitation signal is modelled using the score-based HRTF prior and a log-likelihood approximation. We show that the resulting method outperforms several baselines, including an oracle recommender system that assigns the optimal HRTF in our training set based on the smallest distance to the true HRTF at the given direction of arrival. In particular, we show that the diffusion prior can account for the large variability of high-frequency content in HRTFs.

Related papers

Foundation Model Hidden Representations for Heart Rate Estimation from Auscultation [3.1379239557375223]
Auscultation, particularly heart sound, is a non-invasive technique that provides essential vital sign information. Recently, self-supervised acoustic representation foundation models (FMs) have been proposed to offer insights into acoustics-based vital signs.
arXiv Detail & Related papers (2025-05-27T05:36:25Z)
Flow Matching based Sequential Recommender Model [54.815225661065924]
This study introduces FMRec, a Flow Matching based model that employs a straight flow trajectory and a modified loss tailored for the recommendation task. FMRec achieves an average improvement of 6.53% over state-of-the-art methods.
arXiv Detail & Related papers (2025-05-22T06:53:03Z)
End-to-End Multi-Microphone Speaker Extraction Using Relative Transfer Functions [16.402201426448006]
This paper introduces a multi-microphone method for extracting a desired speaker from a mixture involving multiple speakers and directional noise in a reverberant environment. Experimental results in challenging acoustic scenarios demonstrate that using spatial cues yields better performance than the spectral-based cue and that the instantaneous spatial cue outperforms the DOA-based one.
arXiv Detail & Related papers (2025-02-10T09:27:44Z)
Arbitrary-steps Image Super-resolution via Diffusion Inversion [68.78628844966019]
This study presents a new image super-resolution (SR) technique based on diffusion inversion, aiming at harnessing the rich image priors encapsulated in large pre-trained diffusion models to improve SR performance. We design a Partial noise Prediction strategy to construct an intermediate state of the diffusion model, which serves as the starting sampling point. Once trained, this noise predictor can be used to initialize the sampling process partially along the diffusion trajectory, generating the desirable high-resolution result.
arXiv Detail & Related papers (2024-12-12T07:24:13Z)
DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval [49.076590578101985]
We present a diffusion-based ATR framework (DiffATR) that generates the joint distribution from noise. Experiments on the AudioCaps and Clotho datasets show superior performance, verifying the effectiveness of our approach.
arXiv Detail & Related papers (2024-09-16T06:33:26Z)
Deep adaptative spectral zoom for improved remote heart rate estimation [10.220888127527152]
The Chirp-Z Transform (CZT) can refine the spectrum to the narrow-band range of interest for heart rate, providing improved frequency resolution and, consequently, more accurate estimation. This paper presents the advantages of employing the CZT for remote HR estimation and introduces a novel data-driven adaptive CZT estimator.
arXiv Detail & Related papers (2024-03-11T16:55:19Z)
Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets. We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
Supervised Contrastive Learning based Dual-Mixer Model for Remaining Useful Life Prediction [3.081898819471624]
The Remaining Useful Life (RUL) prediction aims at providing an accurate estimate of the remaining time from the current predicting moment to the complete failure of the device. To overcome the shortcomings of rigid combination for temporal and spatial features in most existing RUL prediction approaches, a spatial-temporal homogeneous feature extractor, named Dual-Mixer model, is proposed. The effectiveness of the proposed method is validated through comparisons with other latest research works on the C-MAPSS dataset.
arXiv Detail & Related papers (2024-01-29T14:38:44Z)
HRTF Interpolation using a Spherical Neural Process Meta-Learner [1.3505077405741583]
We introduce a Convolutional Neural Process meta-learner specialized in HRTF error correction. A generic population-mean HRTF forms the initial estimates prior to corrections. The trained model achieves up to 3 dB relative error reduction compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-10-20T11:41:54Z)
Score-based Source Separation with Applications to Digital Communication Signals [72.6570125649502]
We propose a new method for separating superimposed sources using diffusion-based generative models. Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature. Our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme.
arXiv Detail & Related papers (2023-06-26T04:12:40Z)
HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection [3.921666645870036]
This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling. We propose a novel approach that transforms the HRTF data for direct use with a convolutional super-resolution generative adversarial network (SRGAN). Experimental results show that the proposed method outperforms all three baselines in terms of log-spectral distortion (LSD) and localisation performance.
arXiv Detail & Related papers (2023-06-09T11:05:09Z)
How to Estimate Model Transferability of Pre-Trained Speech Models? [84.11085139766108]
We propose a "score-based assessment" framework for estimating the transferability of pre-trained speech models (PSMs). We leverage two representation theories, Bayesian likelihood estimation and optimal transport, to generate rank scores for the PSM candidates. Our framework efficiently computes transferability scores without actual fine-tuning of candidate models or layers.
arXiv Detail & Related papers (2023-06-01T04:52:26Z)
Low-Resource Music Genre Classification with Cross-Modal Neural Model Reprogramming [129.4950757742912]
We introduce a novel method for leveraging pre-trained models for low-resource (music) classification based on the concept of Neural Model Reprogramming (NMR). NMR aims at re-purposing a pre-trained model from a source domain to a target domain by modifying the input of a frozen pre-trained model. Experimental results suggest that a neural model pre-trained on large-scale datasets can successfully perform music genre classification by using this reprogramming method.
arXiv Detail & Related papers (2022-11-02T17:38:33Z)
DEEPF0: End-To-End Fundamental Frequency Estimation for Music and Speech Signals [11.939409227407769]
We propose a novel pitch estimation technique called DeepF0. It leverages the available annotated data to directly learn from the raw audio in a data-driven manner.
arXiv Detail & Related papers (2021-02-11T23:11:22Z)
Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip-connections. It is capable of removing various kinds of background noise including stationary and non-stationary noises.
arXiv Detail & Related papers (2020-06-23T09:19:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.