A Machine Learning Approach for Denoising and Upsampling HRTFs
- URL: http://arxiv.org/abs/2504.17586v1
- Date: Thu, 24 Apr 2025 14:17:57 GMT
- Title: A Machine Learning Approach for Denoising and Upsampling HRTFs
- Authors: Xuyi Hu, Jian Li, Lorenzo Picinali, Aidan O. T. Hogg
- Abstract summary: Head-Related Transfer Functions (HRTFs) capture how sound reaches our ears, reflecting unique anatomical features and enhancing spatial perception. It has been shown that personalized HRTFs improve localization accuracy, but their measurement remains time-consuming and requires a noise-free environment. This paper proposes a method to address this constraint by presenting a novel technique that can upsample sparse, noisy HRTF measurements. The proposed method achieves a log-spectral distortion (LSD) error of 5.41 dB and a cosine similarity loss of 0.0070, demonstrating the method's effectiveness in HRTF upsampling.
- Score: 5.954160581274925
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The demand for realistic virtual immersive audio continues to grow, with Head-Related Transfer Functions (HRTFs) playing a key role. HRTFs capture how sound reaches our ears, reflecting unique anatomical features and enhancing spatial perception. It has been shown that personalized HRTFs improve localization accuracy, but their measurement remains time-consuming and requires a noise-free environment. Although machine learning has been shown to reduce the required measurement points and, thus, the measurement time, a controlled environment is still necessary. This paper proposes a method to address this constraint by presenting a novel technique that can upsample sparse, noisy HRTF measurements. The proposed approach combines an HRTF Denoisy U-Net for denoising and an Autoencoding Generative Adversarial Network (AE-GAN) for upsampling from three measurement points. The proposed method achieves a log-spectral distortion (LSD) error of 5.41 dB and a cosine similarity loss of 0.0070, demonstrating the method's effectiveness in HRTF upsampling.
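As context for the reported metrics, below is a minimal sketch of how log-spectral distortion (in dB) and a cosine similarity loss are commonly computed between a reference and an estimated HRTF magnitude spectrum. These are standard textbook definitions, not the paper's exact formulations, which may differ (e.g. in averaging over frequency bins, directions, or ears):

```python
import numpy as np

def log_spectral_distortion(h_ref, h_est, eps=1e-12):
    """LSD in dB between two magnitude spectra (common definition:
    RMS of the per-bin dB difference; the paper's exact form may differ)."""
    h_ref = np.abs(h_ref) + eps
    h_est = np.abs(h_est) + eps
    diff_db = 20.0 * np.log10(h_ref / h_est)  # per-bin error in dB
    return np.sqrt(np.mean(diff_db ** 2))

def cosine_similarity_loss(a, b, eps=1e-12):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    a, b = np.ravel(a), np.ravel(b)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - cos

# Toy example: a 10% uniform magnitude error at every frequency bin
ref = np.array([1.0, 0.5, 0.25, 0.125])
est = 1.1 * ref
print(log_spectral_distortion(ref, est))  # constant offset -> ~0.83 dB
print(cosine_similarity_loss(ref, est))   # same direction -> ~0.0
```

A uniform gain error shows up fully in the LSD but not in the cosine loss, which is scale-invariant; combining the two (as the paper's reported 5.41 dB LSD and 0.0070 cosine loss suggest) captures both magnitude and shape errors.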
Related papers
- J-Invariant Volume Shuffle for Self-Supervised Cryo-Electron Tomogram Denoising on Single Noisy Volume [11.183171651157892]
Cryo-Electron Tomography (Cryo-ET) enables detailed 3D visualization of cellular structures in near-native states. Cryo-ET suffers from low signal-to-noise ratio due to imaging constraints. We propose a novel self-supervised learning model that denoises Cryo-ET volumetric images using a single noisy volume.
arXiv Detail & Related papers (2024-11-22T08:06:12Z) - DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the ground-truth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - HRTF upsampling with a generative adversarial network using a gnomonic equiangular projection [3.921666645870036]
This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling.
We propose a novel approach that transforms the HRTF data for direct use with a convolutional super-resolution generative adversarial network (SRGAN).
Experimental results show that the proposed method outperforms all three baselines in terms of log-spectral distortion (LSD) and localisation performance.
arXiv Detail & Related papers (2023-06-09T11:05:09Z) - Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral Image Denoising [9.119226249676501]
Hyperspectral images (HSIs) are often quite noisy because of narrow band spectral filtering.
To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed.
This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues.
arXiv Detail & Related papers (2023-05-06T13:28:20Z) - Amplitude-Varying Perturbation for Balancing Privacy and Utility in Federated Learning [86.08285033925597]
This paper presents a new DP perturbation mechanism with a time-varying noise amplitude to protect the privacy of federated learning.
We derive an online refinement of the series to prevent FL from premature convergence resulting from excessive perturbation noise.
The contribution of the new DP mechanism to the convergence and accuracy of privacy-preserving FL is corroborated, compared to the state-of-the-art Gaussian noise mechanism with a persistent noise amplitude.
arXiv Detail & Related papers (2023-03-07T22:52:40Z) - Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework.
We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us to efficiently infer the latent true labels.
Our approach safeguards the stable update of the noise transition, which avoids previous arbitrarily tuning from a mini-batch of samples.
arXiv Detail & Related papers (2023-02-19T15:24:37Z) - An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms [2.3204178451683264]
In audio processing applications, the generation of expressive sounds from high-level representations is in high demand.
Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument compression.
This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations for a variety of instruments for a single pitch.
arXiv Detail & Related papers (2023-01-18T17:19:04Z) - High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z) - Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
arXiv Detail & Related papers (2021-10-28T20:39:02Z) - Using deep learning to understand and mitigate the qubit noise environment [0.0]
We propose to address the challenge of extracting accurate noise spectra from time-dynamics measurements on qubits.
We demonstrate a neural network based methodology that allows for extraction of the noise spectrum associated with any qubit surrounded by an arbitrary bath.
Our results can be applied to a wide range of qubit platforms and provide a framework for improving qubit performance.
arXiv Detail & Related papers (2020-05-03T17:13:14Z) - Simultaneous Denoising and Dereverberation Using Deep Embedding Features [64.58693911070228]
We propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features.
At the denoising stage, the DC network is leveraged to extract noise-free deep embedding features.
At the dereverberation stage, instead of using the unsupervised K-means clustering algorithm, another neural network is utilized to estimate the anechoic speech.
arXiv Detail & Related papers (2020-04-06T06:34:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences.