HRTF upsampling with a generative adversarial network using a gnomonic
equiangular projection
- URL: http://arxiv.org/abs/2306.05812v2
- Date: Tue, 27 Feb 2024 13:40:40 GMT
- Title: HRTF upsampling with a generative adversarial network using a gnomonic
equiangular projection
- Authors: Aidan O. T. Hogg, Mads Jenkins, He Liu, Isaac Squires, Samuel J.
Cooper and Lorenzo Picinali
- Abstract summary: This paper demonstrates how generative adversarial networks (GANs) can be applied to HRTF upsampling.
We propose a novel approach that transforms the HRTF data for direct use with a convolutional super-resolution generative adversarial network (SRGAN).
Experimental results show that the proposed method outperforms all three baselines in terms of log-spectral distortion (LSD) and localisation performance.
- Score: 3.921666645870036
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An individualised head-related transfer function (HRTF) is very important for
creating realistic virtual reality (VR) and augmented reality (AR)
environments. However, acoustically measuring high-quality HRTFs requires
expensive equipment and an acoustic lab setting. To overcome these limitations
and to make this measurement more efficient, HRTF upsampling has been exploited
in the past, where a high-resolution HRTF is created from a low-resolution one.
This paper demonstrates how generative adversarial networks (GANs) can be
applied to HRTF upsampling. We propose a novel approach that transforms the
HRTF data for direct use with a convolutional super-resolution generative
adversarial network (SRGAN). This new approach is benchmarked against three
baselines: barycentric upsampling, spherical harmonic (SH) upsampling and an
HRTF selection approach. Experimental results show that the proposed method
outperforms all three baselines in terms of log-spectral distortion (LSD) and
localisation performance using perceptual models when the input HRTF is sparse
(less than 20 measured positions).
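The abstract evaluates upsampling quality with log-spectral distortion (LSD) but does not spell out the formula. The sketch below uses one commonly used form, the root-mean-square of the dB magnitude ratio over frequency bins, averaged over measurement positions. The function names and the exact averaging are assumptions made for illustration and may differ from the definition used in the paper.

```python
import math


def log_spectral_distortion(h_ref, h_est):
    """LSD in dB between two magnitude spectra for one position.

    Assumed form: RMS over frequency bins of 20*log10(|H_ref| / |H_est|).
    Both inputs are sequences of non-zero magnitudes of equal length.
    """
    if len(h_ref) != len(h_est) or len(h_ref) == 0:
        raise ValueError("spectra must be non-empty and of equal length")
    squared_db = [
        (20.0 * math.log10(abs(ref) / abs(est))) ** 2
        for ref, est in zip(h_ref, h_est)
    ]
    return math.sqrt(sum(squared_db) / len(squared_db))


def mean_lsd(ref_by_position, est_by_position):
    """Average the per-position LSD over all measurement positions."""
    if len(ref_by_position) != len(est_by_position) or len(ref_by_position) == 0:
        raise ValueError("position lists must be non-empty and of equal length")
    values = [
        log_spectral_distortion(ref, est)
        for ref, est in zip(ref_by_position, est_by_position)
    ]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical toy spectra: two positions, three frequency bins each.
    reference = [[1.0, 0.5, 0.25], [10.0, 10.0, 10.0]]
    upsampled = [[1.0, 0.5, 0.25], [1.0, 1.0, 1.0]]
    print(mean_lsd(reference, upsampled))
```

A lower value means the upsampled magnitude response is closer to the measured one; identical spectra give an LSD of zero, and a uniform factor-of-ten magnitude error gives 20 dB.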
Related papers
- Enhanced Super-Resolution Training via Mimicked Alignment for Real-World Scenes [51.92255321684027]
We propose a novel plug-and-play module designed to mitigate misalignment issues by aligning LR inputs with HR images during training.
Specifically, our approach involves mimicking a novel LR sample that aligns with HR while preserving the characteristics of the original LR samples.
We comprehensively evaluate our method on synthetic and real-world datasets, demonstrating its effectiveness across a spectrum of SR models.
arXiv Detail & Related papers (2024-10-07T18:18:54Z)
- HRTF Estimation using a Score-based Prior [20.62078965099636]
We present a head-related transfer function estimation method based on a score-based diffusion model.
The HRTF is estimated in reverberant environments using natural excitation signals, e.g. human speech.
We show that the diffusion prior can account for the large variability of high-frequency content in HRTFs.
arXiv Detail & Related papers (2024-10-02T14:00:41Z)
- Fast LiDAR Upsampling using Conditional Diffusion Models [1.3709133749179265]
Existing approaches have shown the possibilities for using diffusion models to generate refined LiDAR data with high fidelity.
We introduce a novel approach based on conditional diffusion models for fast and high-quality sparse-to-dense upsampling of 3D scene point clouds.
Our method employs denoising diffusion probabilistic models trained with conditional inpainting masks, which have been shown to give high performance on image completion tasks.
arXiv Detail & Related papers (2024-05-08T08:38:28Z)
- NeRF-DetS: Enhancing Multi-View 3D Object Detection with Sampling-adaptive Network of Continuous NeRF-based Representation [60.47114985993196]
NeRF-Det unifies the tasks of novel view synthesis and 3D perception.
We introduce a novel 3D perception network structure, NeRF-DetS.
NeRF-DetS outperforms competitive NeRF-Det on the ScanNetV2 dataset.
arXiv Detail & Related papers (2024-04-22T06:59:03Z)
- HRTF Interpolation using a Spherical Neural Process Meta-Learner [1.3505077405741583]
We introduce a Convolutional Neural Process meta-learner specialized in HRTF error correction.
A generic population-mean HRTF forms the initial estimates prior to corrections.
The trained model achieves up to 3 dB relative error reduction compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-10-20T11:41:54Z)
- GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis [62.297513028116576]
GeneFace is a general and high-fidelity NeRF-based talking face generation method.
A head-aware torso-NeRF is proposed to eliminate the head-torso problem.
arXiv Detail & Related papers (2023-01-31T05:56:06Z)
- Binaural Rendering of Ambisonic Signals by Neural Networks [28.056334728309423]
Experimental results show that neural networks outperform the conventional method in objective metrics and achieve comparable subjective metrics.
Our proposed system achieves an SDR of 7.32 and MOSs of 3.83, 3.58, 3.87, 3.58 in quality, timbre, localization, and immersion dimensions.
arXiv Detail & Related papers (2022-11-04T07:57:37Z)
- Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization [73.62550438861942]
This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end neural network manner, called directional automatic speech recognition (D-ASR).
In D-ASR, the azimuth angle of the sources with respect to the microphone array is defined as a latent variable. This angle controls the quality of separation, which in turn determines the ASR performance.
arXiv Detail & Related papers (2020-10-30T20:26:28Z)
- Conditioning Trick for Training Stable GANs [70.15099665710336]
We propose a conditioning trick, called difference departure from normality, applied on the generator network in response to instability issues during GAN training.
We force the generator to get closer to the departure from normality function of real samples computed in the spectral domain of Schur decomposition.
arXiv Detail & Related papers (2020-10-12T16:50:22Z)
- Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks [10.089520556398574]
We present a new single sound source DOA estimation and tracking system based on the SRP-PHAT algorithm and a three-dimensional Convolutional Neural Network.
It uses SRP-PHAT power maps as input features of a fully convolutional causal architecture that uses 3D convolutional layers to accurately perform the tracking of a sound source.
arXiv Detail & Related papers (2020-06-16T09:07:33Z)
- Spatial-Spectral Residual Network for Hyperspectral Image Super-Resolution [82.1739023587565]
We propose a novel spectral-spatial residual network for hyperspectral image super-resolution (SSRNet).
Our method can effectively explore spatial-spectral information by using 3D convolution instead of 2D convolution, which enables the network to better extract potential information.
In each unit, we employ spatial and temporal separable 3D convolution to extract spatial and spectral information, which not only reduces unaffordable memory usage and high computational cost, but also makes the network easier to train.
arXiv Detail & Related papers (2020-01-14T03:34:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.