Blind Acoustic Room Parameter Estimation Using Phase Features
- URL: http://arxiv.org/abs/2303.07449v1
- Date: Mon, 13 Mar 2023 20:05:41 GMT
- Title: Blind Acoustic Room Parameter Estimation Using Phase Features
- Authors: Christopher Ick, Adib Mehrabi, Wenyu Jin
- Abstract summary: We propose utilizing novel phase-related features to extend recent approaches to blindly estimate the so-called "reverberation fingerprint" parameters.
The addition of these features is shown to outperform existing methods that rely solely on magnitude-based spectral features.
- Score: 4.473249957074495
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modeling room acoustics in a field setting involves some degree of blind
parameter estimation from noisy and reverberant audio. Modern approaches
leverage convolutional neural networks (CNNs) in tandem with time-frequency
representations. Using short-time Fourier transforms to develop these
spectrogram-like features has shown promising results, but this method
implicitly discards a significant amount of audio information in the phase
domain. Inspired by recent works in speech enhancement, we propose utilizing
novel phase-related features to extend recent approaches to blindly estimate
the so-called "reverberation fingerprint" parameters, namely, volume and RT60.
The addition of these features is shown to outperform existing methods that
rely solely on magnitude-based spectral features across a wide range of
acoustic spaces. We evaluate the effectiveness of these novel features in
both single-parameter and multi-parameter estimation
strategies, using a novel dataset that consists of publicly available room
impulse responses (RIRs), synthesized RIRs, and in-house measurements of real
acoustic spaces.
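A minimal sketch of the kind of two-channel input described above, assuming the phase feature is a temporal phase derivative stacked with the log-magnitude spectrogram; the exact feature set and the function name here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.signal import stft

def phase_feature_stack(audio, fs=16000, n_fft=512, hop=128):
    """Stack a log-magnitude channel with an assumed phase-derivative channel."""
    _, _, Z = stft(audio, fs=fs, nperseg=n_fft, noverlap=n_fft - hop)
    log_mag = np.log1p(np.abs(Z))                            # magnitude-only baseline feature
    phase = np.unwrap(np.angle(Z), axis=-1)                  # unwrap phase along time frames
    dphase = np.diff(phase, axis=-1, prepend=phase[:, :1])   # temporal phase derivative
    return np.stack([log_mag, dphase])                       # (2, freq, time) CNN input

x = np.random.randn(16000)            # 1 s stand-in for noisy, reverberant audio
print(phase_feature_stack(x).shape)   # two-channel time-frequency input
```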
Related papers
- Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features [10.480691005356967]
We propose a unified framework that blindly estimates reverberation time (T60), direct-to-reverberant ratio (DRR) and clarity (C50) across 10 frequency bands.
The proposed framework utilizes a novel feature named the Spectro-Spatial Covariance Vector (SSCV), which efficiently represents the temporal, spectral, and spatial information of the first-order Ambisonics (FOA) signal.
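As a hedged illustration only: if the SSCV is read as per-band spatial covariance matrices of the four FOA channels (the paper's actual construction may differ), a sketch could look like this, with `spatial_covariance` a hypothetical name:

```python
import numpy as np
from scipy.signal import stft

def spatial_covariance(foa, fs=16000, n_fft=512, n_bands=10):
    """foa: (4, samples) FOA signal -> (n_bands, 4, 4) spatial covariance per band."""
    _, _, Z = stft(foa, fs=fs, nperseg=n_fft)            # (4, freq, time) multichannel STFT
    bands = np.array_split(np.arange(Z.shape[1]), n_bands)
    covs = []
    for idx in bands:
        Zb = Z[:, idx, :].reshape(4, -1)                 # pool bins and frames in the band
        covs.append(Zb @ Zb.conj().T / Zb.shape[1])      # 4x4 channel covariance
    return np.stack(covs)

foa = np.random.randn(4, 16000)                          # stand-in FOA recording
print(spatial_covariance(foa).shape)                     # (10, 4, 4), one matrix per band
```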
arXiv Detail & Related papers (2024-11-05T15:20:23Z)
- Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines [46.2770645198924]
We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN)
The proposed approach involves the implementation of a differentiable FDN with trainable delay lines.
We show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics.
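For orientation, here is a plain (non-differentiable) NumPy loop over the FDN structure whose delays and feedback gains the paper learns; this is a generic textbook FDN, not the authors' implementation:

```python
import numpy as np

def fdn(x, delays=(149, 211, 263, 293), g=0.7, n=16000):
    """Run x through a 4-line FDN with a lossy Householder feedback matrix."""
    N = len(delays)
    A = g * (np.eye(N) - 2.0 / N * np.ones((N, N)))   # orthogonal mixing, scaled for decay
    bufs = [np.zeros(d) for d in delays]              # one circular buffer per delay line
    y = np.zeros(n)
    for t in range(n):
        taps = np.array([bufs[i][t % delays[i]] for i in range(N)])  # line outputs
        y[t] = taps.sum()
        fb = A @ taps                                 # feedback mixing
        xin = x[t] if t < len(x) else 0.0
        for i in range(N):
            bufs[i][t % delays[i]] = fb[i] + xin      # write feedback + input back
    return y

ir = fdn(np.array([1.0]))                             # the network's impulse response
```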
arXiv Detail & Related papers (2024-03-29T10:48:32Z)
- Neural Acoustic Context Field: Rendering Realistic Room Impulse Response With Neural Fields [61.07542274267568]
This letter proposes a novel Neural Acoustic Context Field approach, called NACF, to parameterize an audio scene.
Driven by the unique properties of RIRs, we design a temporal correlation module and a multi-scale energy decay criterion.
Experimental results show that NACF outperforms existing field-based methods by a notable margin.
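The multi-scale energy decay criterion presumably compares energy decay curves; below is the standard Schroeder backward integration such a criterion could build on (NACF's actual multi-scale formulation is not reproduced here):

```python
import numpy as np

def energy_decay_db(rir):
    """Schroeder integral: remaining energy of an RIR over time, in dB."""
    energy = np.cumsum(rir[::-1] ** 2)[::-1]      # backward-integrated energy
    energy /= energy[0]                           # normalize to 0 dB at t = 0
    return 10.0 * np.log10(np.maximum(energy, 1e-12))

# Toy exponentially decaying noise RIR with RT60 of ~0.5 s at 16 kHz
fs, rt60 = 16000, 0.5
t = np.arange(int(fs * rt60 * 1.5)) / fs
rir = np.random.randn(t.size) * 10.0 ** (-3.0 * t / rt60)  # -60 dB energy at t = rt60
edc = energy_decay_db(rir)
```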
arXiv Detail & Related papers (2023-09-27T19:50:50Z)
- Generative adversarial networks with physical sound field priors [6.256923690998173]
This paper presents a deep learning-based approach for the spatio-temporal reconstruction of sound fields using Generative Adversarial Networks (GANs).
The proposed method uses a plane wave basis and the underlying statistical distributions of pressure in rooms to reconstruct sound fields from a limited number of measurements.
The results suggest that generative models incorporating a physically informed acoustic prior are a promising approach to sound field reconstruction problems.
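A hedged sketch of the plane wave basis underlying such a prior, with an ordinary least-squares fit standing in for the GAN generator that produces the coefficients in the paper:

```python
import numpy as np

def plane_wave_basis(mic_pos, directions, freq, c=343.0):
    """mic_pos: (M, 3), directions: (K, 3) unit vectors -> (M, K) basis matrix."""
    k = 2.0 * np.pi * freq / c                       # wavenumber at this frequency
    return np.exp(-1j * k * mic_pos @ directions.T)  # plane-wave steering matrix

rng = np.random.default_rng(0)
mics = rng.uniform(-0.5, 0.5, (8, 3))                # 8 mics in a 1 m cube
dirs = rng.normal(size=(32, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # 32 random wave directions
H = plane_wave_basis(mics, dirs, freq=500.0)
p = H @ rng.normal(size=32)                          # synthetic pressure at the mics
w, *_ = np.linalg.lstsq(H, p, rcond=None)            # minimum-norm coefficient fit
```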
arXiv Detail & Related papers (2023-08-01T10:11:23Z)
- TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement [41.872384434583466]
We provide a differentiable estimator for four categories of low-level acoustic descriptors: frequency-related parameters, energy- or amplitude-related parameters, spectral balance parameters, and temporal features.
We show that adding TAP as an auxiliary objective in speech enhancement produces speech with improved perceptual quality and intelligibility.
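A minimal sketch of the auxiliary-objective pattern, with a trivial frame-RMS function standing in for the paper's differentiable TAP estimator (the real estimator covers the four descriptor categories listed above; the weight alpha is a hypothetical knob, not a value from the paper):

```python
import torch

def total_loss(enhanced, clean, tap_estimator, alpha=0.1):
    """Reconstruction loss plus an alpha-weighted acoustic-descriptor match."""
    recon = torch.nn.functional.l1_loss(enhanced, clean)
    tap = torch.nn.functional.l1_loss(tap_estimator(enhanced),
                                      tap_estimator(clean))
    return recon + alpha * tap

# Toy usage with frame RMS as a stand-in "descriptor" (160-sample frames)
rms = lambda x: x.reshape(x.shape[0], -1, 160).pow(2).mean(-1).sqrt()
est, ref = torch.randn(2, 1, 16000)
loss = total_loss(est, ref, rms)
```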
arXiv Detail & Related papers (2023-02-16T04:57:11Z)
- Few-Shot Audio-Visual Learning of Environment Acoustics [89.16560042178523]
Room impulse response (RIR) functions capture how the surrounding physical environment transforms the sounds heard by a listener.
We explore how to infer RIRs based on a sparse set of images and echoes observed in the space.
In experiments using a state-of-the-art audio-visual simulator for 3D environments, we demonstrate that our method successfully generates arbitrary RIRs.
arXiv Detail & Related papers (2022-06-08T16:38:24Z)
- Deep Impulse Responses: Estimating and Parameterizing Filters with Deep Networks [76.830358429947]
Impulse response estimation in high noise and in-the-wild settings is a challenging problem.
We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning.
arXiv Detail & Related papers (2022-02-07T18:57:23Z)
- Data Fusion for Audiovisual Speaker Localization: Extending Dynamic Stream Weights to the Spatial Domain [103.3388198420822]
Estimating the positions of multiple speakers can be helpful for tasks like automatic speech recognition or speaker diarization.
This paper proposes a novel audiovisual data fusion framework for speaker localization by assigning individual dynamic stream weights to specific regions.
A performance evaluation using audiovisual recordings yields promising results, with the proposed fusion approach outperforming all baseline models.
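A simplified sketch of region-wise stream weighting: audio and visual log-likelihood maps over candidate positions are fused with per-region weights. The fixed weights here are placeholders for the dynamic, learned weights in the paper:

```python
import numpy as np

def fuse(log_p_audio, log_p_video, region_weights):
    """Convex per-region weighting of two localization log-likelihood maps."""
    lam = np.clip(region_weights, 0.0, 1.0)       # one weight per candidate region
    return lam * log_p_audio + (1.0 - lam) * log_p_video

grid = 64                                         # candidate speaker positions
fused = fuse(np.random.randn(grid), np.random.randn(grid), np.full(grid, 0.7))
print(int(np.argmax(fused)))                      # most likely position index
```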
arXiv Detail & Related papers (2021-02-23T09:59:31Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from the multi-speaker mixture.
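As one hedged illustration of how a direction cue can enter such a filter, a delay-and-sum steering of a multichannel STFT toward the target direction; the paper's network estimates the waveform directly, so this shows only the spatial feature side:

```python
import numpy as np

def delay_and_sum(stft_frames, mic_pos, direction, freqs, c=343.0):
    """stft_frames: (M, F, T) multichannel STFT -> (F, T) beamformed STFT."""
    delays = mic_pos @ direction / c                               # per-mic delay in seconds
    steer = np.exp(2j * np.pi * freqs[None, :] * delays[:, None])  # (M, F) steering vectors
    return (steer[:, :, None].conj() * stft_frames).mean(axis=0)   # align and average

M, F, T = 4, 257, 100
X = np.random.randn(M, F, T) + 1j * np.random.randn(M, F, T)       # stand-in mixture STFT
mics = np.random.uniform(-0.05, 0.05, (M, 3))                      # ~10 cm array
d = np.array([1.0, 0.0, 0.0])                                      # target direction
Y = delay_and_sum(X, mics, d, np.linspace(0, 8000, F))
```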
arXiv Detail & Related papers (2020-01-02T11:12:50Z)