Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines
- URL: http://arxiv.org/abs/2404.00082v4
- Date: Tue, 08 Oct 2024 09:28:23 GMT
- Title: Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay Lines
- Authors: Alessandro Ilic Mezza, Riccardo Giampiccolo, Enzo De Sena, Alberto Bernardini,
- Abstract summary: We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN)
The proposed approach involves the implementation of a differentiable FDN with trainable delay lines.
We show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics.
- Score: 46.2770645198924
- License:
- Abstract: Over the past few decades, extensive research has been devoted to the design of artificial reverberation algorithms aimed at emulating the room acoustics of physical environments. Despite significant advancements, automatic parameter tuning of delay-network models remains an open challenge. We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN) such that its output renders target attributes of a measured room impulse response. The proposed approach involves the implementation of a differentiable FDN with trainable delay lines, which, for the first time, allows us to simultaneously learn each and every delay-network parameter via backpropagation. The iterative optimization process seeks to minimize a perceptually-motivated time-domain loss function incorporating differentiable terms accounting for energy decay and echo density. Through experimental validation, we show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics, and outperforms existing methods based on genetic algorithms and analytical FDN design.
Related papers
- Coarse-to-Fine Proposal Refinement Framework for Audio Temporal Forgery Detection and Localization [60.899082019130766]
We introduce a frame-level detection network (FDN) and a proposal refinement network (PRN) for audio temporal forgery detection and localization.
FDN aims to mine informative inconsistency cues between real and fake frames to obtain discriminative features that are beneficial for roughly indicating forgery regions.
PRN is responsible for predicting confidence scores and regression offsets to refine the coarse-grained proposals derived from the FDN.
arXiv Detail & Related papers (2024-07-23T15:07:52Z) - Neural Harmonium: An Interpretable Deep Structure for Nonlinear Dynamic
System Identification with Application to Audio Processing [4.599180419117645]
Interpretability helps us understand a model's ability to generalize and reveal its limitations.
We introduce a causal interpretable deep structure for modeling dynamic systems.
Our proposed model makes use of the harmonic analysis by modeling the system in a time-frequency domain.
arXiv Detail & Related papers (2023-10-10T21:32:15Z) - Blind Acoustic Room Parameter Estimation Using Phase Features [4.473249957074495]
We propose utilizing novel phase-related features to extend recent approaches to blindly estimate the so-called "reverberation fingerprint" parameters.
The addition of these features is shown to outperform existing methods that rely solely on magnitude-based spectral features.
arXiv Detail & Related papers (2023-03-13T20:05:41Z) - Extending GCC-PHAT using Shift Equivariant Neural Networks [17.70159660438739]
Methods based on the generalized cross correlation with phase transform (GCC-PHAT) have been widely adopted for speaker localization.
We propose a novel approach to extending the GCC-PHAT, where the received signals are filtered using a shift equivariant neural network.
We show that our model consistently reduces the error of the GCC-PHAT in adverse environments, with guarantees of exact time delay recovery.
arXiv Detail & Related papers (2022-08-09T10:31:10Z) - Deep Impulse Responses: Estimating and Parameterizing Filters with Deep
Networks [76.830358429947]
Impulse response estimation in high noise and in-the-wild settings is a challenging problem.
We propose a novel framework for parameterizing and estimating impulse responses based on recent advances in neural representation learning.
arXiv Detail & Related papers (2022-02-07T18:57:23Z) - Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance.
distinct models are required to be trained to reach different points in the rate-distortion (R-D) space.
We make efforts to formulate the essential mathematical functions to describe the R-D behavior of NIC using deep network and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z) - Time-domain Speech Enhancement with Generative Adversarial Learning [53.74228907273269]
This paper proposes a new framework called Time-domain Speech Enhancement Generative Adversarial Network (TSEGAN)
TSEGAN is an extension of the generative adversarial network (GAN) in time-domain with metric evaluation to mitigate the scaling problem.
In addition, we provide a new method based on objective function mapping for the theoretical analysis of the performance of Metric GAN.
arXiv Detail & Related papers (2021-03-30T08:09:49Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End
Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.