Differentiable All-pole Filters for Time-varying Audio Systems
- URL: http://arxiv.org/abs/2404.07970v4
- Date: Sat, 19 Oct 2024 01:25:13 GMT
- Title: Differentiable All-pole Filters for Time-varying Audio Systems
- Authors: Chin-Yun Yu, Christopher Mitcheltree, Alistair Carson, Stefan Bilbao, Joshua D. Reiss, György Fazekas
- Abstract summary: We re-express a time-varying all-pole filter to backpropagate the gradient through itself.
This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation.
We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and compressor.
- Score: 9.089836388818808
- Abstract: Infinite impulse response filters are an essential building block of many time-varying audio systems, such as audio effects and synthesisers. However, their recursive structure impedes end-to-end training of these systems using automatic differentiation. Although non-recursive filter approximations like frequency sampling and frame-based processing have been proposed and widely used in previous works, they cannot accurately reflect the gradient of the original system. We alleviate this difficulty by re-expressing a time-varying all-pole filter to backpropagate the gradients through itself, so the filter implementation is not bound to the technical limitations of automatic differentiation frameworks. This implementation can be employed within audio systems containing filters with poles for efficient gradient evaluation. We demonstrate its training efficiency and expressive capabilities for modelling real-world dynamic audio systems on a phaser, time-varying subtractive synthesiser, and compressor. We make our code and audio samples available and provide the trained audio effect and synth models in a VST plugin at https://diffapf.github.io/web/.
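The idea described in the abstract, backpropagating through the recursion with a second recursive filter instead of unrolling it in an autodiff graph, can be illustrated with a small NumPy sketch. This is not the authors' code: the function names, the sign convention y[n] = x[n] - sum_i a_i[n] y[n-i], and the zero initial state are all assumptions made for illustration. Under that convention, the gradient with respect to the input is obtained by running an all-pole filter backwards in time with time-shifted coefficients.

```python
import numpy as np

def allpole(x, a):
    """Time-varying all-pole filter (assumed convention, zero initial state):
    y[n] = x[n] - sum_{i=1..M} a[n, i-1] * y[n-i]."""
    N, M = a.shape
    y = np.zeros(N)
    for n in range(N):
        acc = x[n]
        for i in range(1, M + 1):
            if n - i >= 0:
                acc -= a[n, i - 1] * y[n - i]
        y[n] = acc
    return y

def allpole_backward(grad_y, a, y):
    """Gradients of a scalar loss w.r.t. x and a, given grad_y = dL/dy.

    The input gradient is itself a recursion, run from the last sample
    to the first:
        grad_x[n] = grad_y[n] - sum_i a[n+i, i-1] * grad_x[n+i]
    The coefficient gradient then follows sample by sample:
        grad_a[n, i-1] = -grad_x[n] * y[n-i]
    """
    N, M = a.shape
    grad_x = np.zeros(N)
    for n in range(N - 1, -1, -1):
        acc = grad_y[n]
        for i in range(1, M + 1):
            if n + i < N:
                acc -= a[n + i, i - 1] * grad_x[n + i]
        grad_x[n] = acc
    grad_a = np.zeros_like(a)
    for i in range(1, M + 1):
        # y[n-i] does not exist for n < i, so those entries stay zero.
        grad_a[i:, i - 1] = -grad_x[i:] * y[:-i]
    return grad_x, grad_a
```

A quick way to gain confidence in such a sketch is to compare both gradients against central finite differences of a simple loss such as `L = w @ allpole(x, a)` for a fixed random `w`, in which case `grad_y` is just `w`.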
Related papers
- Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation [52.0893266767733]
We propose a robust deepfake speech detection method that employs feature decomposition to learn synthesizer-independent content features.
To enhance the model's robustness to different synthesizer characteristics, we propose a synthesizer feature augmentation strategy.
arXiv Detail & Related papers (2024-11-14T03:57:21Z)
- FilterNet: Harnessing Frequency Filters for Time Series Forecasting [34.83702192033196]
FilterNet is built upon our proposed learnable frequency filters to extract key informative temporal patterns by selectively passing or attenuating certain components of time series signals.
Equipped with the two filters, FilterNet can act as an approximate surrogate for the linear and attention mappings widely adopted in the time series literature.
arXiv Detail & Related papers (2024-11-03T16:20:41Z)
- Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System [23.96111084078404]
This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system.
We show that the proposed system improves speech quality over a baseline system while maintaining controllability.
arXiv Detail & Related papers (2022-11-21T07:35:21Z)
- High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art, real-time, high-fidelity audio codec leveraging neural networks.
It consists of a streaming encoder-decoder architecture with a quantized latent space, trained in an end-to-end fashion.
We simplify and speed up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
- Fully Automated End-to-End Fake Audio Detection [57.78459588263812]
This paper proposes a fully automated end-to-end fake audio detection method.
We first use a pre-trained wav2vec model to obtain a high-level representation of the speech.
For the network structure, we use a modified version of the differentiable architecture search (DARTS) named light-DARTS.
arXiv Detail & Related papers (2022-08-20T06:46:55Z)
- Streamable Neural Audio Synthesis With Non-Causal Convolutions [1.8275108630751844]
We introduce a new method for producing non-causal streaming models.
This makes any convolutional model compatible with real-time buffer-based processing.
We show how our method can be adapted to fit complex architectures with parallel branches.
arXiv Detail & Related papers (2022-04-14T16:00:32Z)
- SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping [51.698273019061645]
SpecGrad adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram.
It is processed in the time-frequency domain to keep the computational cost almost the same as that of conventional DDPM-based neural vocoders.
arXiv Detail & Related papers (2022-03-31T02:08:27Z)
- Learning Sparse Analytic Filters for Piano Transcription [21.352141245632247]
Filterbank learning has become an increasingly popular strategy for various audio-related machine learning tasks.
In this work, several variations of a filterbank learning module are investigated for piano transcription.
arXiv Detail & Related papers (2021-08-23T19:41:11Z)
- Differentiable Signal Processing With Black-Box Audio Effects [44.93154498647659]
We present a data-driven approach to automate audio signal processing by incorporating stateful, third-party audio effects as layers within a deep neural network.
We show that our approach can yield results comparable to a specialized, state-of-the-art commercial solution for music mastering.
arXiv Detail & Related papers (2021-05-11T02:20:22Z)
- When is Particle Filtering Efficient for Planning in Partially Observed Linear Dynamical Systems? [60.703816720093016]
This paper initiates a study on the efficiency of particle filtering for sequential planning.
We are able to bound the number of particles needed so that the long-run reward of the policy based on particle filtering is close to that based on exact inference.
We believe this technique can be useful in other sequential decision-making problems.
arXiv Detail & Related papers (2020-06-10T17:43:43Z)
- Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.