Modeling Time-Variant Responses of Optical Compressors with Selective State Space Models
- URL: http://arxiv.org/abs/2408.12549v2
- Date: Thu, 29 Aug 2024 09:46:54 GMT
- Title: Modeling Time-Variant Responses of Optical Compressors with Selective State Space Models
- Authors: Riccardo Simionato, Stefano Fasciani
- Abstract summary: This paper presents a method for modeling optical dynamic range compressors using deep neural networks with Selective State Space models.
It features a refined technique integrating Feature-wise Linear Modulation and Gated Linear Units to adjust the network dynamically.
The proposed architecture is well-suited for low-latency and real-time applications, crucial in live audio processing.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: This paper presents a method for modeling optical dynamic range compressors using deep neural networks with Selective State Space models. The proposed approach surpasses previous methods based on recurrent layers by employing a Selective State Space block to encode the input audio. It features a refined technique integrating Feature-wise Linear Modulation and Gated Linear Units to adjust the network dynamically, conditioning the compression's attack and release phases according to external parameters. The proposed architecture is well-suited for low-latency and real-time applications, crucial in live audio processing. The method has been validated on the analog optical compressors TubeTech CL 1B and Teletronix LA-2A, which possess distinct characteristics. Evaluation is performed using quantitative metrics and subjective listening tests, comparing the proposed method with other state-of-the-art models. Results show that our black-box modeling methods outperform all others, achieving accurate emulation of the compression process for both seen and unseen settings during training. We further show a correlation between this accuracy and the sampling density of the control parameters in the dataset and identify settings with fast attack and slow release as the most challenging to emulate.
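As a rough illustration of the conditioning idea mentioned in the abstract, the sketch below applies Feature-wise Linear Modulation (a per-channel scale and shift derived from the control parameters) and then a Gated Linear Unit to a block of encoded features. It is a minimal NumPy sketch under stated assumptions, not the authors' architecture: the function name `film_glu`, the weight shapes, the two-element control vector, and the random weights are all hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function used as the GLU gate."""
    return 1.0 / (1.0 + np.exp(-x))


def film_glu(features, controls, w_film, b_film, w_glu, b_glu):
    """Condition `features` on `controls` with FiLM, then gate with a GLU.

    features: (T, C) encoded audio frames (hypothetical encoder output)
    controls: (P,)   external parameters, e.g. normalized attack and release
    w_film:   (P, 2C), b_film: (2C,)  projection to per-channel scale and shift
    w_glu:    (C, 2C), b_glu:  (2C,)  projection to values and gates
    """
    channels = features.shape[-1]

    # FiLM: map the controls to one scale (gamma) and one shift (beta) per channel.
    gamma_beta = controls @ w_film + b_film
    gamma, beta = gamma_beta[:channels], gamma_beta[channels:]
    modulated = gamma * features + beta

    # GLU: project to 2C channels, then gate the first half with the second half.
    projected = modulated @ w_glu + b_glu
    values, gates = projected[:, :channels], projected[:, channels:]
    return values * sigmoid(gates)


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
T, C, P = 8, 4, 2
features = rng.standard_normal((T, C))
controls = np.array([0.2, 0.8])  # assumed normalized attack / release settings
w_film = rng.standard_normal((P, 2 * C))
b_film = np.zeros(2 * C)
w_glu = rng.standard_normal((C, 2 * C))
b_glu = np.zeros(2 * C)

out = film_glu(features, controls, w_film, b_film, w_glu, b_glu)
print(out.shape)
```

In a trained model the FiLM and GLU weights would be learned jointly with the encoder, so that changing the control vector changes how strongly each feature channel passes through the gate.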
Related papers
- Model and Deep learning based Dynamic Range Compression Inversion [12.002024727237837]
Inverting DRC can help to restore the original dynamics to produce new mixes and/or to improve the overall quality of the audio signal.
We propose a model-based approach with neural networks for DRC inversion.
Our results show the effectiveness and robustness of the proposed method in comparison to several state-of-the-art methods.
arXiv Detail & Related papers (2024-11-07T00:33:07Z) - Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z) - Model-Based Qubit Noise Spectroscopy [0.0]
We derive model-based QNS approaches using inspiration from classical signal processing.
We show, through both simulation and experimental data, how these model-based QNS approaches maintain the statistical and computational benefits of their classical counterparts.
arXiv Detail & Related papers (2024-05-20T09:30:38Z) - Comparative Study of State-based Neural Networks for Virtual Analog Audio Effects Modeling [0.0]
This article explores the application of machine learning advancements for virtual analog modeling.
We compare State-Space models and Linear Recurrent Units against the more common Long Short-Term Memory networks.
arXiv Detail & Related papers (2024-05-07T08:47:40Z) - Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC is capable of performing efficiently, entirely on-the-fly, both parameter estimation and particle proposal adaptation.
arXiv Detail & Related papers (2023-12-19T21:45:38Z) - Uncertainty Guided Adaptive Warping for Robust and Efficient Stereo Matching [77.133400999703]
Correlation based stereo matching has achieved outstanding performance.
Current methods with a fixed model do not work uniformly well across various datasets.
This paper proposes a new perspective to dynamically calculate correlation for robust stereo matching.
arXiv Detail & Related papers (2023-07-26T09:47:37Z) - Boosting Fast and High-Quality Speech Synthesis with Linear Diffusion [85.54515118077825]
This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality.
To reduce computational complexity, LinDiff employs a patch-based processing approach that partitions the input signal into small patches.
Our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed.
arXiv Detail & Related papers (2023-06-09T07:02:43Z) - One-Dimensional Deep Image Prior for Curve Fitting of S-Parameters from Electromagnetic Solvers [57.441926088870325]
Deep Image Prior (DIP) is a technique that optimizes the weights of a randomly initialized convolutional neural network to fit a signal from noisy or under-determined measurements.
Relative to publicly available implementations of Vector Fitting (VF), our method shows superior performance on nearly all test examples.
arXiv Detail & Related papers (2023-06-06T20:28:37Z) - Gradient-free optimization of chaotic acoustics with reservoir computing [6.345523830122166]
We develop a versatile optimization method, which finds the design parameters that minimize time-averaged acoustic cost functionals.
The method is gradient-free, model-informed, and data-driven with reservoir computing based on echo state networks.
arXiv Detail & Related papers (2021-06-17T19:49:45Z) - Real-Time Model Calibration with Deep Reinforcement Learning [4.707841918805165]
We propose a novel framework for inference of model parameters based on reinforcement learning.
The proposed methodology is demonstrated and evaluated on two model-based diagnostics test cases.
arXiv Detail & Related papers (2020-06-07T00:11:42Z) - Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [66.46123655365113]
Target speech separation refers to extracting the target speaker's speech from mixed signals.
Two main challenges are the complex acoustic environment and the real-time processing requirement.
We propose a temporal-spatial neural filter, which directly estimates the target speech waveform from a multi-speaker mixture.
arXiv Detail & Related papers (2020-01-02T11:12:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.