Related papers: Comparative Study of State-based Neural Networks for Virtual Analog Audio Effects Modeling

Comparative Study of State-based Neural Networks for Virtual Analog Audio Effects Modeling

URL: http://arxiv.org/abs/2405.04124v5
Date: Thu, 29 Aug 2024 09:44:59 GMT
Title: Comparative Study of State-based Neural Networks for Virtual Analog Audio Effects Modeling
Authors: Riccardo Simionato, Stefano Fasciani,
Abstract summary: This article explores the application of machine learning advancements for virtual analog modeling. We compare State-Space models and Linear Recurrent Units against the more common Long Short-Term Memory networks.
Score: 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Analog electronic circuits are at the core of an important category of musical devices, which includes a broad range of sound synthesizers and audio effects. The development of software that simulates analog musical devices, known as virtual analog modeling, is a significant sub-field in audio signal processing. Artificial neural networks are a promising technique for virtual analog modeling. While neural approaches have successfully accurately modeled distortion circuits, they require architectural improvements that account for parameter conditioning and low-latency response. This article explores the application of recent machine learning advancements for virtual analog modeling. In particular, we compare State-Space models and Linear Recurrent Units against the more common Long Short-Term Memory networks. Our comparative study uses these black-box neural modeling techniques with various audio effects. We evaluate the performance and limitations of these models using multiple metrics, providing insights for future research and development. Our metrics aim to assess the models' ability to accurately replicate energy envelopes and frequency contents, with a particular focus on transients in the audio signal. To incorporate control parameters into the models, we employ the Feature-wise Linear Modulation method. Long Short-Term Memory networks exhibit better accuracy in emulating distortions and equalizers, while the State-Space model, followed by Long Short-Term Memory networks when integrated in an encoder-decoder structure, and Linear Recurrent Unit outperforms others in emulating saturation and compression. When considering long time-variant characteristics, the State-Space model demonstrates the greatest capability to track history. Long Short-Term Memory networks tend to introduce audio artifacts.

Related papers

FreSca: Scaling in Frequency Space Enhances Diffusion Models [55.75504192166779]
This paper explores frequency-based control within latent diffusion models.<n>We introduce FreSca, a novel framework that decomposes noise difference into low- and high-frequency components.<n>FreSca operates without any model retraining or architectural change, offering model- and task-agnostic control.
arXiv Detail & Related papers (2025-04-02T22:03:11Z)
Task-Oriented Real-time Visual Inference for IoVT Systems: A Co-design Framework of Neural Networks and Edge Deployment [61.20689382879937]
Task-oriented edge computing addresses this by shifting data analysis to the edge. Existing methods struggle to balance high model performance with low resource consumption. We propose a novel co-design framework to optimize neural network architecture.
arXiv Detail & Related papers (2024-10-29T19:02:54Z)
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation [83.62931466231898]
This paper presents ARLON, a framework that boosts diffusion Transformers with autoregressive models for long video generation. A latent Vector Quantized Variational Autoencoder (VQ-VAE) compresses the input latent space of the DiT model into compact visual tokens. An adaptive norm-based semantic injection module integrates the coarse discrete visual units from the AR model into the DiT model.
arXiv Detail & Related papers (2024-10-27T16:28:28Z)
A Realistic Simulation Framework for Analog/Digital Neuromorphic Architectures [73.65190161312555]
ARCANA is a spiking neural network simulator designed to account for the properties of mixed-signal neuromorphic circuits. We show how the results obtained provide a reliable estimate of the behavior of the spiking neural network trained in software.
arXiv Detail & Related papers (2024-09-23T11:16:46Z)
Evaluating Neural Networks Architectures for Spring Reverb Modelling [0.21847754147782888]
The electromechanical functioning of the spring reverb makes it a nonlinear system that is difficult to fully emulate in the digital domain. We compare five different neural network architectures, including convolutional and recurrent models, to assess their effectiveness in replicating the characteristics of this audio effect. This paper specifically focuses on neural audio architectures that offer parametric control, aiming to advance the boundaries of current black-box modelling techniques in the domain of spring reverberation.
arXiv Detail & Related papers (2024-09-08T02:37:42Z)
Modeling Time-Variant Responses of Optical Compressors with Selective State Space Models [0.0]
This paper presents a method for modeling optical dynamic range compressors using deep neural networks with Selective State Space models. It features a refined technique integrating Feature-wise Linear Modulation and Gated Linear Units to adjust the network dynamically. The proposed architecture is well-suited for low-latency and real-time applications, crucial in live audio processing.
arXiv Detail & Related papers (2024-08-22T17:03:08Z)
Differentiable Modal Synthesis for Physical Modeling of Planar String Sound and Motion Simulation [17.03776191787701]
We introduce a novel model for simulating motion properties of nonlinear strings. We integrate modal synthesis and spectral modeling within physical network framework. Empirical evaluations demonstrate that the architecture achieves superior accuracy in string motion simulation.
arXiv Detail & Related papers (2024-07-07T23:36:51Z)
Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
Self-attention mechanism (SAM) is widely used in various fields of artificial intelligence. We show that intrinsic stiffness phenomenon (SP) in the high-precision solution of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NN) We show that the SAM is also a stiffness-aware step size adaptor that can enhance the model's representational ability to measure intrinsic SP.
arXiv Detail & Related papers (2023-08-19T08:17:41Z)
From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations. These models are prone to generate audible artifacts when the conditioning is flawed or imperfect. We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
arXiv Detail & Related papers (2023-08-02T22:14:29Z)
Tunable Convolutions with Parametric Multi-Loss Optimization [5.658123802733283]
Behavior of neural networks is irremediably determined by the specific loss and data used during training. It is often desirable to tune the model at inference time based on external factors such as preferences of the user or dynamic characteristics of the data. This is especially important to balance the perception-distortion trade-off of ill-posed image-to-image translation tasks.
arXiv Detail & Related papers (2023-04-03T11:36:10Z)
High Fidelity Neural Audio Compression [92.4812002532009]
We introduce a state-of-the-art real-time, high-fidelity, audio leveraging neural networks. It consists in a streaming encoder-decoder architecture with quantized latent space trained in an end-to-end fashion. We simplify and speed-up the training by using a single multiscale spectrogram adversary.
arXiv Detail & Related papers (2022-10-24T17:52:02Z)
End-to-End Learning of Hybrid Inverse Dynamics Models for Precise and Compliant Impedance Control [16.88250694156719]
We present a novel hybrid model formulation that enables us to identify fully physically consistent inertial parameters of a rigid body dynamics model. We compare our approach against state-of-the-art inverse dynamics models on a 7 degree of freedom manipulator.
arXiv Detail & Related papers (2022-05-27T07:39:28Z)
An advanced spatio-temporal convolutional recurrent neural network for storm surge predictions [73.4962254843935]
We study the capability of artificial neural network models to emulate storm surge based on the storm track/size/intensity history. This study presents a neural network model that can predict storm surge, informed by a database of synthetic storm simulations.
arXiv Detail & Related papers (2022-04-18T23:42:18Z)
Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline. We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
RAVE: A variational autoencoder for fast and high-quality neural audio synthesis [2.28438857884398]
We introduce a Realtime Audio Variational autoEncoder (RAVE) allowing both fast and high-quality audio waveform synthesis. We show that our model is the first able to generate 48kHz audio signals, while simultaneously running 20 times faster than real-time on a standard laptop CPU.
arXiv Detail & Related papers (2021-11-09T09:07:30Z)
Conditionally Parameterized, Discretization-Aware Neural Networks for Mesh-Based Modeling of Physical Systems [0.0]
We generalize the idea of conditional parametrization -- using trainable functions of input parameters. We show that conditionally parameterized networks provide superior performance compared to their traditional counterparts. A network architecture named CP-GNet is also proposed as the first deep learning model capable of reacting standalone prediction of flows on meshes.
arXiv Detail & Related papers (2021-09-15T20:21:13Z)
Rate Distortion Characteristic Modeling for Neural Image Compression [59.25700168404325]
End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance. distinct models are required to be trained to reach different points in the rate-distortion (R-D) space. We make efforts to formulate the essential mathematical functions to describe the R-D behavior of NIC using deep network and statistical modeling.
arXiv Detail & Related papers (2021-06-24T12:23:05Z)
Action-Conditional Recurrent Kalman Networks For Forward and Inverse Dynamics Learning [17.80270555749689]
Estimating accurate forward and inverse dynamics models is a crucial component of model-based control for robots. We present two architectures for forward model learning and one for inverse model learning. Both architectures significantly outperform exist-ing model learning frameworks as well as analytical models in terms of prediction performance.
arXiv Detail & Related papers (2020-10-20T11:28:25Z)
Exploring Quality and Generalizability in Parameterized Neural Audio Effects [0.0]
Deep neural networks have shown promise for music audio signal processing applications. Results to date have tended to be constrained by low sample rates, noise, narrow domains of signal types, and/or lack of parameterized controls. This work expands on prior research published on modeling nonlinear time-dependent signal processing effects.
arXiv Detail & Related papers (2020-06-10T00:52:08Z)
VaPar Synth -- A Variational Parametric Model for Audio Synthesis [78.3405844354125]
We present VaPar Synth - a Variational Parametric Synthesizer which utilizes a conditional variational autoencoder (CVAE) trained on a suitable parametric representation. We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
arXiv Detail & Related papers (2020-03-30T16:05:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.