CONMOD: Controllable Neural Frame-based Modulation Effects
- URL: http://arxiv.org/abs/2406.13935v1
- Date: Thu, 20 Jun 2024 02:02:54 GMT
- Title: CONMOD: Controllable Neural Frame-based Modulation Effects
- Authors: Gyubin Lee, Hounsu Kim, Junwon Lee, Juhan Nam
- Abstract summary: We introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single black-box model which emulates various LFO-driven effects in a frame-wise manner.
The model is capable of learning the continuous embedding space of two distinct phaser effects, enabling us to steer between effects and achieve creative outputs.
- Score: 6.132272910797383
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep learning models have seen widespread use in modelling LFO-driven audio effects, such as phaser and flanger. Although existing neural architectures exhibit high-quality emulation of individual effects, they do not possess the capability to manipulate the output via control parameters. To address this issue, we introduce Controllable Neural Frame-based Modulation Effects (CONMOD), a single black-box model which emulates various LFO-driven effects in a frame-wise manner, offering control over LFO frequency and feedback parameters. Additionally, the model is capable of learning the continuous embedding space of two distinct phaser effects, enabling us to steer between effects and achieve creative outputs. Our model outperforms previous work while possessing both controllability and universality, presenting opportunities to enhance creativity in modern LFO-driven audio effects.
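For context on the effect class being emulated, a conventional LFO-driven phaser is a cascade of allpass filters whose coefficients are swept by a low-frequency oscillator, with a feedback path. The following is a minimal illustrative DSP sketch of that target effect, not the CONMOD neural model; all parameter values are arbitrary.

```python
import math

def lfo_phaser(x, sr, lfo_freq=0.5, feedback=0.3, n_stages=4):
    """Toy LFO-driven phaser: a cascade of first-order allpass filters
    whose coefficient is swept by a sinusoidal LFO, with feedback.
    Illustrative only -- not the CONMOD neural model."""
    z = [0.0] * n_stages          # per-stage allpass state
    fb = 0.0                      # last feedback sample
    out = []
    for n, s in enumerate(x):
        lfo = 0.5 + 0.5 * math.sin(2 * math.pi * lfo_freq * n / sr)
        a = -0.9 + 1.6 * lfo      # sweep allpass coefficient within (-1, 1)
        v = s + feedback * fb
        for k in range(n_stages):
            # one-multiply first-order allpass: y = -a*x + z; z = x + a*y
            y = -a * v + z[k]
            z[k] = v + a * y
            v = y
        fb = v
        out.append(0.5 * (s + v))  # mix dry and phase-shifted signal
    return out
```

The LFO frequency and feedback amount in this sketch correspond to the two control parameters CONMOD exposes.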
Related papers
- Towards Neural Scaling Laws for Time Series Foundation Models [63.5211738245487]
We examine two common TSFM architectures, encoder-only and decoder-only Transformers, and investigate their scaling behavior on both ID and OOD data.
Our experiments reveal that the log-likelihood loss of TSFMs exhibits similar scaling behavior in both OOD and ID settings.
We provide practical guidelines for designing and scaling larger TSFMs with enhanced model capabilities.
arXiv Detail & Related papers (2024-10-16T08:23:39Z)
- Towards Real-Time Neural Volumetric Rendering on Mobile Devices: A Measurement Study [12.392923990003753]
We take the first initiative to examine the state-of-the-art real-time NeRF rendering technique from a system perspective.
We first define the entire working pipeline of the NeRF serving system.
We then identify possible control knobs that are critical to the system from the communication, computation, and visual performance perspective.
arXiv Detail & Related papers (2024-06-23T10:33:26Z)
- DITTO: Diffusion Inference-Time T-Optimization for Music Generation [49.90109850026932]
Diffusion Inference-Time T-Optimization (DITTO) is a framework for controlling pre-trained text-to-music diffusion models at inference time.
We demonstrate a surprisingly wide-range of applications for music generation including inpainting, outpainting, and looping as well as intensity, melody, and musical structure control.
arXiv Detail & Related papers (2024-01-22T18:10:10Z)
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Tasks [55.36987468073152]
This paper proposes a novel Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism.
The DG-SCT module incorporates trainable cross-modal interaction layers into pre-trained audio-visual encoders.
Our proposed model achieves state-of-the-art results across multiple downstream tasks, including AVE, AVVP, AVS, and AVQA.
arXiv Detail & Related papers (2023-11-09T05:24:20Z)
- Differentiable Grey-box Modelling of Phaser Effects using Frame-based Spectral Processing [21.053861381437827]
This work presents a differentiable digital signal processing approach to modelling phaser effects.
The proposed model processes audio in short frames to implement a time-varying filter in the frequency domain.
We show that the model can be trained to emulate an analog reference device, while retaining interpretable and adjustable parameters.
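The frame-wise frequency-domain scheme described in this entry can be sketched as follows: window each short frame, multiply its spectrum by a frequency response that changes with the LFO phase, and overlap-add. The sweeping notch pattern here is a hypothetical stand-in for the model's learned, interpretable filter.

```python
import numpy as np

def framewise_timevarying_filter(x, frame_len=256, hop=128,
                                 lfo_freq=0.5, sr=8000):
    """Apply a time-varying filter in the frequency domain, one short
    frame at a time, in the spirit of frame-based grey-box phaser
    modelling. The comb-like magnitude below is an arbitrary stand-in
    for a learned response."""
    window = np.hanning(frame_len)
    n_bins = frame_len // 2 + 1
    freqs = np.arange(n_bins) / frame_len          # normalised bin frequencies
    out = np.zeros(len(x) + frame_len)
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        spec = np.fft.rfft(frame)
        # LFO phase at the frame centre drives the notch positions
        t = (start + frame_len / 2) / sr
        lfo = 0.5 + 0.5 * np.sin(2 * np.pi * lfo_freq * t)
        H = np.abs(np.cos(2 * np.pi * freqs * (4 + 8 * lfo)))
        out[start:start + frame_len] += np.fft.irfft(spec * H)  # overlap-add
    return out[:len(x)]
```

With a Hann window and 50% hop, the overlap-add reconstruction is constant up to a fixed scale, which is sufficient for a sketch.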
arXiv Detail & Related papers (2023-06-02T07:53:41Z)
- Modulation Extraction for LFO-driven Audio Effects [5.740770499256802]
We propose a framework capable of extracting arbitrary LFO signals from processed audio across multiple digital audio effects, parameter settings, and instrument configurations.
We show how coupling the extraction model with a simple processing network enables training of end-to-end black-box models of unseen analog or digital LFO-driven audio effects.
We make our code available and provide the trained audio effect models in a real-time VST plugin.
arXiv Detail & Related papers (2023-05-22T17:33:07Z)
- Modelling black-box audio effects with time-varying feature modulation [13.378050193507907]
We show that scaling the width, depth, or dilation factor of existing architectures does not result in satisfactory performance when modelling audio effects such as fuzz and dynamic range compression.
We propose the integration of time-varying feature-wise linear modulation into existing temporal convolutional backbones.
We demonstrate that our approach more accurately captures long-range dependencies for a range of fuzz and compressor implementations across both time and frequency domain metrics.
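Time-varying feature-wise linear modulation (FiLM) scales and shifts each channel of a feature map with coefficients that change per time step, predicted from a conditioning sequence. A minimal sketch, with random projections standing in for the learned modulation network:

```python
import numpy as np

def tv_film(features, cond, rng=np.random.default_rng(0)):
    """Time-varying FiLM: modulate a (channels, time) feature map with
    per-time-step scales (gamma) and shifts (beta) predicted from a
    (cond_dim, time) conditioning sequence. The linear projections are
    random stand-ins for learned layers."""
    c, t = features.shape
    d = cond.shape[0]                        # conditioning dimension
    W_gamma = rng.standard_normal((c, d)) * 0.1
    W_beta = rng.standard_normal((c, d)) * 0.1
    gamma = 1.0 + W_gamma @ cond             # (c, t) scales, near identity
    beta = W_beta @ cond                     # (c, t) shifts
    return gamma * features + beta
```

With zero conditioning the modulation reduces to the identity, which is one common way to initialise FiLM layers.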
arXiv Detail & Related papers (2022-11-01T14:41:57Z)
- Effect of Batch Normalization on Noise Resistant Property of Deep Learning Models [3.520496620951778]
There are concerns about the presence of analog noise, which causes changes to the weights of the models, leading to performance degradation of deep learning models.
This work investigates the effect of the popular batch normalization layer on the noise-resistant ability of deep learning models.
arXiv Detail & Related papers (2022-05-15T20:10:21Z)
- Real-time Neural-MPC: Deep Learning Model Predictive Control for Quadrotors and Agile Robotic Platforms [59.03426963238452]
We present Real-time Neural MPC, a framework to efficiently integrate large, complex neural network architectures as dynamics models within a model-predictive control pipeline.
We show the feasibility of our framework on real-world problems by reducing the positional tracking error by up to 82% when compared to state-of-the-art MPC approaches without neural network dynamics.
arXiv Detail & Related papers (2022-03-15T09:38:15Z)
- MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost.
We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon.
We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
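The MoEfication idea can be sketched as follows: partition an FFN's hidden units into expert groups and, per token, evaluate only the top-k groups. The grouping and gate below are deliberately simplified stand-ins; the paper's actual expert construction and routing differ.

```python
import numpy as np

def moefied_ffn(x, W1, W2, n_experts=4, k=1):
    """Sketch of MoEfication: split the d_ff hidden units of a ReLU FFN
    into contiguous expert groups and evaluate only the top-k groups per
    token, scored here by a naive gate (the group's mean pre-activation).
    Relies on ReLU activation sparsity for the approximation to hold."""
    d_ff = W1.shape[1]
    group = d_ff // n_experts
    out = np.zeros((x.shape[0], W2.shape[1]))
    pre = x @ W1                                   # (tokens, d_ff) pre-activations
    for t in range(x.shape[0]):
        scores = pre[t].reshape(n_experts, group).mean(axis=1)
        for e in np.argsort(scores)[-k:]:          # evaluate top-k experts only
            sl = slice(e * group, (e + 1) * group)
            h = np.maximum(pre[t, sl], 0.0)        # ReLU within the expert
            out[t] += h @ W2[sl]
    return out
```

Setting k equal to the number of experts recovers the dense FFN exactly, since the expert outputs simply partition the hidden-unit sum.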
arXiv Detail & Related papers (2021-10-05T02:14:38Z)
- Flexible Image Denoising with Multi-layer Conditional Feature Modulation [56.018132592622706]
We present a novel flexible image denoising network (CFMNet) by equipping a U-Net backbone with conditional feature modulation (CFM) modules.
In comparison to channel-wise shifting only in the first layer, CFMNet can make better use of noise level information by deploying multiple layers of CFM.
Our CFMNet is effective in exploiting noise level information for flexible non-blind denoising, and performs favorably against the existing deep image denoising methods in terms of both quantitative metrics and visual quality.
arXiv Detail & Related papers (2020-06-24T06:00:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.