Towards Lightweight Controllable Audio Synthesis with Conditional
Implicit Neural Representations
- URL: http://arxiv.org/abs/2111.08462v1
- Date: Sun, 14 Nov 2021 13:36:18 GMT
- Authors: Jan Zuiderveld, Marco Federici, Erik J. Bekkers
- Abstract summary: Implicit neural representations (INRs) are neural networks used to approximate low-dimensional functions.
In this work we shed light on the potential of Conditional Implicit Neural Representations (CINRs) as lightweight backbones in generative frameworks for audio synthesis.
- Score: 10.484851004093919
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The high temporal resolution of audio and our perceptual sensitivity to small
irregularities in waveforms make synthesizing at high sampling rates a complex
and computationally intensive task, prohibiting real-time, controllable
synthesis within many approaches. In this work we aim to shed light on the
potential of Conditional Implicit Neural Representations (CINRs) as lightweight
backbones in generative frameworks for audio synthesis.
Implicit neural representations (INRs) are neural networks used to
approximate low-dimensional functions, trained to represent a single geometric
object by mapping input coordinates to structural information at input
locations. In contrast with other neural methods for representing geometric
objects, the memory required to parameterize the object is independent of
resolution, and only scales with its complexity. A corollary of this is that
INRs have infinite resolution, as they can be sampled at arbitrary resolutions.
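The coordinate-to-value idea can be sketched in a few lines. Below is a minimal, illustrative SIREN-style INR in NumPy (random, untrained weights; the function names, layer sizes, and the `w0` initialization scheme are assumptions for illustration, not the paper's exact architecture). The key property shown is that one fixed set of parameters can be sampled at any resolution:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_siren(layer_sizes, w0=30.0):
    """Initialise a sinusoidal-activation MLP (SIREN-style).

    w0 is the activation-scaling hyperparameter; the abstract notes
    PCINR quality is very sensitive to this value.
    """
    params = []
    for i, (fan_in, fan_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        # First layer uses a wider bound; later layers are scaled by 1/w0.
        bound = 1.0 / fan_in if i == 0 else np.sqrt(6.0 / fan_in) / w0
        W = rng.uniform(-bound, bound, size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((W, b))
    return params

def inr_forward(params, t, w0=30.0):
    """Map 1-D time coordinates t to waveform amplitudes."""
    x = np.asarray(t, dtype=float).reshape(-1, 1)
    for W, b in params[:-1]:
        x = np.sin(w0 * (x @ W + b))
    W, b = params[-1]
    return (x @ W + b).ravel()

# The same parameters represent the signal at arbitrary sampling rates:
params = init_siren([1, 64, 64, 1])
coarse = inr_forward(params, np.linspace(0.0, 1.0, 100))     # 100 samples
fine = inr_forward(params, np.linspace(0.0, 1.0, 16000))     # 16 kHz grid
```

Note that the parameter count is fixed by the layer sizes alone, while `coarse` and `fine` are just two discretizations of the same continuous function.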
To apply the concept of INRs in the generative domain we frame generative
modelling as learning a distribution of continuous functions. This can be
achieved by introducing conditioning methods to INRs.
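One simple conditioning scheme (an illustrative assumption here, not necessarily the paper's exact mechanism) is to let a per-example latent code shift the hidden pre-activations of a shared INR, FiLM-style, so that each code selects a different continuous function:

```python
import numpy as np

rng = np.random.default_rng(1)

# Shared (here random, untrained) parameters of the conditional INR.
hidden, w0 = 64, 30.0
W_t = rng.uniform(-1.0, 1.0, size=(1, hidden))    # coordinate weights
W_z = rng.normal(size=(2, hidden)) * 0.1          # latent-code projection
W_out = rng.normal(size=(hidden, 1)) / np.sqrt(hidden)

def conditioned_inr(t, z):
    """Evaluate the shared INR at coordinates t for latent code z.

    The code z shifts the hidden pre-activations, so different codes
    yield different continuous waveforms from one set of weights.
    """
    h = np.sin(w0 * (t.reshape(-1, 1) @ W_t) + z @ W_z)
    return (h @ W_out).ravel()

t = np.linspace(0.0, 1.0, 200)
wave_a = conditioned_inr(t, np.array([1.0, 0.0]))
wave_b = conditioned_inr(t, np.array([0.0, 1.0]))
```

A generative model over audio then amounts to learning a distribution over the codes `z` (or an encoder producing them), with the INR acting as a lightweight shared decoder.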
Our experiments show that Periodic Conditional INRs (PCINRs) learn faster and
generally produce quantitatively better audio reconstructions than Transposed
Convolutional Neural Networks with equal parameter counts. However, their
performance is very sensitive to activation scaling hyperparameters. When
learning to represent more uniform sets, PCINRs tend to introduce artificial
high-frequency components in reconstructions. We show that this noise can be
minimized by applying standard weight regularization during training or by
decreasing the compositional depth of PCINRs, and we suggest directions for
future research.
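The weight-regularization remedy mentioned above can be sketched as an L2 penalty folded into the gradient step (a minimal illustration, assuming plain SGD; the function name and hyperparameter values are placeholders). Shrinking the weights damps the large activations that sinusoidal layers turn into high-frequency components:

```python
import numpy as np

def l2_regularized_step(W, grad_W, lr=1e-3, wd=1e-2):
    """One SGD step with an L2 weight penalty (standard weight decay).

    The wd * W term pulls weights toward zero, which in sinusoidal
    INRs suppresses the artificial high-frequency noise the abstract
    reports for more uniform training sets.
    """
    return W - lr * (grad_W + wd * W)

# With zero task gradient, the penalty alone shrinks the weights:
W = np.ones((4, 4))
W_next = l2_regularized_step(W, np.zeros((4, 4)))
```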
Related papers
- Unifying Subsampling Pattern Variations for Compressed Sensing MRI with Neural Operators [72.79532467687427]
Compressed Sensing MRI reconstructs images of the body's internal anatomy from undersampled and compressed measurements.
Deep neural networks have shown great potential for reconstructing high-quality images from highly undersampled measurements.
We propose a unified model that is robust to different subsampling patterns and image resolutions in CS-MRI.
arXiv Detail & Related papers (2024-10-05T20:03:57Z)
- Towards a Sampling Theory for Implicit Neural Representations [0.3222802562733786]
Implicit neural representations (INRs) have emerged as a powerful tool for solving inverse problems in computer and computational imaging.
We show how to recover images from a hidden-layer INR using a generalized form of weight decay regularization.
We empirically assess the probability of achieving exact recovery for images realized by low-width single-layer INRs, and illustrate the performance of INRs on super-resolution recovery of more realistic continuous-domain phantom images.
arXiv Detail & Related papers (2024-05-28T17:53:47Z)
- INCODE: Implicit Neural Conditioning with Prior Knowledge Embeddings [4.639495398851869]
Implicit Neural Representations (INRs) have revolutionized signal representation by leveraging neural networks to provide continuous and smooth representations of complex data.
We introduce INCODE, a novel approach that enhances the control of the sinusoidal-based activation function in INRs using deep prior knowledge.
Our approach not only excels in representation, but also extends its prowess to tackle complex tasks such as audio, image, and 3D shape reconstructions.
arXiv Detail & Related papers (2023-10-28T23:16:49Z)
- FFEINR: Flow Feature-Enhanced Implicit Neural Representation for Spatio-temporal Super-Resolution [4.577685231084759]
This paper proposes a Feature-Enhanced Neural Implicit Representation (FFEINR) for super-resolution of flow field data.
It can take full advantage of the implicit neural representation in terms of model structure and sampling resolution.
The training process of FFEINR is facilitated by introducing feature enhancements for the input layer.
arXiv Detail & Related papers (2023-08-24T02:28:18Z)
- Degradation-Noise-Aware Deep Unfolding Transformer for Hyperspectral Image Denoising [9.119226249676501]
Hyperspectral images (HSIs) are often quite noisy because of narrow band spectral filtering.
To reduce the noise in HSI data cubes, both model-driven and learning-based denoising algorithms have been proposed.
This paper proposes a Degradation-Noise-Aware Unfolding Network (DNA-Net) that addresses these issues.
arXiv Detail & Related papers (2023-05-06T13:28:20Z)
- Modality-Agnostic Variational Compression of Implicit Neural Representations [96.35492043867104]
We introduce a modality-agnostic neural compression algorithm based on a functional view of data and parameterised as an Implicit Neural Representation (INR).
Bridging the gap between latent coding and sparsity, we obtain compact latent representations non-linearly mapped to a soft gating mechanism.
After obtaining a dataset of such latent representations, we directly optimise the rate/distortion trade-off in a modality-agnostic space using neural compression.
arXiv Detail & Related papers (2023-01-23T15:22:42Z)
- Versatile Neural Processes for Learning Implicit Neural Representations [57.090658265140384]
We propose Versatile Neural Processes (VNP), which largely increases the capability of approximating functions.
Specifically, we introduce a bottleneck encoder that produces fewer and informative context tokens, relieving the high computational cost.
We demonstrate the effectiveness of the proposed VNP on a variety of tasks involving 1D, 2D and 3D signals.
arXiv Detail & Related papers (2023-01-21T04:08:46Z)
- Signal Processing for Implicit Neural Representations [80.38097216996164]
Implicit Neural Representations (INRs) encode continuous multi-media data via multi-layer perceptrons.
Existing works manipulate such continuous representations via processing on their discretized instance.
We propose an implicit neural signal processing network, dubbed INSP-Net, via differential operators on INR.
arXiv Detail & Related papers (2022-10-17T06:29:07Z)
- UNeRF: Time and Memory Conscious U-Shaped Network for Training Neural Radiance Fields [16.826691448973367]
Neural Radiance Fields (NeRFs) increase reconstruction detail for novel view synthesis and scene reconstruction.
However, the increased resolution and model-free nature of such neural fields come at the cost of high training times and excessive memory requirements.
We propose a method to exploit the redundancy of NeRF's sample-based computations by partially sharing evaluations across neighboring sample points.
arXiv Detail & Related papers (2022-06-23T19:57:07Z)
- InfoNeRF: Ray Entropy Minimization for Few-Shot Neural Volume Rendering [55.70938412352287]
We present an information-theoretic regularization technique for few-shot novel view synthesis based on neural implicit representation.
The proposed approach minimizes potential reconstruction inconsistency that happens due to insufficient viewpoints.
We achieve consistently improved performance compared to existing neural view synthesis methods by large margins on multiple standard benchmarks.
arXiv Detail & Related papers (2021-12-31T11:56:01Z)
- Learning Frequency-aware Dynamic Network for Efficient Super-Resolution [56.98668484450857]
This paper explores a novel frequency-aware dynamic network for dividing the input into multiple parts according to its coefficients in the discrete cosine transform (DCT) domain.
In practice, the high-frequency part will be processed using expensive operations and the lower-frequency part is assigned with cheap operations to relieve the computation burden.
Experiments conducted on benchmark SISR models and datasets show that the frequency-aware dynamic network can be employed for various SISR neural architectures.
arXiv Detail & Related papers (2021-03-15T12:54:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.