Neural Granular Sound Synthesis
- URL: http://arxiv.org/abs/2008.01393v3
- Date: Sat, 3 Jul 2021 17:26:15 GMT
- Title: Neural Granular Sound Synthesis
- Authors: Adrien Bitton, Philippe Esling, Tatsuya Harada
- Abstract summary: Granular sound synthesis is a popular audio generation technique based on rearranging sequences of small waveform windows.
We show that generative neural networks can implement granular synthesis while alleviating most of its shortcomings.
- Score: 53.828476137089325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Granular sound synthesis is a popular audio generation technique based on
rearranging sequences of small waveform windows. In order to control the
synthesis, all grains in a given corpus are analyzed through a set of acoustic
descriptors. This provides a representation reflecting some form of local
similarities across the grains. However, the quality of this grain space is
bound by that of the descriptors. Moreover, traversal of this space is not
continuously invertible to signal and does not provide any structured temporality.
We demonstrate that generative neural networks can implement granular
synthesis while alleviating most of its shortcomings. We efficiently replace
its audio descriptor basis by a probabilistic latent space learned with a
Variational Auto-Encoder. In this setting the learned grain space is
invertible, meaning that we can continuously synthesize sound when traversing
its dimensions. It also implies that original grains are not stored for
synthesis. Another major advantage of our approach is to learn structured paths
inside this latent space by training a higher-level temporal embedding over
arranged grain sequences.
The model can be applied to many types of libraries, including pitched notes
or unpitched drums and environmental noises. We report experiments on the
common granular synthesis processes as well as novel ones such as conditional
sampling and morphing.
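
To make the classic pipeline concrete, here is a minimal sketch of descriptor-based granular analysis, assuming NumPy and two common hand-crafted descriptors (spectral centroid and RMS energy). All names are illustrative rather than code from the paper; note that the grain corpus must be stored, and a point between grains has no waveform of its own, the two limitations the abstract highlights.

```python
import numpy as np

def slice_grains(signal, grain_size=1024, hop=512):
    """Slice a mono signal into overlapping Hann-windowed grains."""
    window = np.hanning(grain_size)
    starts = range(0, len(signal) - grain_size + 1, hop)
    return np.stack([signal[s:s + grain_size] * window for s in starts])

def describe(grains, sr=22050):
    """Hand-crafted descriptor basis: spectral centroid and RMS energy per grain."""
    mags = np.abs(np.fft.rfft(grains, axis=1))
    freqs = np.fft.rfftfreq(grains.shape[1], d=1.0 / sr)
    centroid = (mags * freqs).sum(axis=1) / (mags.sum(axis=1) + 1e-9)
    rms = np.sqrt((grains ** 2).mean(axis=1))
    return np.stack([centroid, rms], axis=1)

def nearest_grain(target, descriptors, grains):
    """Traversal by lookup: return the stored grain whose descriptors are
    closest to the target point. The corpus must be kept in memory, and the
    space is not invertible between grains."""
    idx = np.argmin(np.linalg.norm(descriptors - target, axis=1))
    return grains[idx]
```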
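By contrast, a hedged sketch of the neural grain space, assuming PyTorch: a VAE encodes each grain into a low-dimensional latent, and any latent point, not just those of stored grains, decodes back to a waveform, so traversal is continuously invertible and the original grains need not be kept. Layer sizes and names are assumptions for illustration, not the authors' architecture, and the higher-level temporal embedding over grain sequences is omitted.

```python
import torch
import torch.nn as nn

class GrainVAE(nn.Module):
    """Toy VAE over fixed-size waveform grains (illustrative, not the paper's model)."""
    def __init__(self, grain_size=1024, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(grain_size, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)
        self.to_logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, grain_size), nn.Tanh(),
        )

    def forward(self, grains):
        h = self.encoder(grains)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.decoder(z), mu, logvar

def overlap_add(grains, hop=512):
    """Continuous synthesis: decoded grains are overlap-added into one waveform."""
    n, size = grains.shape
    out = torch.zeros(hop * (n - 1) + size)
    for i, grain in enumerate(grains):
        out[i * hop : i * hop + size] += grain
    return out

# Traversing the latent space: interpolate between two latent points and decode;
# every intermediate point yields a grain, which a descriptor space cannot offer.
model = GrainVAE()
z_a, z_b = torch.randn(16), torch.randn(16)
path = torch.stack([torch.lerp(z_a, z_b, t) for t in torch.linspace(0, 1, 32)])
audio = overlap_add(model.decoder(path).detach())
```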
Related papers
- A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation [65.05719674893999]
We study two different strategies based on token prediction and regression, and introduce a new method based on Schrödinger Bridge.
We examine how different design choices affect machine and human perception.
arXiv Detail & Related papers (2024-10-29T18:29:39Z)
- SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation [55.2480439325792]
We study the synthesis of six datasets, covering topic classification, sentiment analysis, tone detection, and humor.
We find that SynthesizRR greatly improves lexical and semantic diversity, similarity to human-written text, and distillation performance.
arXiv Detail & Related papers (2024-05-16T12:22:41Z)
- Neural Architectures Learning Fourier Transforms, Signal Processing and Much More.... [1.2328446298523066]
We show how one can learn kernels from scratch for audio signal processing applications.
We find that the neural architecture not only learns sinusoidal kernel shapes but also discovers a wide range of other signal-processing properties.
arXiv Detail & Related papers (2023-08-20T23:30:27Z)
- Disentanglement in a GAN for Unconditional Speech Synthesis [28.998590651956153]
We propose AudioStyleGAN (ASGAN), a generative adversarial network for unconditional speech synthesis tailored to learn a disentangled latent space.
ASGAN maps sampled noise to a disentangled latent vector which is then mapped to a sequence of audio features so that signal aliasing is suppressed at every layer.
We apply it to the small-vocabulary Google Speech Commands digits dataset, where it achieves state-of-the-art results in unconditional speech synthesis.
arXiv Detail & Related papers (2023-07-04T12:06:07Z)
- PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping [0.3277163122167433]
We present PhaseAug, the first differentiable augmentation for speech synthesis that rotates the phase of each frequency bin to simulate one-to-many mapping (see the first sketch after this list).
arXiv Detail & Related papers (2022-11-08T23:37:05Z)
- BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis [129.86743102915986]
We formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part shared by both channels and channel-specific parts.
We propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize the two parts respectively.
Experiment results show that BinauralGrad outperforms the existing baselines by a large margin in terms of both objective and subjective evaluation metrics.
arXiv Detail & Related papers (2022-05-30T02:09:26Z)
- Incremental Text to Speech for Neural Sequence-to-Sequence Models using Reinforcement Learning [60.20205278845412]
Modern approaches to text to speech require the entire input character sequence to be processed before any audio is synthesised.
This latency limits the suitability of such models for time-sensitive tasks like simultaneous interpretation.
We propose a reinforcement learning based framework to train an agent that decides whether to synthesise speech from the input seen so far or to wait for more characters.
arXiv Detail & Related papers (2020-08-07T11:48:05Z)
- Timbre latent space: exploration and creative aspects [1.3764085113103222]
Recent studies show the ability of unsupervised models to learn invertible audio representations using Auto-Encoders.
New possibilities for timbre manipulations are enabled with generative neural networks.
arXiv Detail & Related papers (2020-08-04T07:08:04Z)
- Vector-Quantized Timbre Representation [53.828476137089325]
This paper targets a more flexible synthesis of an individual timbre by learning an approximate decomposition of its spectral properties with a set of generative features.
We introduce an auto-encoder with a discrete latent space that is disentangled from loudness in order to learn a quantized representation of a given timbre distribution (see the second sketch after this list).
We detail results for translating audio between orchestral instruments and singing voice, as well as transfers from vocal imitations to instruments.
arXiv Detail & Related papers (2020-07-13T12:35:45Z)
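
Two of the techniques cited above are concrete enough to illustrate. First, the per-bin phase rotation behind PhaseAug: transform the signal to the STFT domain, rotate the phase of each frequency bin, and invert. Sampling the offsets independently and uniformly is a simplifying assumption made here and may differ from the paper's exact scheme.

```python
import math
import torch

def phase_rotate(wav, n_fft=1024, hop=256):
    """Rotate the phase of every STFT frequency bin by a random offset, then
    invert. A simplified sketch of the PhaseAug idea; the chain stays
    differentiable because torch.stft and torch.istft are."""
    window = torch.hann_window(n_fft)
    spec = torch.stft(wav, n_fft, hop_length=hop, window=window, return_complex=True)
    phi = torch.rand(spec.shape[-2], 1) * 2 * math.pi        # one offset per frequency bin
    rotated = spec * torch.polar(torch.ones_like(phi), phi)  # rotate phase, keep magnitude
    return torch.istft(rotated, n_fft, hop_length=hop, window=window, length=wav.shape[-1])
```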
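Second, the discrete bottleneck behind Vector-Quantized Timbre Representation: encoder outputs are snapped to their nearest codebook vectors, so a timbre is summarized by a small set of learned features. This is a generic VQ step under assumed shapes; the paper's loudness disentanglement and full architecture are omitted.

```python
import torch

def vector_quantize(z, codebook):
    """Snap each latent to its nearest codebook entry (generic VQ bottleneck).
    z: (batch, dim) encoder outputs; codebook: (K, dim) learned entries."""
    dists = torch.cdist(z, codebook)   # (batch, K) Euclidean distances
    codes = dists.argmin(dim=1)        # index of the closest codebook entry
    quantized = codebook[codes]        # discrete latents used by the decoder
    # Straight-through estimator: gradients pass to z as if quantization were identity.
    return z + (quantized - z).detach(), codes

# Example: 8 latents quantized against a 64-entry codebook of dimension 16.
z = torch.randn(8, 16)
codebook = torch.randn(64, 16)
quantized, codes = vector_quantize(z, codebook)
```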
The related papers list above is automatically generated from the titles and abstracts of the papers on this site.