Conditional Sound Generation Using Neural Discrete Time-Frequency
Representation Learning
- URL: http://arxiv.org/abs/2107.09998v1
- Date: Wed, 21 Jul 2021 10:31:28 GMT
- Title: Conditional Sound Generation Using Neural Discrete Time-Frequency
Representation Learning
- Authors: Xubo Liu, Turab Iqbal, Jinzheng Zhao, Qiushi Huang, Mark D. Plumbley,
Wenwu Wang
- Abstract summary: We propose to generate sounds conditioned on sound classes via neural discrete time-frequency representation learning.
This offers an advantage in modelling long-range dependencies and retaining local fine-grained structure within a sound clip.
- Score: 42.95813372611093
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep generative models have recently achieved impressive performance in
speech synthesis and music generation. However, compared to the generation of
those domain-specific sounds, the generation of general sounds (such as car
horn, dog barking, and gun shot) has received less attention, despite their
wide potential applications. In our previous work, sounds were generated in the
time domain using SampleRNN. However, it is difficult for this method to capture
long-range dependencies within sound recordings. In this work, we
propose to generate sounds conditioned on sound classes via neural discrete
time-frequency representation learning. This offers an advantage in modelling
long-range dependencies and retaining local fine-grained structure within a
sound clip. We evaluate our proposed approach on the UrbanSound8K dataset
against a SampleRNN baseline, using performance metrics that measure the
quality and diversity of the generated sound samples. Experimental results show
that our proposed method offers significantly better diversity and comparable
quality relative to the baseline method.
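As an illustration of the general recipe (not the authors' code), here is a minimal PyTorch sketch of a VQ-VAE-style quantiser over time-frequency frames together with a class-conditional autoregressive prior over the resulting tokens. The spectrogram encoder/decoder is omitted, and all module choices and sizes are assumptions rather than the paper's configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Nearest-neighbour vector quantisation with a straight-through gradient."""
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):                                   # z: (batch, time, dim)
        flat = z.reshape(-1, z.size(-1))
        idx = torch.cdist(flat, self.codebook.weight).argmin(-1)
        idx = idx.view(z.shape[:-1])                        # one token per T-F frame
        z_q = self.codebook(idx)
        codebook_loss = F.mse_loss(z_q, z.detach())         # move codes toward encoder output
        commit_loss = F.mse_loss(z, z_q.detach())           # keep encoder near the codes
        z_q = z + (z_q - z).detach()                        # straight-through estimator
        return z_q, idx, codebook_loss + 0.25 * commit_loss

class ClassConditionalPrior(nn.Module):
    """Autoregressive prior over the discrete tokens, conditioned on a sound class."""
    def __init__(self, num_codes=512, num_classes=10, dim=64):
        super().__init__()
        self.tok = nn.Embedding(num_codes, dim)
        self.cls = nn.Embedding(num_classes, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, num_codes)

    def forward(self, idx, label):                          # idx: (batch, time)
        h = self.tok(idx) + self.cls(label).unsqueeze(1)    # add class embedding
        h, _ = self.rnn(h)
        return self.out(h)                                  # logits for the next token
```

At sampling time, tokens would be drawn autoregressively for the desired class label, decoded back to a time-frequency representation, and inverted to a waveform.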
Related papers
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to the groundtruth versions.
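A minimal sketch of the denoising idea described above: corrupt ground-truth event boundaries with timestep-dependent Gaussian noise and train a network to recover them. `BoundaryDenoiser` is a hypothetical stand-in for the paper's query-based decoder, and the audio conditioning is omitted:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryDenoiser(nn.Module):
    """Hypothetical stand-in for the paper's decoder; the real model also
    conditions on audio features, which are omitted here for brevity."""
    def __init__(self, dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, noisy, t):                            # noisy: (batch, events, 2)
        t = t.view(-1, 1, 1).expand(-1, noisy.size(1), 1)   # broadcast timestep
        return self.net(torch.cat([noisy, t], dim=-1))

def training_step(model, boundaries, alphas_cumprod):
    """Noise ground-truth (onset, offset) pairs, then predict them back."""
    t = torch.randint(0, len(alphas_cumprod), (boundaries.size(0),))
    a = alphas_cumprod[t].view(-1, 1, 1)
    noise = torch.randn_like(boundaries)
    noisy = a.sqrt() * boundaries + (1 - a).sqrt() * noise  # forward (noising) process
    pred = model(noisy, t.float() / len(alphas_cumprod))
    return F.mse_loss(pred, boundaries)                     # x0-prediction objective
```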
arXiv Detail & Related papers (2023-08-14T17:29:41Z) - From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
However, these models are prone to generating audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
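To illustrate the multi-band part of this framework, a simple FFT-based band split, where each band could then be modelled by its own diffusion process and the bands summed back together; the cutoff frequencies here are illustrative guesses, not the paper's values:

```python
import torch

def split_bands(wav, sr=24000, edges=(0.0, 1500.0, 6000.0, 12000.0)):
    """Split a waveform into disjoint frequency bands via the real FFT."""
    spec = torch.fft.rfft(wav)
    freqs = torch.fft.rfftfreq(wav.size(-1), d=1.0 / sr)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = ((freqs >= lo) & (freqs < hi)).to(spec.dtype)
        bands.append(torch.fft.irfft(spec * mask, n=wav.size(-1)))
    return bands  # sum(bands) approximately reconstructs the input
```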
arXiv Detail & Related papers (2023-08-02T22:14:29Z) - Analysing the Impact of Audio Quality on the Use of Naturalistic
Long-Form Recordings for Infant-Directed Speech Research [62.997667081978825]
Modelling of early language acquisition aims to understand how infants bootstrap their language skills.
Recent developments have enabled the use of more naturalistic training data for computational models.
It is currently unclear how the sound quality could affect analyses and modelling experiments conducted on such data.
arXiv Detail & Related papers (2023-05-03T08:25:37Z) - Leveraging Pre-trained AudioLDM for Sound Generation: A Benchmark Study [33.10311742703679]
We make the first attempt to investigate the benefits of pre-training on sound generation with AudioLDM.
Our study demonstrates the advantages of the pre-trained AudioLDM, especially in data-scarcity scenarios.
We benchmark the sound generation task on various frequently-used datasets.
arXiv Detail & Related papers (2023-03-07T12:49:45Z) - Neural Waveshaping Synthesis [0.0]
We present a novel, lightweight, fully causal approach to neural audio synthesis.
The Neural Waveshaping Unit (NEWT) operates directly in the waveform domain.
It produces complex timbral evolutions by simple affine transformations of its input and output signals.
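A rough PyTorch sketch of the waveshaping idea: a learned nonlinear shaping function wrapped in trainable affine transforms, applied sample-wise to the waveform. In the paper the affine parameters are time-varying and predicted from control signals; here they are static, and the layer sizes are guesses:

```python
import torch
import torch.nn as nn

class WaveshapingUnit(nn.Module):
    """Affine transform -> learned waveshaper -> affine transform."""
    def __init__(self, hidden=8):
        super().__init__()
        self.in_scale = nn.Parameter(torch.ones(1))
        self.in_shift = nn.Parameter(torch.zeros(1))
        self.shaper = nn.Sequential(nn.Linear(1, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.out_scale = nn.Parameter(torch.ones(1))
        self.out_shift = nn.Parameter(torch.zeros(1))

    def forward(self, x):                                   # x: (batch, samples)
        y = self.in_scale * x + self.in_shift               # affine on the input
        y = self.shaper(y.unsqueeze(-1)).squeeze(-1)        # learned shaping function
        return self.out_scale * y + self.out_shift          # affine on the output
```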
arXiv Detail & Related papers (2021-07-11T13:50:59Z) - CRASH: Raw Audio Score-based Generative Modeling for Controllable
High-resolution Drum Sound Synthesis [0.0]
We propose a novel score-based generative model for unconditional raw audio synthesis.
Our proposed method closes the gap with GAN-based methods on raw audio, while offering more flexible generation capabilities.
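For context, score-based models of this kind are typically trained with denoising score matching; a minimal sketch on raw audio follows, where `score_net(noisy, sigma)` is an assumed interface rather than the paper's API:

```python
import torch

def dsm_loss(score_net, wav, sigmas):
    """Perturb clean waveforms at a random noise level and regress the
    score of the perturbed distribution (sigma^2-weighted objective)."""
    sigma = sigmas[torch.randint(len(sigmas), (wav.size(0),))].view(-1, 1)
    noise = torch.randn_like(wav) * sigma
    target = -noise / sigma**2                      # score of the Gaussian kernel
    pred = score_net(wav + noise, sigma)
    return (sigma**2 * (pred - target) ** 2).mean()
```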
arXiv Detail & Related papers (2021-06-14T13:48:03Z) - Exploiting Attention-based Sequence-to-Sequence Architectures for Sound
Event Localization [113.19483349876668]
This paper proposes a novel approach to sound event localization by utilizing an attention-based sequence-to-sequence model.
It yields superior localization performance compared to state-of-the-art methods in both anechoic and reverberant conditions.
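A hypothetical sketch of the sequence-to-sequence shape of such a model: encode spectral frames, attend over them with a learned query, and decode a direction-of-arrival estimate. All module choices are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class Seq2SeqLocalizer(nn.Module):
    """Encoder over frames, attention pooling, direction-of-arrival head."""
    def __init__(self, in_dim=64, dim=128, heads=4):
        super().__init__()
        self.encoder = nn.GRU(in_dim, dim, batch_first=True)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, dim))
        self.head = nn.Linear(dim, 2)                       # e.g. azimuth, elevation

    def forward(self, feats):                               # feats: (batch, frames, in_dim)
        enc, _ = self.encoder(feats)
        q = self.query.expand(feats.size(0), -1, -1)
        ctx, _ = self.attn(q, enc, enc)                     # attend over encoder frames
        return self.head(ctx.squeeze(1))
```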
arXiv Detail & Related papers (2021-02-28T07:52:20Z) - Learning to Denoise Historical Music [30.165194151843835]
We propose an audio-to-audio neural network model that learns to denoise old music recordings.
The network is trained with both reconstruction and adversarial objectives on a noisy music dataset.
Our results show that the proposed method is effective in removing noise, while preserving the quality and details of the original music.
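A minimal sketch of the combined objective named in the summary: a reconstruction term plus an adversarial term from a discriminator on the denoised output. `denoiser` and `discriminator` are hypothetical modules, and the weighting is a guess:

```python
import torch
import torch.nn.functional as F

def generator_loss(denoiser, discriminator, noisy, clean, adv_weight=0.01):
    """Reconstruction plus adversarial training signal for the denoiser.
    The discriminator itself is trained separately on real/denoised audio."""
    denoised = denoiser(noisy)
    rec = F.l1_loss(denoised, clean)                # reconstruction objective
    adv = -discriminator(denoised).mean()           # hinge-style generator term
    return rec + adv_weight * adv
```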
arXiv Detail & Related papers (2020-08-05T10:05:44Z) - Real Time Speech Enhancement in the Waveform Domain [99.02180506016721]
We present a causal speech enhancement model working on the raw waveform that runs in real-time on a laptop CPU.
The proposed model is based on an encoder-decoder architecture with skip-connections.
It is capable of removing various kinds of background noise including stationary and non-stationary noises.
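A toy sketch of such a causal waveform encoder-decoder with skip connections; depths, kernel sizes, and channel counts are illustrative, not the paper's. Left-padding each convolution keeps the model causal:

```python
import torch
import torch.nn as nn

class CausalEnhancer(nn.Module):
    """Two-level causal encoder-decoder over raw waveforms with skips."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Conv1d(1, ch, kernel_size=8, stride=4)
        self.enc2 = nn.Conv1d(ch, ch * 2, kernel_size=8, stride=4)
        self.dec2 = nn.ConvTranspose1d(ch * 2, ch, kernel_size=8, stride=4)
        self.dec1 = nn.ConvTranspose1d(ch, 1, kernel_size=8, stride=4)

    def forward(self, x):                                   # x: (batch, 1, samples)
        h1 = torch.relu(self.enc1(nn.functional.pad(x, (7, 0))))   # causal pad
        h2 = torch.relu(self.enc2(nn.functional.pad(h1, (7, 0))))
        d2 = torch.relu(self.dec2(h2))
        d2 = d2[..., :h1.size(-1)] + h1                     # skip from the encoder
        d1 = self.dec1(d2)
        return d1[..., :x.size(-1)]                         # trim to input length
```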
arXiv Detail & Related papers (2020-06-23T09:19:13Z) - COALA: Co-Aligned Autoencoders for Learning Semantically Enriched Audio
Representations [32.456824945999465]
We propose a method for learning audio representations, aligning the learned latent representations of audio and associated tags.
We evaluate the quality of our embedding model, measuring its performance as a feature extractor on three different tasks.
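To make the alignment idea concrete, a simple contrastive stand-in for the paper's co-alignment objective: pull the embedding of an audio clip toward the embedding of its own tags and away from mismatched pairs. The temperature value is an assumption:

```python
import torch
import torch.nn.functional as F

def alignment_loss(audio_z, tag_z):
    """Contrastive alignment of audio and tag embeddings (matched pairs
    sit on the diagonal of the pairwise similarity matrix)."""
    audio_z = F.normalize(audio_z, dim=-1)
    tag_z = F.normalize(tag_z, dim=-1)
    logits = audio_z @ tag_z.t() / 0.07         # pairwise cosine similarities
    labels = torch.arange(audio_z.size(0))
    return F.cross_entropy(logits, labels)
```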
arXiv Detail & Related papers (2020-06-15T13:17:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences of its use.