Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks
- URL: http://arxiv.org/abs/2110.09605v1
- Date: Mon, 18 Oct 2021 20:04:46 GMT
- Title: Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks
- Authors: Marco Comunità, Huy Phan, Joshua D. Reiss
- Abstract summary: We present a first attempt at adopting neural synthesis for footstep sound effects.
Our architectures reached realism scores as high as those of recorded samples, showing encouraging results.
- Score: 14.78990136075145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Footsteps are among the most ubiquitous sound effects in multimedia
applications. There is substantial research into understanding the acoustic
features and developing synthesis models for footstep sound effects. In this
paper, we present a first attempt at adopting neural synthesis for this task.
We implemented two GAN-based architectures and compared the results with real
recordings as well as six traditional sound synthesis methods. Our
architectures reached realism scores as high as those of recorded samples, showing
encouraging results for the task at hand.
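The abstract above does not name the two GAN-based architectures, so as a rough, hedged illustration of GAN-based raw-audio synthesis, below is a minimal WaveGAN-style generator and discriminator sketch in PyTorch. The latent dimension, layer widths, and the 1024-sample output length are placeholder assumptions, not details taken from the paper.
```python
# Minimal WaveGAN-style GAN sketch for raw audio (illustrative only; layer
# sizes and output length are assumptions, not the paper's architecture).
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 256 * 16)            # noise -> coarse feature map
        self.net = nn.Sequential(
            nn.ConvTranspose1d(256, 128, 25, stride=4, padding=11, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(128, 64, 25, stride=4, padding=11, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(64, 1, 25, stride=4, padding=11, output_padding=1), nn.Tanh(),
        )                                                     # upsample 16 -> 1024 samples in [-1, 1]

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 256, 16))

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 64, 25, stride=4, padding=11), nn.LeakyReLU(0.2),
            nn.Conv1d(64, 128, 25, stride=4, padding=11), nn.LeakyReLU(0.2),
            nn.Conv1d(128, 256, 25, stride=4, padding=11), nn.LeakyReLU(0.2),
        )
        self.out = nn.Linear(256 * 16, 1)                     # real/fake score

    def forward(self, x):
        return self.out(self.net(x).flatten(1))

# Usage: draw noise, synthesize fake waveforms, score them.
G, D = Generator(), Discriminator()
fake = G(torch.randn(8, 100))     # (8, 1, 1024)
scores = D(fake)                  # (8, 1)
```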
Related papers
- End-to-End Real-World Polyphonic Piano Audio-to-Score Transcription with Hierarchical Decoding [4.604877755214193]
Existing end-to-end piano A2S systems have been trained and evaluated with only synthetic data.
We propose a sequence-to-sequence (Seq2Seq) model with a hierarchical decoder that aligns with the hierarchical structure of musical scores.
We propose a two-stage training scheme: pre-training the model on synthetic audio rendered by an expressive performance rendering system, followed by fine-tuning on recordings of human performance (a minimal training-loop sketch follows this entry).
arXiv Detail & Related papers (2024-05-22T10:52:04Z)
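As a hedged sketch of the two-stage scheme in the entry above (pre-training on synthetic audio, then fine-tuning on human recordings), here is a minimal training loop; the model, data loaders, loss, epoch counts, and learning rates are placeholder assumptions rather than details from the paper.
```python
# Two-stage training sketch: pre-train on synthetic audio, fine-tune on human
# recordings. Model and loaders are placeholders; only the scheme is shown.
import torch

def run_epochs(model, loader, optimizer, loss_fn, epochs):
    model.train()
    for _ in range(epochs):
        for audio, score_tokens in loader:
            optimizer.zero_grad()
            logits = model(audio)                       # (batch, T, vocab)
            loss = loss_fn(logits.transpose(1, 2), score_tokens)
            loss.backward()
            optimizer.step()

def two_stage_training(model, synthetic_loader, human_loader):
    loss_fn = torch.nn.CrossEntropyLoss()
    # Stage 1: pre-train on audio rendered from scores by a performance system.
    run_epochs(model, synthetic_loader,
               torch.optim.Adam(model.parameters(), lr=1e-4), loss_fn, epochs=50)
    # Stage 2: fine-tune on recordings of human performances, smaller step size.
    run_epochs(model, human_loader,
               torch.optim.Adam(model.parameters(), lr=1e-5), loss_fn, epochs=10)
```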
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- Make-A-Voice: Unified Voice Synthesis With Discrete Representation [77.3998611565557]
Make-A-Voice is a unified framework for synthesizing and manipulating voice signals from discrete representations.
We show that Make-A-Voice exhibits superior audio quality and style similarity compared with competitive baseline models.
arXiv Detail & Related papers (2023-05-30T17:59:26Z)
- Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos [78.49864987061689]
Traditional methods of impact sound synthesis use physics simulation to obtain a set of physics parameters that could represent and synthesize the sound.
Existing video-driven deep learning-based approaches could only capture the weak correspondence between visual content and impact sounds.
We propose a physics-driven diffusion model that can synthesize high-fidelity impact sound for a silent video clip (a toy sampling sketch follows this entry).
arXiv Detail & Related papers (2023-03-29T17:59:53Z)
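As a hedged illustration of diffusion-based waveform synthesis of the kind the entry above builds on, here is a toy DDPM-style ancestral sampling loop conditioned on video features; the denoiser, noise schedule, step count, and waveform length are placeholder assumptions, not the paper's model.
```python
# Toy DDPM-style sampler for an impact-sound waveform conditioned on video
# features (illustrative only; the real model and schedule are assumptions).
import torch

def sample(denoiser, video_feat, steps=50, length=16000):
    betas = torch.linspace(1e-4, 0.02, steps)       # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, length)                      # start from pure noise
    for t in reversed(range(steps)):
        eps = denoiser(x, torch.tensor([t]), video_feat)   # predicted noise
        mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
        x = mean + torch.sqrt(betas[t]) * noise
    return x                                        # synthesized waveform

# Usage with a dummy denoiser (a real system would use a conditional network):
wave = sample(lambda x, t, cond: torch.zeros_like(x), video_feat=None)
```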
- Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes [69.03289331433874]
We present an end-to-end audio rendering approach (Listen2Scene) for virtual reality (VR) and augmented reality (AR) applications.
We propose a novel neural-network-based sound propagation method to generate acoustic effects for 3D models of real environments.
arXiv Detail & Related papers (2023-02-02T04:09:23Z)
- Novel-View Acoustic Synthesis [140.1107768313269]
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint?
We propose a neural rendering approach, the Visually-Guided Acoustic Synthesis (ViGAS) network, which learns to synthesize the sound at an arbitrary point in space.
arXiv Detail & Related papers (2023-01-20T18:49:58Z)
- Rigid-Body Sound Synthesis with Differentiable Modal Resonators [6.680437329908454]
We present a novel end-to-end framework for training a deep neural network to generate modal resonators for a given 2D shape and material.
We demonstrate our method on a dataset of synthetic objects, but train our model using an audio-domain objective (a toy modal-synthesis example follows this entry).
arXiv Detail & Related papers (2022-10-27T10:34:38Z)
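For context on the "modal resonators" in the entry above, here is a toy modal-synthesis example in NumPy: an impact is rendered as a sum of exponentially decaying sinusoids. The mode frequencies, decay rates, and gains are made-up illustrative values, not outputs of the paper's network.
```python
# Toy modal synthesis: an impact as a sum of exponentially decaying sinusoids.
# All mode parameters below are illustrative, not predictions of any model.
import numpy as np

def modal_impact(freqs_hz, decays, gains, sr=44100, dur=1.0):
    t = np.arange(int(sr * dur)) / sr
    modes = [g * np.exp(-d * t) * np.sin(2 * np.pi * f * t)
             for f, d, g in zip(freqs_hz, decays, gains)]
    return np.sum(modes, axis=0)

# Three modes of a small struck object (made-up frequencies, decays, gains).
audio = modal_impact(freqs_hz=[440.0, 1230.0, 2750.0],
                     decays=[6.0, 12.0, 20.0],
                     gains=[1.0, 0.5, 0.25])
```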
- BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis [129.86743102915986]
We formulate the synthesis process from a different perspective, decomposing the binaural audio into a common part shared by the two channels and a channel-specific part.
We propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize the two parts respectively.
Experimental results show that BinauralGrad outperforms existing baselines by a large margin on both objective and subjective evaluation metrics.
arXiv Detail & Related papers (2022-05-30T02:09:26Z)
- Learning Joint Articulatory-Acoustic Representations with Normalizing Flows [7.183132975698293]
We find a joint latent representation between the articulatory and acoustic domain for vowel sounds via invertible neural network models.
Our approach achieves both articulatory-to-acoustic and acoustic-to-articulatory mapping, demonstrating a joint encoding of the two domains (a toy invertible-coupling sketch follows this entry).
arXiv Detail & Related papers (2020-05-16T04:34:36Z)
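As a hedged illustration of the invertible mapping idea in the entry above, here is a toy affine coupling layer, the building block of many normalizing flows; the feature dimension and hidden size are placeholder assumptions, not the paper's configuration.
```python
# Toy affine coupling layer: exactly invertible, so features can be mapped to
# a shared latent space and back. Dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(nn.Linear(self.half, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x):
        a, b = x[:, :self.half], x[:, self.half:]
        log_s, t = self.net(a).chunk(2, dim=1)      # scale and shift from first half
        return torch.cat([a, b * torch.exp(log_s) + t], dim=1)

    def inverse(self, y):
        a, b = y[:, :self.half], y[:, self.half:]
        log_s, t = self.net(a).chunk(2, dim=1)
        return torch.cat([a, (b - t) * torch.exp(-log_s)], dim=1)

# Round trip: features -> latent -> features again (recovered exactly).
layer = AffineCoupling(dim=12)
x = torch.randn(4, 12)
assert torch.allclose(layer.inverse(layer(x)), x, atol=1e-5)
```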
This list is automatically generated from the titles and abstracts of the papers on this site.