Learning to Denoise Historical Music
- URL: http://arxiv.org/abs/2008.02027v2
- Date: Thu, 16 Jun 2022 11:18:28 GMT
- Title: Learning to Denoise Historical Music
- Authors: Yunpeng Li, Beat Gfeller, Marco Tagliasacchi, Dominik Roblek
- Abstract summary: We propose an audio-to-audio neural network model that learns to denoise old music recordings.
The network is trained with both reconstruction and adversarial objectives on a noisy music dataset.
Our results show that the proposed method is effective in removing noise, while preserving the quality and details of the original music.
- Score: 30.165194151843835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose an audio-to-audio neural network model that learns to denoise old
music recordings. Our model internally converts its input into a time-frequency
representation by means of a short-time Fourier transform (STFT), and processes
the resulting complex spectrogram using a convolutional neural network. The
network is trained with both reconstruction and adversarial objectives on a
synthetic noisy music dataset, which is created by mixing clean music with real
noise samples extracted from quiet segments of old recordings. We evaluate our
method quantitatively on held-out test examples of the synthetic dataset, and
qualitatively by human rating on samples of actual historical recordings. Our
results show that the proposed method is effective in removing noise, while
preserving the quality and details of the original music.
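The data-creation and front-end steps described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`mix_at_snr`, `stft`), the 10 dB signal-to-noise ratio, the frame and hop sizes, and the sine tone and Gaussian noise used as stand-ins for clean music and extracted noise are all assumptions made for this example. The convolutional network and its training objectives are not reproduced here.

```python
import cmath
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the mixture has the given SNR in dB."""
    # Loop the noise clip if it is shorter than the music excerpt.
    noise = [noise[i % len(noise)] for i in range(len(clean))]
    gain = rms(clean) / (rms(noise) * 10 ** (snr_db / 20))
    return [c + gain * n for c, n in zip(clean, noise)]


def stft(x, frame_len=64, hop=32):
    """Naive complex STFT with a Hann window (slow direct DFT, for illustration)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * window[n] for n in range(frame_len)]
        # Keep only the non-negative frequency bins of each frame's DFT.
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


random.seed(0)
sr = 8000  # assumed sample rate in Hz
# Stand-in for clean music: a 1 kHz sine tone.
clean = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(1024)]
# Stand-in for a noise clip taken from a quiet segment of an old recording.
noise = [random.gauss(0.0, 1.0) for _ in range(300)]

noisy = mix_at_snr(clean, noise, snr_db=10.0)
spec = stft(noisy)  # list of frames, each a list of complex frequency bins
```

In a training setup along the lines the abstract describes, `spec` would be the network input and the spectrogram of `clean` the reconstruction target. A practical pipeline would use an FFT-based STFT from a numerical library rather than the direct DFT shown here.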
Related papers
- Naturalistic Music Decoding from EEG Data via Latent Diffusion Models [14.882764251306094]
This study represents an initial foray into high-quality general music reconstruction from non-invasive EEG data.
We train our models on the public NMED-T dataset and evaluate them quantitatively, proposing neural embedding-based metrics.
arXiv Detail & Related papers (2024-05-15T03:26:01Z)
- Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark [65.79402756995084]
Real Acoustic Fields (RAF) is a new dataset that captures real acoustic room data from multiple modalities.
RAF is the first dataset to provide densely captured room acoustic data.
arXiv Detail & Related papers (2024-03-27T17:59:56Z)
- Human Voice Pitch Estimation: A Convolutional Network with Auto-Labeled and Synthetic Data [0.0]
We present a specialized convolutional neural network designed for pitch extraction.
Our approach combines synthetic data with auto-labeled a cappella sung audio, creating a robust training environment.
This work paves the way for enhanced pitch extraction in both music and voice settings.
arXiv Detail & Related papers (2023-08-14T14:26:52Z)
- Self-Supervised Visual Acoustic Matching [63.492168778869726]
Acoustic matching aims to re-synthesize an audio clip to sound as if it were recorded in a target acoustic environment.
We propose a self-supervised approach to visual acoustic matching where training samples include only the target scene image and audio.
Our approach jointly learns to disentangle room acoustics and re-synthesize audio into the target environment, via a conditional GAN framework and a novel metric.
arXiv Detail & Related papers (2023-07-27T17:59:59Z)
- An investigation of the reconstruction capacity of stacked convolutional autoencoders for log-mel-spectrograms [2.3204178451683264]
In audio processing applications, the generation of expressive sounds based on high-level representations is in high demand.
Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on musical instrument compression.
This study investigates the use of stacked convolutional autoencoders for the compression of time-frequency audio representations for a variety of instruments for a single pitch.
arXiv Detail & Related papers (2023-01-18T17:19:04Z)
- Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video [94.42811508809994]
We propose an audio spatialization method that draws on visual information in videos to convert their monaural (single-channel) audio to binaural audio.
Whereas existing approaches leverage visual features extracted directly from video frames, our approach explicitly disentangles the geometric cues present in the visual stream to guide the learning process.
arXiv Detail & Related papers (2021-11-21T19:26:45Z)
- Removing Noise from Extracellular Neural Recordings Using Fully Convolutional Denoising Autoencoders [62.997667081978825]
We propose a Fully Convolutional Denoising Autoencoder, which learns to produce a clean neuronal activity signal from a noisy multichannel input.
The experimental results on simulated data show that our proposed method can significantly improve the quality of noise-corrupted neural signals.
arXiv Detail & Related papers (2021-09-18T14:51:24Z)
- Conditional Sound Generation Using Neural Discrete Time-Frequency Representation Learning [42.95813372611093]
We propose to generate sounds conditioned on sound classes via neural discrete time-frequency representation learning.
This offers an advantage in modelling long-range dependencies and retaining local fine-grained structure within a sound clip.
arXiv Detail & Related papers (2021-07-21T10:31:28Z)
- Deep Convolutional and Recurrent Networks for Polyphonic Instrument Classification from Monophonic Raw Audio Waveforms [30.3491261167433]
Sound Event Detection and Audio Classification tasks are traditionally addressed through time-frequency representations of audio signals such as spectrograms.
The use of deep neural networks as efficient feature extractors has enabled the direct use of audio signals for classification purposes.
We attempt to recognize musical instruments in polyphonic audio by only feeding their raw waveforms into deep learning models.
arXiv Detail & Related papers (2021-02-13T13:44:46Z)
- Hierarchical Timbre-Painting and Articulation Generation [92.59388372914265]
We present a fast and high-fidelity method for music generation, based on specified f0 and loudness.
The synthesized audio mimics the timbre and articulation of a target instrument.
arXiv Detail & Related papers (2020-08-30T05:27:39Z)
- Score-informed Networks for Music Performance Assessment [64.12728872707446]
Deep neural network-based methods incorporating score information into MPA models have not yet been investigated.
We introduce three different models capable of score-informed performance assessment.
arXiv Detail & Related papers (2020-08-01T07:46:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.