The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation
- URL: http://arxiv.org/abs/2405.15103v1
- Date: Thu, 23 May 2024 23:25:46 GMT
- Title: The Rarity of Musical Audio Signals Within the Space of Possible Audio Generation
- Authors: Nick Collins
- Abstract summary: A white noise signal can access any possible configuration of values, though statistically, over many samples, it tends toward a uniform spectral distribution.
The probability that white noise generates a music-like signal over different durations is analyzed.
The applicability of this study is not just to show that music has a precious rarity value, but that examining the size of music relative to the overall size of audio signal space can inform new generations of algorithmic music systems.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A white noise signal can access any possible configuration of values, though statistically, over many samples, it tends toward a uniform spectral distribution, and is highly unlikely to produce intelligible sound. But how unlikely? The probability that white noise generates a music-like signal over different durations is analyzed, based on some necessary features observed in real music audio signals, such as mostly proximate movement and zero crossing rate. Given the mathematical results, the rarity of music as a signal is considered overall. The applicability of this study is not just to show that music has a precious rarity value, but that examining the size of music relative to the overall size of audio signal space can inform new generations of algorithmic music systems (which are now often founded directly on audio signal generation, and may relate to white noise via such machine learning processes as diffusion). Estimated upper bounds on the rarity of music are compared to the sizes of various physical and musical spaces, to better understand the magnitude of the results (pun intended). Underlying the research are the questions `how much music is still out there?' and `how much music could a machine learning process actually reach?'.
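As a rough illustration of the kind of estimate involved, the following Python/NumPy sketch tests short windows of uniform white noise against two crude music-likeness constraints, a proximate-movement threshold and a zero-crossing-rate band. The specific thresholds, window length, and feature definitions are illustrative assumptions, not the paper's own.

```python
import numpy as np

def music_like(x, delta=0.5, zcr_lo=0.0, zcr_hi=0.6):
    # Proximate movement: consecutive samples stay within delta of each other.
    steps = np.abs(np.diff(x))
    # Zero crossing rate: fraction of adjacent sample pairs that change sign.
    zcr = np.mean(np.sign(x[:-1]) != np.sign(x[1:]))
    return bool(np.all(steps < delta) and zcr_lo <= zcr <= zcr_hi)

rng = np.random.default_rng(0)
n_trials, n_samples = 200_000, 8          # a tiny window, so hits are observable
hits = sum(music_like(rng.uniform(-1.0, 1.0, n_samples)) for _ in range(n_trials))
print(f"empirical P(music-like) ~ {hits / n_trials:.2e}")

# Each uniform step lands within delta of its predecessor with probability at
# most delta (an interval of width 2*delta inside a range of width 2), so
# P <= delta ** (n_samples - 1): exponential decay with duration.  One second
# at 44.1 kHz would give 0.5 ** 44099 -- vanishingly small.
print(f"analytic upper bound: {0.5 ** (n_samples - 1):.4f}")
```

The exponential decay of the bound with window length is the basic reason music-like signals are so rare within audio signal space, whatever exact feature set is used.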
Related papers
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability [16.18336216092687]
We focus on measuring and predicting music memorability.
We train baselines to predict and analyze music memorability.
We demonstrate that while there is room for improvement, predicting music memorability with limited data is possible.
arXiv Detail & Related papers (2024-05-21T14:57:04Z)
- A Novel Audio Representation for Music Genre Identification in MIR [3.203495505471781]
For Music Information Retrieval downstream tasks, the most common audio representation is time-frequency-based, such as Mel spectrograms.
This study explores the possibilities of a new form of audio representation for one of the most common MIR downstream tasks.
A novel audio representation was derived from the generative music model Jukebox.
The effectiveness of Jukebox's audio representation is compared to Mel spectrograms using a dataset nearly equivalent to state-of-the-art (SOTA) ones and an almost identical transformer design.
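For reference, the Mel spectrogram baseline that such studies compare against can be computed in a few lines with librosa; the parameter values below are common MIR defaults, not the paper's exact configuration.

```python
import numpy as np
import librosa

# Load a bundled example clip and compute a log-Mel spectrogram -- the usual
# time-frequency input representation for genre classifiers.
y, sr = librosa.load(librosa.ex("trumpet"))
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                     hop_length=512, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)   # log-compress for the model
print(log_mel.shape)   # (n_mels, n_frames), fed to a transformer as patches
```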
arXiv Detail & Related papers (2024-04-01T11:40:09Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 publicly available datasets, providing a fair and standard assessment of representations from all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
- Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
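A minimal sketch of the interleaving idea, assuming a simple "delay" pattern in which codebook k is shifted by k steps so that a single transformer can model all streams jointly; MusicGen's exact patterns may differ from this illustration.

```python
import numpy as np

def delay_interleave(codes, pad=-1):
    """codes: (K, T) integer tokens from K codebooks ->
    (K, T + K - 1) array with row k delayed by k steps, padded with `pad`."""
    K, T = codes.shape
    out = np.full((K, T + K - 1), pad, dtype=codes.dtype)
    for k in range(K):
        out[k, k:k + T] = codes[k]
    return out

codes = np.arange(12).reshape(4, 3)   # 4 codebooks, 3 time steps
print(delay_interleave(codes))        # staircase layout: one LM step per column
```

The delay means that at each autoregressive step the model predicts one token per codebook, while each codebook still conditions on the earlier codebooks of the same time frame.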
arXiv Detail & Related papers (2023-06-08T15:31:05Z)
- One-Shot Acoustic Matching Of Audio Signals -- Learning to Hear Music In Any Room/ Concert Hall [3.652509571098291]
We propose a novel architecture that can transform a sound of interest into any other acoustic space of interest.
Our framework allows a neural network to adjust gains of every point in the time-frequency representation.
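The underlying operation, applying per-bin gains to a time-frequency representation, can be sketched as follows; the random gain mask is a stand-in for the gains a trained network would predict to match a target acoustic space.

```python
import numpy as np
import librosa

# STFT -> elementwise gain -> inverse STFT.
y, sr = librosa.load(librosa.ex("trumpet"))
stft = librosa.stft(y, n_fft=1024, hop_length=256)
gains = np.random.default_rng(0).uniform(0.5, 1.5, size=stft.shape)
matched = librosa.istft(stft * gains, hop_length=256, length=len(y))
print(matched.shape, matched.dtype)
```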
arXiv Detail & Related papers (2022-10-27T19:54:05Z)
- Museformer: Transformer with Fine- and Coarse-Grained Attention for Music Generation [138.74751744348274]
We propose Museformer, a Transformer with a novel fine- and coarse-grained attention for music generation.
Specifically, with the fine-grained attention, a token of a specific bar directly attends to all the tokens of the bars that are most relevant to music structures.
With the coarse-grained attention, a token only attends to the summarization of the other bars rather than each token of them so as to reduce the computational cost.
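A toy construction of such an attention mask, assuming each bar contributes a fixed number of tokens plus one summary token, and (purely for illustration) that only the immediately preceding bar counts as structure-relevant; Museformer's actual choice of relevant bars is music-informed.

```python
import numpy as np

def museformer_mask(n_bars, n_tok):
    per_bar = n_tok + 1                   # n_tok bar tokens + 1 summary token
    N = n_bars * per_bar
    mask = np.zeros((N, N), dtype=bool)   # True = attention allowed
    summaries = [b * per_bar + n_tok for b in range(n_bars)]
    for b in range(n_bars):
        rows = slice(b * per_bar, b * per_bar + n_tok)
        mask[rows, rows] = True           # fine-grained: tokens of own bar
        if b > 0:                         # fine-grained: the "relevant" bar
            prev = slice((b - 1) * per_bar, (b - 1) * per_bar + n_tok)
            mask[rows, prev] = True
        mask[rows, summaries] = True      # coarse-grained: summaries only
        mask[summaries[b], rows] = True   # each summary reads its own bar
    return mask

print(museformer_mask(n_bars=3, n_tok=4).astype(int))
```

Because most bars are reached only through their single summary token, the allowed entries grow far more slowly than the full quadratic attention pattern.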
arXiv Detail & Related papers (2022-10-19T07:31:56Z)
- Musika! Fast Infinite Waveform Music Generation [0.0]
We introduce Musika, a music generation system that can be trained on hundreds of hours of music using a single consumer GPU.
We achieve this by first learning a compact invertible representation of spectrogram magnitudes and phases with adversarial autoencoders.
A latent coordinate system enables generating arbitrarily long sequences of excerpts in parallel, while a global context vector allows the music to remain stylistically coherent through time.
arXiv Detail & Related papers (2022-08-18T08:31:15Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives and negatives sampled from the same music.
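A minimal InfoNCE-style sketch of such an objective, with randomly masked frame sets standing in for PEMR's learned positive/negative masks, and simple frame averages standing in for the encoder.

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    # Standard InfoNCE: pull the positive close, push negatives away.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] +
                      [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

rng = np.random.default_rng(0)
clip = rng.normal(size=(64, 128))           # frames x features of one clip
light = rng.random(64) > 0.2                # positive: a lightly masked view
anchor, positive = clip.mean(axis=0), clip[light].mean(axis=0)
negatives = [clip[rng.random(64) > 0.8].mean(axis=0)  # heavily masked views
             for _ in range(4)]
print(f"contrastive loss = {info_nce(anchor, positive, negatives):.3f}")
```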
arXiv Detail & Related papers (2022-03-17T07:11:42Z)
- Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space [2.8935588665357077]
This paper builds on key ideas to develop perception of touch sounds without access to any ground-truth data.
We show how we can leverage ideas from classical signal processing to obtain large amounts of data for any sound of interest with high precision.
arXiv Detail & Related papers (2020-02-10T20:33:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.