Setting the rhythm scene: deep learning-based drum loop generation from
arbitrary language cues
- URL: http://arxiv.org/abs/2209.10016v1
- Date: Tue, 20 Sep 2022 21:53:35 GMT
- Title: Setting the rhythm scene: deep learning-based drum loop generation from
arbitrary language cues
- Authors: Ignacio J. Tripodi
- Abstract summary: We present a novel method that generates two bars of a 4-piece drum pattern embodying the "mood" of a language cue.
We envision this tool as a composition aid for electronic music and audiovisual soundtrack production, or as an improvisation tool for live performance.
In order to produce the training samples for this model, besides manual annotation of the "scene" or "mood" terms, we have designed a novel method to extract the consensus drum track of any song.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Generative artificial intelligence models can be a valuable aid to music
composition and live performance, both to aid the professional musician and to
help democratize the music creation process for hobbyists. Here we present a
novel method that, given an English word or phrase, generates two bars of a
4-piece drum pattern that embodies the "mood" of the given language cue, or
that could be used for an audiovisual scene described by the language cue. We
envision this tool as a composition aid for electronic music and audiovisual
soundtrack production, or as an improvisation tool for live performance. In order
to produce the training samples for this model, besides manual annotation of
the "scene" or "mood" terms, we have designed a novel method to extract the
consensus drum track of any song. This consists of a 2-bar, 4-piece drum
pattern that represents the main percussive motif of a song, which could be
imported into any music loop device or live looping software. These two key
components (drum pattern generation from a generalizable input, and consensus
percussion extraction) present a novel approach to computer-aided composition
and provide a stepping stone for more comprehensive rhythm generation.
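
To make the two components concrete, here is a minimal, hypothetical sketch (not the authors' released code) under assumed conventions: a four-piece kit (kick, snare, closed hi-hat, open hi-hat), a 16th-note grid so that two bars span 32 steps, a 384-dimensional sentence embedding as the language-cue input, and a simple majority vote standing in for the paper's consensus-extraction procedure, whose actual algorithm is described in the paper itself.

import numpy as np
import torch
import torch.nn as nn

N_PIECES = 4                 # assumed kit: kick, snare, closed hat, open hat
STEPS_PER_BAR = 16           # assumed 16th-note quantization
N_STEPS = 2 * STEPS_PER_BAR  # two bars, as in the paper

class DrumLoopDecoder(nn.Module):
    """Toy decoder: language-cue embedding -> per-step hit probabilities."""
    def __init__(self, embed_dim: int = 384, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, N_PIECES * N_STEPS),
        )

    def forward(self, cue_embedding: torch.Tensor) -> torch.Tensor:
        logits = self.net(cue_embedding)
        return torch.sigmoid(logits).view(-1, N_PIECES, N_STEPS)

def consensus_drum_pattern(segment_grids: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Collapse a song's per-segment onset grids, shaped
    (n_segments, N_PIECES, N_STEPS), into one binary 2-bar pattern by keeping
    any hit present in at least `threshold` of the segments (a simple
    majority-vote stand-in for the paper's consensus extraction)."""
    return (segment_grids.mean(axis=0) >= threshold).astype(np.int8)

Thresholding the decoder's output (for example at 0.5) yields a binary grid that could be exported as a two-bar MIDI clip for a loop device or live looping software; the real model architecture, kit mapping, and training setup are those described in the paper.
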
Related papers
- MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization [52.498942604622165]
This paper presents MuVi, a framework to generate music that aligns with video content.
MuVi analyzes video content through a specially designed visual adaptor to extract contextually and temporally relevant features.
We show that MuVi demonstrates superior performance in both audio quality and temporal synchronization.
arXiv Detail & Related papers (2024-10-16T18:44:56Z)
- SongCreator: Lyrics-based Universal Song Generation [53.248473603201916]
SongCreator is a song-generation system designed to tackle the challenge of generating songs with both vocals and accompaniment given lyrics.
The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for DSLM.
Experiments demonstrate the effectiveness of SongCreator by achieving state-of-the-art or competitive performances on all eight tasks.
arXiv Detail & Related papers (2024-09-09T19:37:07Z)
- Subtractive Training for Music Stem Insertion using Latent Diffusion Models [35.91945598575059]
We present Subtractive Training, a method for synthesizing individual musical instrument stems given other instruments as context.
Our results demonstrate Subtractive Training's efficacy in creating authentic drum stems that seamlessly blend with the existing tracks.
We extend this technique to MIDI formats, successfully generating compatible bass, drum, and guitar parts for incomplete arrangements.
arXiv Detail & Related papers (2024-06-27T16:59:14Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Language-Guided Music Recommendation for Video via Prompt Analogies [35.48998901411509]
We propose a method to recommend music for an input video while allowing a user to guide music selection with free-form natural language.
Existing music video datasets provide the needed (video, music) training pairs, but lack text descriptions of the music.
arXiv Detail & Related papers (2023-06-15T17:58:01Z)
- Noise2Music: Text-conditioned Music Generation with Diffusion Models [73.74580231353684]
We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts.
We find that the generated audio faithfully reflects key elements of the text prompt, such as genre, tempo, instruments, mood, and era.
Pretrained large language models play a key role in this story -- they are used to generate paired text for the audio of the training set and to extract embeddings of the text prompts ingested by the diffusion models.
arXiv Detail & Related papers (2023-02-08T07:27:27Z)
- Generating Coherent Drum Accompaniment With Fills And Improvisations [8.334918207379172]
We tackle the task of drum pattern generation conditioned on the accompanying music played by four melodic instruments.
We propose a novelty function to capture the extent of improvisation in a bar relative to its neighbors.
We train a model to predict improvisation locations from the melodic accompaniment tracks.
arXiv Detail & Related papers (2022-09-01T08:31:26Z)
- Re-creation of Creations: A New Paradigm for Lyric-to-Melody Generation [158.54649047794794]
Re-creation of Creations (ROC) is a new paradigm for lyric-to-melody generation.
ROC achieves good lyric-melody feature alignment in lyric-to-melody generation.
arXiv Detail & Related papers (2022-08-11T08:44:47Z)
- Towards Automatic Instrumentation by Learning to Separate Parts in Symbolic Multitrack Music [33.679951600368405]
We study the feasibility of automatic instrumentation -- dynamically assigning instruments to notes in solo music during performance.
In addition to the online, real-time-capable setting for performative use cases, automatic instrumentation can also find applications in assistive composing tools in an offline setting.
We frame the task of part separation as a sequential multi-class classification problem and adopt machine learning to map sequences of notes into sequences of part labels.
arXiv Detail & Related papers (2021-07-13T08:34:44Z)
- Artificial Neural Networks Jamming on the Beat [20.737171876839238]
The paper presents a large dataset of drum patterns along with corresponding melodies.
By exploring a latent space of drum patterns, one could generate new drum patterns in a given music style.
A simple artificial neural network can be trained to generate melodies corresponding to these drum patterns used as inputs.
arXiv Detail & Related papers (2020-07-13T10:09:20Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
This list is automatically generated from the titles and abstracts of the papers on this site.