SinTra: Learning an inspiration model from a single multi-track music
segment
- URL: http://arxiv.org/abs/2204.09917v1
- Date: Thu, 21 Apr 2022 07:13:30 GMT
- Title: SinTra: Learning an inspiration model from a single multi-track music
segment
- Authors: Qingwei Song, Qiwei Sun, Dongsheng Guo, Haiyong Zheng
- Abstract summary: SinTra is an auto-regressive sequential generative model that can learn from a single multi-track music segment.
We show that SinTra can learn information from a single music segment more sufficiently than Music Transformer.
- Score: 6.841442146760313
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose SinTra, an auto-regressive sequential generative
model that can learn from a single multi-track music segment, to generate
coherent, aesthetic, and variable polyphonic music of multi-instruments with an
arbitrary length of bar. For this task, to ensure the relevance of generated
samples and training music, we present a novel pitch-group representation.
SinTra, consisting of a pyramid of Transformer-XL with a multi-scale training
strategy, can learn both the musical structure and the relative positional
relationship between notes of the single training music segment. Additionally,
for maintaining the inter-track correlation, we use the convolution operation
to process multi-track music, and when decoding, the tracks are independent to
each other to prevent interference. We evaluate SinTra with both subjective
study and objective metrics. The comparison results show that our framework can
learn information from a single music segment more sufficiently than Music
Transformer. Also the comparison between SinTra and its variant, i.e., the
single-stage SinTra with the first stage only, shows that the pyramid structure
can effectively suppress overly-fragmented notes.
Related papers
- UniMuMo: Unified Text, Music and Motion Generation [57.72514622935806]
We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities.
By converting music, motion, and text into token-based representation, our model bridges these modalities through a unified encoder-decoder transformer architecture.
arXiv Detail & Related papers (2024-10-06T16:04:05Z) - MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation)
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z) - Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music
Transcription [19.228155694144995]
Timbre-Trap is a novel framework which unifies music transcription and audio reconstruction.
We train a single autoencoder to simultaneously estimate pitch salience and reconstruct complex spectral coefficients.
We demonstrate that the framework leads to performance comparable to state-of-the-art instrument-agnostic transcription methods.
arXiv Detail & Related papers (2023-09-27T15:19:05Z) - Simple and Controllable Music Generation [94.61958781346176]
MusicGen is a single Language Model (LM) that operates over several streams of compressed discrete music representation, i.e., tokens.
Unlike prior work, MusicGen is comprised of a single-stage transformer LM together with efficient token interleaving patterns.
arXiv Detail & Related papers (2023-06-08T15:31:05Z) - GETMusic: Generating Any Music Tracks with a Unified Representation and
Diffusion Framework [58.64512825534638]
Symbolic music generation aims to create musical notes, which can help users compose music.
We introduce a framework known as GETMusic, with GET'' standing for GEnerate music Tracks''
GETScore represents musical notes as tokens and organizes tokens in a 2D structure, with tracks stacked vertically and progressing horizontally over time.
Our proposed representation, coupled with the non-autoregressive generative model, empowers GETMusic to generate music with any arbitrary source-target track combinations.
arXiv Detail & Related papers (2023-05-18T09:53:23Z) - Musical Voice Separation as Link Prediction: Modeling a Musical
Perception Task as a Multi-Trajectory Tracking Problem [6.617487928813374]
This paper targets the perceptual task of separating the different interacting voices, i.e., monophonic melodic streams, in a polyphonic musical piece.
We model this task as a Multi-Trajectory Tracking (MTT) problem from discrete observations, i.e. notes in a pitch-time space.
Our approach builds a graph from a musical piece, by creating one node for every note, and separates the melodic trajectories by predicting a link between two notes if they are consecutive in the same voice/stream.
arXiv Detail & Related papers (2023-04-28T13:48:00Z) - Comparision Of Adversarial And Non-Adversarial LSTM Music Generative
Models [2.569647910019739]
This work implements and compares adversarial and non-adversarial training of recurrent neural network music composers on MIDI data.
The evaluation indicates that adversarial training produces more aesthetically pleasing music.
arXiv Detail & Related papers (2022-11-01T20:23:49Z) - Contrastive Learning with Positive-Negative Frame Mask for Music
Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective to accommodate both self-augmented positives/negatives sampled from the same music.
arXiv Detail & Related papers (2022-03-17T07:11:42Z) - Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z) - Modeling Musical Structure with Artificial Neural Networks [0.0]
I explore the application of artificial neural networks to different aspects of musical structure modeling.
I show how a connectionist model, the Gated Autoencoder (GAE), can be employed to learn transformations between musical fragments.
I propose a special predictive training of the GAE, which yields a representation of polyphonic music as a sequence of intervals.
arXiv Detail & Related papers (2020-01-06T18:35:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.