The Music Annotation Pattern
- URL: http://arxiv.org/abs/2304.00988v1
- Date: Thu, 30 Mar 2023 11:13:59 GMT
- Title: The Music Annotation Pattern
- Authors: Jacopo de Berardinis, Albert Meroño-Peñuela, Andrea Poltronieri, Valentina Presutti
- Abstract summary: We introduce the Music Annotation Pattern, an Ontology Design Pattern (ODP) to homogenise different annotation systems and to represent several types of musical objects.
Our ODP accounts for multi-modality upfront, to describe annotations derived from different sources, and it is the first to enable the integration of music datasets at a large scale.
- Score: 1.2043574473965315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The annotation of music content is a complex process to represent due to its
inherently multifaceted, subjective, and interdisciplinary nature. Numerous
systems and conventions for annotating music have been developed as independent
standards over the past decades. Little has been done to make them
interoperable, which jeopardises cross-corpora studies as it requires users to
familiarise themselves with a multitude of conventions. Most of these systems lack the
semantic expressiveness needed to represent the complexity of the musical
language and cannot model multi-modal annotations originating from audio and
symbolic sources. In this article, we introduce the Music Annotation Pattern,
an Ontology Design Pattern (ODP) to homogenise different annotation systems and
to represent several types of musical objects (e.g. chords, patterns,
structures). This ODP preserves the semantics of the object's content at
different levels and temporal granularity. Moreover, our ODP accounts for
multi-modality upfront, to describe annotations derived from different sources,
and it is the first to enable the integration of music datasets at a large
scale.
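As a rough illustration of the idea in the abstract (one shape for annotations of different musical objects, coming from different source types and time granularities), a minimal Python sketch follows. It is not the ODP itself: the class names, field names, and the `group_by_object_type` helper are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Observation:
    """One annotated musical object (e.g. a chord) over a time interval."""
    start: float
    end: float
    value: str


@dataclass
class Annotation:
    """Observations by one annotator over one source (hypothetical names)."""
    object_type: str   # e.g. "chord", "pattern", "structure"
    source_type: str   # "audio" or "symbolic"
    time_unit: str     # assumed: "seconds" for audio, "beats" for symbolic
    annotator: str
    observations: List[Observation] = field(default_factory=list)


def group_by_object_type(annotations: List[Annotation]) -> Dict[str, List[Annotation]]:
    """Collect annotations of the same musical object type across sources."""
    grouped: Dict[str, List[Annotation]] = {}
    for ann in annotations:
        grouped.setdefault(ann.object_type, []).append(ann)
    return grouped


# Illustrative data only; not taken from any dataset.
audio_chords = Annotation("chord", "audio", "seconds", "annotator_a",
                          [Observation(0.0, 1.8, "C:maj"), Observation(1.8, 3.6, "G:maj")])
score_chords = Annotation("chord", "symbolic", "beats", "annotator_b",
                          [Observation(0.0, 4.0, "C:maj"), Observation(4.0, 8.0, "G:maj")])
sections = Annotation("structure", "audio", "seconds", "annotator_a",
                      [Observation(0.0, 3.6, "intro")])

grouped = group_by_object_type([audio_chords, score_chords, sections])
```

Keeping the time unit on each annotation, rather than forcing one shared timeline, is one simple way to let audio-derived and score-derived annotations sit side by side without losing their original granularity.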
Related papers
- PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation [5.201151187019607]
PerceiverS (Segmentation and Scale) is a novel architecture designed to generate long-structured and expressive music.
Our approach enhances symbolic music generation by simultaneously learning long-term structural dependencies and short-term expressive details.
The proposed model, evaluated on datasets like Maestro, demonstrates improvements in generating coherent and diverse music.
arXiv Detail & Related papers (2024-11-13T03:14:10Z)
- MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions [69.9122231800796]
We present MMTrail, a large-scale multi-modality video-language dataset incorporating more than 20M trailer clips with visual captions.
We propose a systemic captioning framework, achieving various modality annotations with more than 27.1k hours of trailer videos.
Our dataset potentially paves the path for fine-grained large multimodal-language model training.
arXiv Detail & Related papers (2024-07-30T16:43:24Z)
- Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings [28.685224087199053]
Music generation introduces challenging complexities to large language models.
Existing works share three major drawbacks: 1) their tokenization requires domain-specific annotations, such as bars and beats, that are typically missing in raw MIDI data.
We develop a MIDI-based music generation framework inspired by MuseNet, empirically studying two structural embeddings that do not rely on domain-specific annotations.
arXiv Detail & Related papers (2024-07-29T11:24:10Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models [9.311353871322325]
Mozart's Touch is a framework capable of generating music aligned with cross-modal inputs such as images, videos, and text.
Unlike traditional end-to-end methods, Mozart's Touch uses LLMs to accurately interpret visual elements without requiring the training or fine-tuning of music generation models.
arXiv Detail & Related papers (2024-05-05T03:15:52Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation [50.365392018302416]
We propose Multi-view MidiVAE, as one of the pioneers in VAE methods that effectively model and generate long multi-track symbolic music.
We focus on instrumental characteristics and harmony as well as global and local information about the musical composition by employing a hybrid variational encoding-decoding strategy.
arXiv Detail & Related papers (2024-01-15T08:41:01Z)
- Graph-based Polyphonic Multitrack Music Generation [9.701208207491879]
This paper introduces a novel graph representation for music and a deep Variational Autoencoder that generates the structure and the content of musical graphs separately.
By separating the structure and content of musical graphs, it is possible to condition generation by specifying which instruments are played at certain times.
arXiv Detail & Related papers (2023-07-27T15:18:50Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.