Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset
- URL: http://arxiv.org/abs/2307.14783v1
- Date: Thu, 27 Jul 2023 11:24:47 GMT
- Title: Emotion4MIDI: a Lyrics-based Emotion-Labeled Symbolic Music Dataset
- Authors: Serkan Sulun, Pedro Oliveira, Paula Viana
- Abstract summary: We present a new large-scale emotion-labeled symbolic music dataset consisting of 12k MIDI songs.
We first trained emotion classification models on the GoEmotions dataset, achieving state-of-the-art results with a model half the size of the baseline.
Our dataset covers a wide range of fine-grained emotions, providing a valuable resource to explore the connection between music and emotions.
- Score: 1.3607388598209322
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a new large-scale emotion-labeled symbolic music dataset
consisting of 12k MIDI songs. To create this dataset, we first trained emotion
classification models on the GoEmotions dataset, achieving state-of-the-art
results with a model half the size of the baseline. We then applied these
models to lyrics from two large-scale MIDI datasets. Our dataset covers a wide
range of fine-grained emotions, providing a valuable resource to explore the
connection between music and emotions and, especially, to develop models that
can generate music based on specific emotions. Our code for inference, trained
models, and datasets are available online.
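The labeling pipeline described above can be pictured with a minimal sketch: run a GoEmotions-style text-emotion classifier over a song's lyrics and keep the highest-scoring emotions as labels for the matching MIDI file. The Hugging Face checkpoint name, the score threshold, and the lyrics-to-MIDI pairing below are illustrative assumptions, not the authors' released models or exact procedure.

```python
# Illustrative sketch of lyrics-based emotion labeling (not the authors' code).
# Assumption: a publicly available GoEmotions-finetuned text classifier from the
# Hugging Face hub; the checkpoint name and threshold are placeholders.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",  # any GoEmotions-finetuned model
    top_k=None,  # return scores for every fine-grained emotion label
)

def label_lyrics(lyrics: str, threshold: float = 0.3) -> list[str]:
    """Return the fine-grained emotions whose score clears the threshold."""
    # Truncation keeps long lyrics within the model's context window.
    scores = classifier([lyrics], truncation=True)[0]
    return [s["label"] for s in scores if s["score"] >= threshold]

# A MIDI song would then inherit the labels predicted for its lyrics, e.g.:
if __name__ == "__main__":
    print(label_lyrics("I walk alone through empty streets, missing how we used to laugh."))
```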
Related papers
- Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges [9.62904012066486]
We provide a comprehensive overview of the available music-emotion datasets and discuss evaluation standards as well as competitions in the field.
We highlight the challenges that persist in accurately capturing emotion in music, including issues related to dataset quality, annotation consistency, and model generalization.
We have complemented our findings with an accompanying GitHub repository.
arXiv Detail & Related papers (2024-06-13T05:00:27Z)
- Emotion Manipulation Through Music -- A Deep Learning Interactive Visual Approach [0.0]
We introduce a novel way to manipulate the emotional content of a song using AI tools.
Our goal is to achieve the desired emotion while leaving the original melody as intact as possible.
This research may contribute to on-demand custom music generation, the automated remixing of existing work, and music playlists tuned for emotional progression.
arXiv Detail & Related papers (2024-06-12T20:12:29Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- MidiCaps: A large-scale MIDI dataset with text captions [6.806050368211496]
This work aims to enable research that combines LLMs with symbolic music by presenting MidiCaps, the first openly available large-scale MIDI dataset with text captions.
Inspired by recent advancements in captioning techniques, we present a curated dataset of over 168k MIDI files with textual descriptions.
arXiv Detail & Related papers (2024-06-04T12:21:55Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- A Novel Multi-Task Learning Method for Symbolic Music Emotion Recognition [76.65908232134203]
Symbolic Music Emotion Recognition (SMER) is the task of predicting music emotion from symbolic data such as MIDI and MusicXML.
In this paper, we present a simple multi-task framework for SMER, which incorporates the emotion recognition task with other emotion-related auxiliary tasks.
arXiv Detail & Related papers (2022-01-15T07:45:10Z)
- Using a Bi-directional LSTM Model with Attention Mechanism trained on MIDI Data for Generating Unique Music [0.25559196081940677]
This paper proposes a bi-directional LSTM model with an attention mechanism capable of generating similar types of music based on MIDI data.
The music generated by the model follows the theme/style of the music the model is trained on.
arXiv Detail & Related papers (2020-11-02T06:43:28Z)
- Modality-Transferable Emotion Embeddings for Low-Resource Multimodal Emotion Recognition [55.44502358463217]
We propose a modality-transferable model with emotion embeddings to tackle the aforementioned issues.
Our model achieves state-of-the-art performance on most of the emotion categories.
Our model also outperforms existing baselines in the zero-shot and few-shot scenarios for unseen emotions.
arXiv Detail & Related papers (2020-09-21T06:10:39Z)
- PopMAG: Pop Music Accompaniment Generation [190.09996798215738]
We propose a novel MUlti-track MIDI representation (MuMIDI) which enables simultaneous multi-track generation in a single sequence.
MuMIDI enlarges the sequence length and brings the new challenge of long-term music modeling.
We call our system for pop music accompaniment generation PopMAG.
arXiv Detail & Related papers (2020-08-18T02:28:36Z)
- Foley Music: Learning to Generate Music from Videos [115.41099127291216]
Foley Music is a system that can synthesize plausible music for a silent video clip of people playing musical instruments.
We first identify two key intermediate representations for a successful video-to-music generator: body keypoints from videos and MIDI events from audio recordings.
We present a Graph-Transformer framework that can accurately predict MIDI event sequences in accordance with the body movements.
arXiv Detail & Related papers (2020-07-21T17:59:06Z)