Visualizing Ensemble Predictions of Music Mood
- URL: http://arxiv.org/abs/2112.07627v1
- Date: Tue, 14 Dec 2021 18:13:21 GMT
- Title: Visualizing Ensemble Predictions of Music Mood
- Authors: Zelin Ye and Min Chen
- Abstract summary: We show that visualization techniques can effectively convey the popular prediction as well as uncertainty at different music sections along the temporal axis.
We introduce a new variant of ThemeRiver, called "dual-flux ThemeRiver", which allows viewers to observe and measure the most popular prediction more easily.
- Score: 4.5383186433033735
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Music mood classification has been a challenging problem in comparison with
some other classification problems (e.g., genre, composer, or period). One
solution for addressing this challenge is to use an ensemble of machine
learning models. In this paper, we show that visualization techniques can
effectively convey the popular prediction as well as uncertainty at different
music sections along the temporal axis, while enabling the analysis of
individual ML models in conjunction with their application to different musical
data. In addition to the traditional visual designs, such as stacked line
graph, ThemeRiver, and pixel-based visualization, we introduced a new variant
of ThemeRiver, called "dual-flux ThemeRiver", which allows viewers to observe
and measure the most popular prediction more easily than the stacked line graph and
ThemeRiver. Testing indicates that visualizing ensemble predictions is helpful
both in model-development workflows and for annotating music using model
predictions.
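To make the idea concrete, here is a minimal sketch (Python with NumPy and matplotlib, not the authors' implementation) of laying out ensemble mood predictions along the temporal axis in a dual-flux fashion: for each music section, the vote count of the most popular mood is drawn as a band above the horizontal axis, while the votes for all remaining moods are drawn below it, so the height of the upper band can be read directly from zero. The mood labels, ensemble size, and all variable names are illustrative assumptions.

# Sketch only: a simple dual-flux layout loosely inspired by the paper's
# "dual-flux ThemeRiver"; the actual design is a river/stream visualization,
# and this example uses synthetic ensemble predictions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
moods = ["happy", "sad", "calm", "tense"]      # assumed mood classes
n_models, n_sections = 10, 40                  # assumed ensemble size and timeline length

# Synthetic ensemble output: each model predicts one mood per music section.
votes = rng.integers(0, len(moods), size=(n_models, n_sections))
# Per-section vote counts, shape (n_moods, n_sections).
counts = np.stack([(votes == k).sum(axis=0) for k in range(len(moods))])

top = counts.max(axis=0)         # votes for the most popular mood (upper flux)
rest = counts.sum(axis=0) - top  # votes for all other moods (lower flux)

t = np.arange(n_sections)
fig, ax = plt.subplots(figsize=(8, 3))
ax.fill_between(t, 0, top, color="tab:blue", label="most popular mood")
ax.fill_between(t, 0, -rest, color="tab:gray", label="other predictions")
ax.axhline(0, color="black", linewidth=0.8)
ax.set_xlabel("music section (time)")
ax.set_ylabel("number of model votes")
ax.legend(loc="upper right")
plt.tight_layout()
plt.show()

Separating the two fluxes across the axis is what, according to the abstract, makes the most popular prediction easier to observe and measure than in a conventional stacked line graph or ThemeRiver, where its band would sit on top of all the others.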
Related papers
- Are We There Yet? A Brief Survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges [9.62904012066486]
We provide a comprehensive overview of the available music-emotion datasets and discuss evaluation standards as well as competitions in the field.
We highlight the challenges that persist in accurately capturing emotion in music, including issues related to dataset quality, annotation consistency, and model generalization.
We have complemented our findings with an accompanying GitHub repository.
arXiv Detail & Related papers (2024-06-13T05:00:27Z)
- Perception-Inspired Graph Convolution for Music Understanding Tasks [3.5570874721859016]
We propose a new graph convolutional block, MusGConv, specifically designed for the efficient processing of musical score data.
We evaluate our approach on four different musical understanding problems.
arXiv Detail & Related papers (2024-05-15T10:04:44Z)
- Motif-Centric Representation Learning for Symbolic Music [5.781931021964343]
We learn the implicit relationship between motifs and their variations via representation learning.
A regularization-based method, VICReg, is adopted for pretraining, while contrastive learning is used for fine-tuning.
We visualize the acquired motif representations, offering an intuitive comprehension of the overall structure of a music piece.
arXiv Detail & Related papers (2023-09-19T13:09:03Z)
- Visual Tuning [143.43997336384126]
Fine-tuning visual models has been widely shown to yield promising performance on many downstream visual tasks.
Recent advances can achieve performance superior to fully fine-tuning all pre-trained parameters.
This survey characterizes a large and thoughtful selection of recent works, providing a systematic and comprehensive overview of existing work and models.
arXiv Detail & Related papers (2023-05-10T11:26:36Z)
- Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task [86.72661027591394]
We generate complete and semantically consistent symbolic music scores from text descriptions.
We explore the efficacy of using publicly available checkpoints for natural language processing in the task of text-to-music generation.
Our experimental results show that the improvement from using pre-trained checkpoints is statistically significant in terms of BLEU score and edit distance similarity.
arXiv Detail & Related papers (2022-11-21T07:19:17Z)
- Relating Human Perception of Musicality to Prediction in a Predictive Coding Model [0.8062120534124607]
We explore the use of a neural network inspired by predictive coding for modeling human music perception.
This network was developed based on the computational neuroscience theory of recurrent interactions in the hierarchical visual cortex.
We adapt this network to model the hierarchical auditory system and investigate whether it will make similar choices to humans regarding the musicality of a set of random pitch sequences.
arXiv Detail & Related papers (2022-10-29T12:20:01Z)
- A Graph-Enhanced Click Model for Web Search [67.27218481132185]
We propose a novel graph-enhanced click model (GraphCM) for web search.
We exploit both intra-session and inter-session information to address the sparsity and cold-start problems.
arXiv Detail & Related papers (2022-06-17T08:32:43Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
- Tracing Back Music Emotion Predictions to Sound Sources and Intuitive Perceptual Qualities [6.832341432995627]
Music emotion recognition is an important task in MIR (Music Information Retrieval) research.
One important step towards better models would be to understand what a model is actually learning from the data.
We show how to derive explanations of model predictions in terms of spectrogram image segments that connect to the high-level emotion prediction.
arXiv Detail & Related papers (2021-06-14T22:49:19Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)