DAIRHuM: A Platform for Directly Aligning AI Representations with Human Musical Judgments applied to Carnatic Music
- URL: http://arxiv.org/abs/2411.14907v1
- Date: Fri, 22 Nov 2024 13:04:51 GMT
- Title: DAIRHuM: A Platform for Directly Aligning AI Representations with Human Musical Judgments applied to Carnatic Music
- Authors: Prashanth Thattai Ravikumar
- Abstract summary: This paper presents a platform for exploring the Direct alignment between AI music model Representations and Human Musical judgments (DAIRHuM).
It is designed to enable musicians and experimentalists to label similarities in a dataset of music recordings, and examine a pre-trained model's alignment with their labels.
Results demonstrate significant findings on model alignment with human judgments of rhythmic harmony, while highlighting key differences in rhythm perception and music similarity judgments specific to Carnatic music.
- Abstract: Quantifying and aligning music AI model representations with human behavior is an important challenge in the field of MIR. This paper presents a platform for exploring the Direct alignment between AI music model Representations and Human Musical judgments (DAIRHuM). It is designed to enable musicians and experimentalists to label similarities in a dataset of music recordings and examine a pre-trained model's alignment with their labels using quantitative scores and visual plots. DAIRHuM is applied to analyze alignment between NSynth representations and a rhythmic duet between two percussionists in a Carnatic quartet ensemble, an example of a genre where annotated data is scarce and assessing alignment is non-trivial. The results demonstrate significant findings on model alignment with human judgments of rhythmic harmony, while highlighting key differences in rhythm perception and music similarity judgments specific to Carnatic music. This work is among the first efforts to enable users to explore human-AI model alignment in Carnatic music and to advance MIR research in Indian music while dealing with data scarcity and cultural specificity. The platform also makes music AI tools more accessible for under-represented genres.
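As a rough illustration of the alignment check the abstract describes, the sketch below compares pairwise similarities in a model's embedding space against human similarity labels using a rank correlation. This is a minimal, hypothetical reading of the workflow, not DAIRHuM's actual code: cosine similarity, Spearman correlation as the alignment score, and the synthetic data are all assumptions.

```python
# Minimal sketch (not DAIRHuM's implementation): score how well a model's
# embedding similarities agree with human pairwise similarity judgments.
import numpy as np
from scipy.stats import spearmanr

def pairwise_cosine(embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity between every pair of recordings (n x d -> n x n)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return normed @ normed.T

def alignment_score(embeddings: np.ndarray, human_sim: np.ndarray) -> float:
    """Spearman correlation between model and human pairwise similarities,
    computed over the upper triangle so each pair is counted once."""
    model_sim = pairwise_cosine(embeddings)
    iu = np.triu_indices(len(embeddings), k=1)
    rho, _ = spearmanr(model_sim[iu], human_sim[iu])
    return float(rho)

# Toy usage: 10 recordings with 128-dim embeddings and symmetric human labels.
rng = np.random.default_rng(0)
emb = rng.normal(size=(10, 128))   # stand-in for pre-trained model embeddings
raw = rng.random((10, 10))
human = (raw + raw.T) / 2          # symmetric toy similarity ratings
print(f"alignment (Spearman rho): {alignment_score(emb, human):.3f}")
```

A rank correlation is a natural choice in such a setup because human similarity labels are ordinal judgments rather than calibrated distances.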
Related papers
- Attention-guided Spectrogram Sequence Modeling with CNNs for Music Genre Classification [0.0]
We present an innovative model for classifying music genres using attention-based temporal signature modeling.
Our approach captures the most temporally significant moments within each piece, crafting a unique "signature" for genre identification.
This work bridges the gap between technical classification tasks and the nuanced, human experience of genre; a toy sketch of the attention-pooling idea follows this entry.
arXiv Detail & Related papers (2024-11-18T21:57:03Z)
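One plausible reading of the attention-based temporal pooling described above is sketched below: per-frame features are weighted by a learned query and collapsed into a single "signature" vector. The architecture details are not given in the summary, so the names and shapes here are assumptions, not the paper's actual model.

```python
# Hypothetical sketch of attention pooling over frame-level spectrogram
# features; the paper's actual CNN and attention design may differ.
import numpy as np

def attention_pool(frame_feats: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Collapse (T, d) per-frame features into one (d,) 'signature' vector,
    giving the temporally most significant frames the largest weights."""
    scores = frame_feats @ query              # (T,) relevance per frame
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over time
    return weights @ frame_feats              # attention-weighted average

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 64))  # e.g. CNN features for 200 spectrogram frames
query = rng.normal(size=64)         # stands in for a learned attention query
signature = attention_pool(feats, query)
print(signature.shape)              # (64,) vector fed to a genre classifier
```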
- Foundation Models for Music: A Survey [77.77088584651268]
Foundation models (FMs) have profoundly impacted diverse sectors, including music.
This comprehensive review examines state-of-the-art (SOTA) pre-trained models and foundation models in music.
arXiv Detail & Related papers (2024-08-26T15:13:14Z)
- Between the AI and Me: Analysing Listeners' Perspectives on AI- and Human-Composed Progressive Metal Music [1.2874569408514918]
We explore participants' perspectives on AI- vs human-generated progressive metal, using rock music as a control group.
We propose a mixed-methods approach to assess the effects of generation type (human vs. AI), genre (progressive metal vs. rock), and curation process (random vs. cherry-picked).
Our findings validate the use of fine-tuning to achieve genre-specific specialization in AI music generation.
Despite some AI-generated excerpts receiving similar ratings to human music, listeners exhibited a preference for human compositions.
arXiv Detail & Related papers (2024-07-31T14:03:45Z)
- MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation [18.181382408551574]
We propose a novel task of Colloquial Description-to-Song Generation.
It focuses on aligning the generated content with colloquial human expressions.
This task is aimed at bridging the gap between colloquial language understanding and auditory expression within an AI model.
arXiv Detail & Related papers (2024-07-03T15:12:36Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- A Survey of Music Generation in the Context of Interaction [3.6522809408725223]
Machine learning has been successfully used to compose and generate music, both melodies and polyphonic pieces.
However, most of these models are not suitable for human-machine co-creation through live interaction.
arXiv Detail & Related papers (2024-02-23T12:41:44Z)
- MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks across 8 publicly available datasets, providing a fair and standardized assessment of the representations of open-source pre-trained models trained on music recordings as baselines; a toy sketch of this probing pattern follows this entry.
arXiv Detail & Related papers (2023-06-18T12:56:46Z)
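The evaluation pattern such a protocol implies is essentially linear probing: freeze the pre-trained model, extract embeddings, and fit a lightweight supervised classifier per task. The sketch below shows that pattern on synthetic stand-ins; it is not MARBLE's actual API or data.

```python
# Sketch of the probing protocol a representation benchmark implies:
# frozen embeddings in, one lightweight supervised probe per task.
# The embeddings and labels are synthetic stand-ins, not MARBLE data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 256))    # embeddings from a frozen pre-trained model
y = rng.integers(0, 10, size=500)  # per-clip task labels, e.g. genre IDs

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"probe accuracy: {accuracy_score(y_te, probe.predict(X_te)):.3f}")
```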
- A Dataset for Greek Traditional and Folk Music: Lyra [69.07390994897443]
This paper presents a dataset for Greek Traditional and Folk music that includes 1570 pieces, totaling around 80 hours of data.
The dataset incorporates timestamped YouTube links for retrieving audio and video, along with rich metadata on instrumentation, geography, and genre.
arXiv Detail & Related papers (2022-11-21T14:15:43Z)
- Quantized GAN for Complex Music Generation from Dance Videos [48.196705493763986]
We present Dance2Music-GAN (D2M-GAN), a novel adversarial multi-modal framework that generates musical samples conditioned on dance videos.
Our proposed framework takes dance video frames and human body motion as input, and learns to generate music samples that plausibly accompany the corresponding input.
arXiv Detail & Related papers (2022-04-01T17:53:39Z)
- Flat latent manifolds for music improvisation between human and machine [9.571383193449648]
We consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal improvisation is to lead to new experiences.
In the learned model, we generate novel musical sequences by quantification in latent space.
We provide empirical evidence for our method through a set of experiments on music, and we deploy our model in an interactive jam session with a professional drummer.
arXiv Detail & Related papers (2022-02-23T09:00:17Z)
- Music Gesture for Visual Sound Separation [121.36275456396075]
"Music Gesture" is a keypoint-based structured representation to explicitly model the body and finger movements of musicians when they perform music.
We first adopt a context-aware graph network to integrate visual semantic context with body dynamics, and then apply an audio-visual fusion model to associate body movements with the corresponding audio signals.
arXiv Detail & Related papers (2020-04-20T17:53:46Z)