Expressive Communication: A Common Framework for Evaluating Developments
in Generative Models and Steering Interfaces
- URL: http://arxiv.org/abs/2111.14951v1
- Date: Mon, 29 Nov 2021 20:57:55 GMT
- Title: Expressive Communication: A Common Framework for Evaluating Developments
in Generative Models and Steering Interfaces
- Authors: Ryan Louie, Jesse Engel, Anna Huang
- Abstract summary: This study investigates how developments in both models and user interfaces are important for empowering co-creation.
In an evaluation study with 26 composers creating 100+ pieces of music and listeners providing 1000+ head-to-head comparisons, we find that more expressive models and more steerable interfaces are important and complementary for supporting communication through music and composers' creative empowerment.
- Score: 1.2891210250935146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: There is an increasing interest from ML and HCI communities in empowering
creators with better generative models and more intuitive interfaces with which
to control them. In music, ML researchers have focused on training models
capable of generating pieces with increasing long-range structure and musical
coherence, while HCI researchers have separately focused on designing steering
interfaces that support user control and ownership. In this study, we
investigate through a common framework how developments in both models and user
interfaces are important for empowering co-creation where the goal is to create
music that communicates particular imagery or ideas (e.g., as is common for
other purposeful tasks in music creation like establishing mood or creating
accompanying music for another medium). Our study is distinguished in that it
measures communication through both composers' self-reported experiences, and
how listeners evaluate this communication through the music. In an evaluation
study with 26 composers creating 100+ pieces of music and listeners providing
1000+ head-to-head comparisons, we find that more expressive models and more
steerable interfaces are important and complementary ways to make a difference
in composers communicating through music and supporting their creative
empowerment.
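As a concrete illustration of the evaluation setup, the following is a minimal sketch, with hypothetical condition names and comparison records, of how such head-to-head listener judgments can be aggregated into per-pair win rates; the paper's own statistical analysis may differ.

```python
from collections import defaultdict

# Hypothetical head-to-head records: (condition_a, condition_b, winner).
# In the study, listeners provided 1000+ such comparisons between pieces
# composed under different model/interface conditions.
comparisons = [
    ("expressive_model+steering_ui", "baseline_model+basic_ui", "expressive_model+steering_ui"),
    ("expressive_model+basic_ui", "baseline_model+basic_ui", "baseline_model+basic_ui"),
    ("expressive_model+steering_ui", "expressive_model+basic_ui", "expressive_model+steering_ui"),
]

wins = defaultdict(int)    # wins[(pair, condition)] -> number of wins for that condition
totals = defaultdict(int)  # totals[pair] -> number of comparisons for that pair

for a, b, winner in comparisons:
    pair = tuple(sorted((a, b)))
    totals[pair] += 1
    wins[(pair, winner)] += 1

for pair, n in totals.items():
    a, b = pair
    wa = wins[(pair, a)]
    print(f"{a} vs {b}: {wa}/{n} wins for {a} ({wa / n:.0%})")
```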
Related papers
- A Survey of Foundation Models for Music Understanding [60.83532699497597]
This work is one of the early reviews of the intersection of AI techniques and music understanding.
We investigated, analyzed, and tested recent large-scale music foundation models with respect to their music comprehension abilities.
arXiv Detail & Related papers (2024-09-15T03:34:14Z)
- Creativity and Visual Communication from Machine to Musician: Sharing a Score through a Robotic Camera [4.9485163144728235]
This paper explores the integration of visual communication and musical interaction by implementing a robotic camera within a "Guided Harmony" musical game.
The robotic system interprets and responds to nonverbal cues from musicians, creating a collaborative and adaptive musical experience.
arXiv Detail & Related papers (2024-09-09T16:34:36Z)
- MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models [57.47799823804519]
We are inspired by how musicians compose music not just from a movie script, but also through visualizations.
We propose MeLFusion, a model that can effectively use cues from a textual description and the corresponding image to synthesize music.
Our exhaustive experimental evaluation suggests that adding visual information to the music synthesis pipeline significantly improves the quality of generated music.
arXiv Detail & Related papers (2024-06-07T06:38:59Z)
- ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires the ability to understand and generate information with long-range dependencies and harmony constraints.
Current LLMs often fail at this task, generating poorly written music even when equipped with techniques such as in-context learning and chain-of-thought prompting.
We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z)
- MuPT: A Generative Symbolic Music Pretrained Transformer [56.09299510129221]
We explore the application of Large Language Models (LLMs) to the pre-training of music.
To address the challenges associated with misaligned measures from different tracks during generation, we propose a Synchronized Multi-Track ABC Notation (SMT-ABC Notation).
Our contributions include a series of models capable of handling up to 8192 tokens, covering 90% of the symbolic music data in our training set.
arXiv Detail & Related papers (2024-04-09T15:35:52Z)
- Interactive Melody Generation System for Enhancing the Creativity of Musicians [0.0]
This study proposes a system designed to emulate the process of collaborative composition among humans.
By integrating multiple Recurrent Neural Network (RNN) models, the system provides an experience akin to collaborating with several composers.
arXiv Detail & Related papers (2024-03-06T01:33:48Z)
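For the interactive melody-generation entry above, which integrates multiple RNN models so that the user experiences something like working with several composers, here is a minimal, hypothetical sketch of sampling melody continuations from an ensemble of small LSTM models; it does not reproduce the cited system's architecture or training, and the models below are untrained stand-ins.

```python
import torch
import torch.nn as nn

VOCAB = 128  # e.g., MIDI pitch tokens

class MelodyRNN(nn.Module):
    """A tiny LSTM melody model standing in for one 'composer'."""
    def __init__(self, vocab=VOCAB, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab)

    def forward(self, tokens, state=None):
        h, state = self.lstm(self.embed(tokens), state)
        return self.head(h), state

def continue_melody(model, seed, steps=8):
    """Sample `steps` new tokens after the seed melody."""
    tokens = torch.tensor([seed])
    logits, state = model(tokens)
    out = []
    for _ in range(steps):
        probs = torch.softmax(logits[:, -1], dim=-1)
        nxt = torch.multinomial(probs, 1)
        out.append(int(nxt))
        logits, state = model(nxt, state)
    return out

seed = [60, 62, 64, 65]  # a simple ascending motif (MIDI pitches)
composers = [MelodyRNN() for _ in range(3)]  # untrained stand-ins for trained models
for i, m in enumerate(composers):
    print(f"composer {i}: {continue_melody(m, seed)}")
```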
- ByteComposer: a Human-like Melody Composition Method based on Language Model Agent [11.792129708566598]
Large Language Models (LLMs) have shown encouraging progress in multimodal understanding and generation tasks.
We propose ByteComposer, an agent framework emulating a human's creative pipeline in four separate steps.
We conduct extensive experiments on GPT-4 and several open-source large language models, which substantiate our framework's effectiveness.
arXiv Detail & Related papers (2024-02-24T04:35:07Z)
- MusicRL: Aligning Music Generation to Human Preferences [62.44903326718772]
MusicRL is the first music generation system finetuned from human feedback.
We deploy MusicLM to users and collect a substantial dataset comprising 300,000 pairwise preferences.
We train MusicRL-U, the first text-to-music model that incorporates human feedback at scale.
arXiv Detail & Related papers (2024-02-06T18:36:52Z)
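For the MusicRL entry above, which learns from 300,000 pairwise preferences, here is a generic Bradley-Terry-style preference loss of the kind commonly used to turn such comparisons into a learnable reward signal; MusicRL's actual reward modelling and finetuning details are not reproduced, and the scores below are made up.

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_preferred: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Maximize the probability that the preferred clip scores higher:
    # loss = -log sigmoid(r_preferred - r_rejected)
    return -F.logsigmoid(reward_preferred - reward_rejected).mean()

# Toy scores a reward model might assign to four preference pairs.
r_pref = torch.tensor([0.8, 0.1, 0.5, 0.9])
r_rej = torch.tensor([0.2, 0.3, 0.4, 0.1])
print(preference_loss(r_pref, r_rej))  # lower when preferred clips score higher
```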
- Music Harmony Generation, through Deep Learning and Using a Multi-Objective Evolutionary Algorithm [0.0]
This paper introduces a genetic multi-objective evolutionary optimization algorithm for generating polyphonic music.
One objective captures the rules of music theory; together with the other two objectives, the ratings of music experts and of ordinary listeners, it drives the evolutionary cycle toward the best-scoring pieces.
The results show that the proposed method can generate difficult yet pleasant pieces with the desired styles and lengths, whose harmonies follow musical grammar while remaining engaging to the listener.
arXiv Detail & Related papers (2021-02-16T05:05:54Z)
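For the harmony-generation entry above, which evolves music against three objectives, here is a minimal Pareto-selection sketch over hypothetical objective scores; the cited paper's genetic operators and exact fitness definitions are not reproduced.

```python
# Three objectives, all maximized: music-theory rule compliance,
# expert rating, and ordinary-listener rating.
def dominates(a, b):
    """True if candidate a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(population):
    """Keep candidates not dominated by any other candidate."""
    return [p for p in population if not any(dominates(q, p) for q in population if q is not p)]

# Each candidate piece: (rule_compliance, expert_score, listener_score).
population = [(0.9, 0.4, 0.6), (0.7, 0.8, 0.5), (0.6, 0.3, 0.4), (0.9, 0.8, 0.2)]
print(pareto_front(population))  # survivors that would seed the next generation
```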
- Multi-Modal Music Information Retrieval: Augmenting Audio-Analysis with Visual Computing for Improved Music Video Analysis [91.3755431537592]
This thesis combines audio-analysis with computer vision to approach Music Information Retrieval (MIR) tasks from a multi-modal perspective.
The main hypothesis of this work is based on the observation that certain expressive categories such as genre or theme can be recognized on the basis of the visual content alone.
The experiments are conducted for three MIR tasks: Artist Identification, Music Genre Classification, and Cross-Genre Classification.
arXiv Detail & Related papers (2020-02-01T17:57:14Z)