MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing
- URL: http://arxiv.org/abs/2507.06329v1
- Date: Tue, 08 Jul 2025 18:33:26 GMT
- Title: MixAssist: An Audio-Language Dataset for Co-Creative AI Assistance in Music Mixing
- Authors: Michael Clemens, Ana Marasović,
- Abstract summary: MixAssist is a novel audio-language dataset capturing the situated, multi-turn dialogue between expert and amateur music producers.<n>By focusing on co-creative instruction grounded in audio context, MixAssist enables the development of intelligent AI assistants.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While AI presents significant potential for enhancing music mixing and mastering workflows, current research predominantly emphasizes end-to-end automation or generation, often overlooking the collaborative and instructional dimensions vital for co-creative processes. This gap leaves artists, particularly amateurs seeking to develop expertise, underserved. To bridge this, we introduce MixAssist, a novel audio-language dataset capturing the situated, multi-turn dialogue between expert and amateur music producers during collaborative mixing sessions. Comprising 431 audio-grounded conversational turns derived from 7 in-depth sessions involving 12 producers, MixAssist provides a unique resource for training and evaluating audio-language models that can comprehend and respond to the complexities of real-world music production dialogues. Our evaluations, including automated LLM-as-a-judge assessments and human expert comparisons, demonstrate that fine-tuning models such as Qwen-Audio on MixAssist can yield promising results, with Qwen significantly outperforming other tested models in generating helpful, contextually relevant mixing advice. By focusing on co-creative instruction grounded in audio context, MixAssist enables the development of intelligent AI assistants designed to support and augment the creative process in music mixing.
Related papers
- ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing [52.33281620699459]
ThinkSound is a novel framework that leverages Chain-of-Thought (CoT) reasoning to enable stepwise, interactive audio generation and editing for videos.<n>Our approach decomposes the process into three complementary stages: semantically coherent, interactive object-centric refinement through precise user interactions, and targeted editing guided by natural language instructions.<n> Experiments demonstrate that ThinkSound achieves state-of-the-art performance in video-to-audio generation across both audio metrics and CoT metrics.
arXiv Detail & Related papers (2025-06-26T16:32:06Z) - Evaluating Human-AI Interaction via Usability, User Experience and Acceptance Measures for MMM-C: A Creative AI System for Music Composition [4.152843247686306]
We report on a thorough evaluation of the user adoption of the Multi-Track Music Machine (MMM) as a co-creative AI tool for music composers.<n>To do this, we integrate MMM into Cubase, a popular Digital Audio Workstation (DAW) by Steinberg.<n>We contribute a methodological assemblage as a 3-part mixed method study measuring usability, user experience and technology acceptance of the system.
arXiv Detail & Related papers (2025-04-18T20:41:02Z) - Musical Agent Systems: MACAT and MACataRT [6.349140286855134]
We introduce MACAT and MACataRT, two distinct musical agent systems crafted to enhance interactive music-making between human musicians and AI.<n>MaCAT is optimized for agent-led performance, employing real-time synthesis and self-listening to shape its output autonomously.<n>MacataRT provides a flexible environment for collaborative improvisation through audio mosaicing and sequence-based learning.
arXiv Detail & Related papers (2025-01-19T22:04:09Z) - ComposerX: Multi-Agent Symbolic Music Composition with LLMs [51.68908082829048]
Music composition is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints.
Current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and Chain-of-Thoughts.
We propose ComposerX, an agent-based symbolic music generation framework.
arXiv Detail & Related papers (2024-04-28T06:17:42Z) - Interactive Melody Generation System for Enhancing the Creativity of
Musicians [0.0]
This study proposes a system designed to enumerate the process of collaborative composition among humans.
By integrating multiple Recurrent Neural Network (RNN) models, the system provides an experience akin to collaborating with several composers.
arXiv Detail & Related papers (2024-03-06T01:33:48Z) - The Role of Communication and Reference Songs in the Mixing Process:
Insights from Professional Mix Engineers [0.0]
This paper presents the findings of a two-phased exploratory study aimed at understanding how professional mixing engineers interact with clients and use their feedback to guide the mixing process.
The results of this study shed light on the importance of collaboration, empathy, and intention in the mixing process, and can inform the development of smart multi-track mixing systems.
arXiv Detail & Related papers (2023-09-06T23:45:09Z) - WavJourney: Compositional Audio Creation with Large Language Models [38.39551216587242]
We present WavJourney, a novel framework that leverages Large Language Models to connect various audio models for audio creation.
WavJourney allows users to create storytelling audio content with diverse audio elements simply from textual descriptions.
We show that WavJourney is capable of synthesizing realistic audio aligned with textually-described semantic, spatial and temporal conditions.
arXiv Detail & Related papers (2023-07-26T17:54:04Z) - MARBLE: Music Audio Representation Benchmark for Universal Evaluation [79.25065218663458]
We introduce the Music Audio Representation Benchmark for universaL Evaluation, termed MARBLE.
It aims to provide a benchmark for various Music Information Retrieval (MIR) tasks by defining a comprehensive taxonomy with four hierarchy levels, including acoustic, performance, score, and high-level description.
We then establish a unified protocol based on 14 tasks on 8 public-available datasets, providing a fair and standard assessment of representations of all open-sourced pre-trained models developed on music recordings as baselines.
arXiv Detail & Related papers (2023-06-18T12:56:46Z) - MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training [74.32603591331718]
We propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels in the masked language modelling (MLM) style acoustic pre-training.<n> Experimental results indicate that our model can generalise and perform well on 14 music understanding tasks and attain state-of-the-art (SOTA) overall scores.
arXiv Detail & Related papers (2023-05-31T18:27:43Z) - AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking
Head [82.69233563811487]
Large language models (LLMs) have exhibited remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition.
We propose a multi-modal AI system named AudioGPT, which complements LLMs with foundation models to process complex audio information.
arXiv Detail & Related papers (2023-04-25T17:05:38Z) - Automatic music mixing with deep learning and out-of-domain data [10.670987762781834]
Music mixing traditionally involves recording instruments in the form of clean, individual tracks and blending them into a final mixture using audio effects and expert knowledge.
We propose a novel data preprocessing method that allows the models to perform automatic music mixing.
We also redesigned a listening test method for evaluating music mixing systems.
arXiv Detail & Related papers (2022-08-24T10:50:22Z) - RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement
Learning [69.20460466735852]
This paper presents a deep reinforcement learning algorithm for online accompaniment generation.
The proposed algorithm is able to respond to the human part and generate a melodic, harmonic and diverse machine part.
arXiv Detail & Related papers (2020-02-08T03:53:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.