Related papers: Exploring XAI for the Arts: Explaining Latent Space in Generative Music

URL: http://arxiv.org/abs/2308.05496v1
Date: Thu, 10 Aug 2023 10:59:24 GMT
Title: Exploring XAI for the Arts: Explaining Latent Space in Generative Music
Authors: Nick Bryan-Kinns, Berker Banar, Corey Ford, Courtney N. Reed, Yixiao Zhang, Simon Colton, Jack Armitage
Abstract summary: We show how a latent variable model for music generation can be made more explainable. We use latent space regularisation to force some specific dimensions of the latent space to map to meaningful musical attributes. We also provide a visualisation of the musical attributes in the latent space to help people understand and predict the effect of changes to latent space dimensions.
Score: 5.91328657300926
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Explainable AI has the potential to support more interactive and fluid co-creative AI systems which can creatively collaborate with people. To do this, creative AI models need to be amenable to debugging by offering eXplainable AI (XAI) features which are inspectable, understandable, and modifiable. However, currently there is very little XAI for the arts. In this work, we demonstrate how a latent variable model for music generation can be made more explainable; specifically we extend MeasureVAE which generates measures of music. We increase the explainability of the model by: i) using latent space regularisation to force some specific dimensions of the latent space to map to meaningful musical attributes, ii) providing a user interface feedback loop to allow people to adjust dimensions of the latent space and observe the results of these changes in real-time, iii) providing a visualisation of the musical attributes in the latent space to help people understand and predict the effect of changes to latent space dimensions. We suggest that in doing so we bridge the gap between the latent space and the generated musical outcomes in a meaningful way which makes the model and its outputs more explainable and more debuggable.
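The latent space regularisation in point i) can be illustrated with a small sketch. The abstract does not state which loss is used, so the code below assumes a pairwise sign-matching loss in the style of attribute-regularised VAEs: along one chosen latent dimension, the ordering of latent values within a batch is pushed to match the ordering of a musical attribute such as note density. The function name, the `delta` parameter, and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np


def attribute_regularisation_loss(z_dim, attribute, delta=1.0):
    """Pairwise sign-matching loss for one latent dimension (illustrative).

    z_dim:     latent values along the regularised dimension, shape (batch,)
    attribute: attribute value of each item (e.g. note density), shape (batch,)
    delta:     assumed hyperparameter controlling how sharply tanh saturates
    """
    z_dim = np.asarray(z_dim, dtype=float)
    attribute = np.asarray(attribute, dtype=float)

    # All pairwise differences within the batch: entry (i, j) is x_i - x_j.
    dz = z_dim[:, None] - z_dim[None, :]
    da = attribute[:, None] - attribute[None, :]

    # Mean absolute error between the squashed latent differences and the
    # sign of the attribute differences. The loss is small when a larger
    # attribute value goes with a larger latent value along this dimension.
    return float(np.mean(np.abs(np.tanh(delta * dz) - np.sign(da))))


if __name__ == "__main__":
    density = [1.0, 2.0, 3.0]   # toy attribute values
    aligned = [-2.0, 0.0, 2.0]  # latent values ordered like the attribute
    reversed_ = [2.0, 0.0, -2.0]  # latent values in the opposite order
    print(attribute_regularisation_loss(aligned, density))
    print(attribute_regularisation_loss(reversed_, density))
```

In training, a term like this would typically be added to the usual VAE objective for each regularised dimension. Once a dimension increases monotonically with an attribute, moving a slider along that dimension has a predictable musical effect, which is what the interface and visualisation in points ii) and iii) rely on.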

Related papers

DeformTune: A Deformable XAI Music Prototype for Non-Musicians [8.306938034148516]
This paper introduces DeformTune, a prototype system that combines a deformable interface with the MeasureVAE model to explore more intuitive, embodied, and explainable AI interaction. We conducted a preliminary study with 11 adult participants without formal musical training to investigate their experience with AI-assisted music creation. Thematic analysis of their feedback revealed recurring challenges, including unclear control mappings, limited expressive range, and the need for guidance throughout use.
arXiv Detail & Related papers (2025-07-31T20:57:59Z)
ReaLJam: Real-Time Human-AI Music Jamming with Reinforcement Learning-Tuned Transformers [53.63950017886757]
We introduce ReaLJam, an interface and protocol for live musical jamming sessions between a human and a Transformer-based AI agent trained with reinforcement learning. We enable real-time interactions using the concept of anticipation, where the agent continually predicts how the performance will unfold and visually conveys its plan to the user.
arXiv Detail & Related papers (2025-02-28T17:42:58Z)
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation [55.26713167507132]
We introduce a generative robotics foundation model that constructs and interprets embodied spaces. EnerVerse employs an autoregressive video diffusion framework to predict future embodied spaces from instructions, enhanced by a sparse context memory for long-term reasoning. We present EnerVerse-D, a data engine pipeline combining the generative model with 4D Gaussian Splatting, forming a self-reinforcing data loop to reduce the sim-to-real gap.
arXiv Detail & Related papers (2025-01-03T17:00:33Z)
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation [60.5897687447003]
AvatarGO is a novel framework designed to generate realistic 4D HOI scenes from textual inputs. Our framework not only generates coherent compositional motions, but also exhibits greater robustness in handling issues. As the first attempt to synthesize 4D avatars with object interactions, we hope AvatarGO could open new doors for human-centric 4D content creation.
arXiv Detail & Related papers (2024-10-09T17:58:56Z)
A Survey of Foundation Models for Music Understanding [60.83532699497597]
This work is one of the early reviews of the intersection of AI techniques and music understanding. We investigated, analyzed, and tested recent large-scale music foundation models in respect of their music comprehension abilities.
arXiv Detail & Related papers (2024-09-15T03:34:14Z)
Play Me Something Icy: Practical Challenges, Explainability and the Semantic Gap in Generative AI Music [0.0]
This pictorial aims to critically consider the nature of text-to-audio and text-to-music generative tools in the context of explainable AI.
arXiv Detail & Related papers (2024-08-13T22:42:05Z)
MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation [18.181382408551574]
We propose a novel task of Colloquial Description-to-Song Generation. It focuses on aligning the generated content with colloquial human expressions. This task is aimed at bridging the gap between colloquial language understanding and auditory expression within an AI model.
arXiv Detail & Related papers (2024-07-03T15:12:36Z)
Exploring Variational Auto-Encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI [7.391173255888337]
Generative AI models for music and the arts are increasingly complex and hard to understand. One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on generative AI models. This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE) have on music generation performance.
arXiv Detail & Related papers (2023-11-14T17:27:30Z)
Beyond Reality: The Pivotal Role of Generative AI in the Metaverse [98.1561456565877]
This paper offers a comprehensive exploration of how generative AI technologies are shaping the Metaverse. We delve into the applications of text generation models like ChatGPT and GPT-3, which are enhancing conversational interfaces with AI-generated characters. We also examine the potential of 3D model generation technologies like Point-E and Lumirithmic in creating realistic virtual objects.
arXiv Detail & Related papers (2023-07-28T05:44:20Z)
Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration. We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE. We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains. The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK). We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z)
Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space. Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias. During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
Flat latent manifolds for music improvisation between human and machine [9.571383193449648]
We consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal improvisation is to lead to new experiences. In the learned model, we generate novel musical sequences by quantification in latent space. We provide empirical evidence for our method via a set of experiments on music and we deploy our model for an interactive jam session with a professional drummer.
arXiv Detail & Related papers (2022-02-23T09:00:17Z)
Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders [9.923470453197657]
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information. We introduce the first Music Adversarial Autoencoder (MusAE). Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)

This list is automatically generated from the titles and abstracts of the papers on this site.