Exploring XAI for the Arts: Explaining Latent Space in Generative Music
- URL: http://arxiv.org/abs/2308.05496v1
- Date: Thu, 10 Aug 2023 10:59:24 GMT
- Title: Exploring XAI for the Arts: Explaining Latent Space in Generative Music
- Authors: Nick Bryan-Kinns, Berker Banar, Corey Ford, Courtney N. Reed, Yixiao
Zhang, Simon Colton, Jack Armitage
- Abstract summary: We show how a latent variable model for music generation can be made more explainable.
We use latent space regularisation to force some specific dimensions of the latent space to map to meaningful musical attributes.
We also provide a visualisation of the musical attributes in the latent space to help people understand and predict the effect of changes to latent space dimensions.
- Score: 5.91328657300926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainable AI has the potential to support more interactive and fluid
co-creative AI systems which can creatively collaborate with people. To do
this, creative AI models need to be amenable to debugging by offering
eXplainable AI (XAI) features which are inspectable, understandable, and
modifiable. However, currently there is very little XAI for the arts. In this
work, we demonstrate how a latent variable model for music generation can be
made more explainable; specifically we extend MeasureVAE which generates
measures of music. We increase the explainability of the model by: i) using
latent space regularisation to force some specific dimensions of the latent
space to map to meaningful musical attributes, ii) providing a user interface
feedback loop to allow people to adjust dimensions of the latent space and
observe the results of these changes in real-time, iii) providing a
visualisation of the musical attributes in the latent space to help people
understand and predict the effect of changes to latent space dimensions. We
suggest that in doing so we bridge the gap between the latent space and the
generated musical outcomes in a meaningful way which makes the model and its
outputs more explainable and more debuggable.
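The latent space regularisation described in the abstract can be pictured with a short sketch. Below is a minimal, hypothetical PyTorch fragment in the spirit of attribute-based latent regularisation (the family of techniques MeasureVAE's explainability work builds on); the function name, the delta hyperparameter, and the exact loss form are illustrative assumptions, not the authors' code. Each of the first k latent dimensions is pushed to order inputs the same way one musical attribute (e.g. note density or rhythmic complexity) orders them.

```python
# Sketch of attribute-based latent space regularisation (illustrative, not
# the authors' implementation). For each regularised dimension i, pairwise
# latent distances are pushed to match the ordering of attribute i.
import torch
import torch.nn.functional as F

def attribute_reg_loss(z, attrs, delta=10.0):
    """z: (batch, latent_dim) latent codes.
    attrs: (batch, k) attribute values for the k regularised dimensions."""
    k = attrs.shape[1]
    loss = z.new_zeros(())
    for i in range(k):
        # Pairwise differences along latent dimension i and attribute i.
        dz = z[:, i].unsqueeze(0) - z[:, i].unsqueeze(1)   # (batch, batch)
        da = attrs[:, i].unsqueeze(0) - attrs[:, i].unsqueeze(1)
        # Make the sign of latent distances match the attribute ordering.
        loss = loss + F.l1_loss(torch.tanh(delta * dz), torch.sign(da))
    return loss / k
```

In training, a term like this would be added to the usual VAE objective, e.g. total = reconstruction + beta * KL + gamma * attribute_reg_loss(z, attrs), so that moving along a regularised dimension changes the corresponding musical attribute predictably.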
Related papers
- AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation [60.5897687447003]
AvatarGO is a novel framework designed to generate realistic 4D HOI scenes from textual inputs.
Our framework not only generates coherent compositional motions, but also exhibits greater robustness in handling issues.
As the first attempt to synthesize 4D avatars with object interactions, we hope AvatarGO could open new doors for human-centric 4D content creation.
arXiv Detail & Related papers (2024-10-09T17:58:56Z)
- A Survey of Foundation Models for Music Understanding [60.83532699497597]
This work is one of the early reviews of the intersection of AI techniques and music understanding.
We investigated, analyzed, and tested recent large-scale music foundation models with respect to their music comprehension abilities.
arXiv Detail & Related papers (2024-09-15T03:34:14Z)
- Play Me Something Icy: Practical Challenges, Explainability and the Semantic Gap in Generative AI Music [0.0]
This pictorial aims to critically consider the nature of text-to-audio and text-to-music generative tools in the context of explainable AI.
arXiv Detail & Related papers (2024-08-13T22:42:05Z)
- MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation [18.181382408551574]
We propose the novel task of Colloquial Description-to-Song Generation, which focuses on aligning the generated content with colloquial human expressions.
The task aims to bridge the gap between colloquial language understanding and auditory expression within an AI model.
arXiv Detail & Related papers (2024-07-03T15:12:36Z)
- Exploring Variational Auto-Encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI [7.391173255888337]
Generative AI models for music and the arts are increasingly complex and hard to understand.
One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on such models.
This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE) have on music generation performance.
arXiv Detail & Related papers (2023-11-14T17:27:30Z)
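As a rough illustration of what such a systematic examination can look like in practice, the sketch below enumerates an experiment grid over architectures, latent sizes, and datasets. All names and values are placeholders for illustration, not the paper's actual configuration.

```python
# Hypothetical experiment grid for a systematic architecture/config sweep.
from itertools import product

models = ["MeasureVAE", "AdversarialVAE"]
latent_dims = [32, 64, 128, 256]
datasets = ["irish_folk", "turkish_makam", "classical", "pop"]

for model, dim, data in product(models, latent_dims, datasets):
    config = {"model": model, "latent_dim": dim, "dataset": data}
    print(config)  # in practice: train(config) and log generation metrics
```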
- Beyond Reality: The Pivotal Role of Generative AI in the Metaverse [98.1561456565877]
This paper offers a comprehensive exploration of how generative AI technologies are shaping the Metaverse.
We delve into the applications of text generation models like ChatGPT and GPT-3, which are enhancing conversational interfaces with AI-generated characters.
We also examine the potential of 3D model generation technologies like Point-E and Lumirithmic in creating realistic virtual objects.
arXiv Detail & Related papers (2023-07-28T05:44:20Z)
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
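The mechanism summarised above, a Transformer that models a conditional distribution over the discrete latent codes of a VQ-VAE, can be sketched roughly as follows. This is a hypothetical, minimal PyTorch version: the class, shapes, and the additive conditioning are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch: autoregressive Transformer prior over discrete VQ-VAE
# codes, conditioned on a context embedding (e.g. an encoded scene image).
# Untrained here; purely illustrative.
import torch
import torch.nn as nn

class LatentPrior(nn.Module):
    def __init__(self, vocab=512, dim=256, seq_len=16):
        super().__init__()
        self.embed = nn.Embedding(vocab + 1, dim)  # last index = BOS token
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, vocab)
        self.vocab, self.seq_len = vocab, seq_len

    @torch.no_grad()
    def sample(self, cond):  # cond: (batch, dim) context embedding
        tokens = torch.full((cond.shape[0], 1), self.vocab, dtype=torch.long)
        for _ in range(self.seq_len):
            h = self.embed(tokens) + cond.unsqueeze(1)  # add conditioning
            mask = nn.Transformer.generate_square_subsequent_mask(h.shape[1])
            logits = self.head(self.encoder(h, mask=mask))[:, -1]
            nxt = torch.multinomial(logits.softmax(-1), 1)
            tokens = torch.cat([tokens, nxt], dim=1)
        return tokens[:, 1:]  # discrete codes for the VQ-VAE decoder

prior = LatentPrior()
codes = prior.sample(torch.randn(2, 256))  # (2, 16) latent code indices
```

Used as a goal-sampling distribution, such a prior proposes plausible latent codes for a given context, which a separate decoder then turns into candidate goals.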
- ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
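The core trick in the summary above, encoding a randomly sampled Gaussian heatmap into intermediate generator features as a spatial inductive bias, might look roughly like the minimal sketch below. Shapes and the modulation scheme are assumptions for illustration, not the authors' code.

```python
# Sketch: inject a random Gaussian heatmap into intermediate GAN features.
import torch

def gaussian_heatmap(size=16, sigma=2.0):
    """Heatmap centred at a random location, values in (0, 1]."""
    cy, cx = torch.randint(0, size, (2,))
    ys, xs = torch.meshgrid(torch.arange(size), torch.arange(size), indexing="ij")
    return torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))

feats = torch.randn(1, 64, 16, 16)           # intermediate generator activations
heat = gaussian_heatmap().expand(1, 1, 16, 16)
feats = feats * (1 + heat)                   # emphasise features where the map is hot
```

At inference time, replacing the random heatmap with a user-drawn one is what lets people move or remove objects by editing the spatial layout.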
- Flat latent manifolds for music improvisation between human and machine [9.571383193449648]
We consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal improvisation is to lead to new experiences.
In the learned model, we generate novel musical sequences by quantification in latent space.
We provide empirical evidence for our method via a set of experiments on music and we deploy our model for an interactive jam session with a professional drummer.
arXiv Detail & Related papers (2022-02-23T09:00:17Z)
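A flat latent manifold is one where straight lines in latent space are musically meaningful paths, so one simple way to generate novel sequences, sketched below, is linear interpolation between two encoded phrases. Everything here is illustrative, and model.decode is a hypothetical stand-in for the paper's decoder.

```python
# Sketch: traverse a (flat) latent space by linear interpolation between
# the latent codes of two musical phrases. Illustrative only.
import torch

z_a, z_b = torch.randn(128), torch.randn(128)   # latents of two encoded phrases
path = [(1 - t) * z_a + t * z_b for t in torch.linspace(0.0, 1.0, steps=8)]
# phrases = [model.decode(z) for z in path]     # hypothetical decoder call
```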
- Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders [9.923470453197657]
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE).
Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)
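The adversarial regularisation such models use in place of the usual KL term can be sketched minimally: a discriminator learns to tell prior samples from encoder outputs, and the encoder learns to fool it, shaping the aggregate posterior toward the prior. The networks and sizes below are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of adversarial regularisation in an adversarial autoencoder.
import torch
import torch.nn as nn

latent_dim = 32
encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, latent_dim))
disc = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

x = torch.randn(16, 128)              # a batch of encoded music features
z_fake = encoder(x)                   # encoder (posterior) samples
z_real = torch.randn(16, latent_dim)  # samples from the prior p(z)

# Discriminator step: distinguish prior samples from encoder outputs.
d_loss = (bce(disc(z_real), torch.ones(16, 1))
          + bce(disc(z_fake.detach()), torch.zeros(16, 1)))
# Encoder step: fool the discriminator, pushing q(z) toward p(z).
g_loss = bce(disc(z_fake), torch.ones(16, 1))
```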
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.