Exploring XAI for the Arts: Explaining Latent Space in Generative Music
- URL: http://arxiv.org/abs/2308.05496v1
- Date: Thu, 10 Aug 2023 10:59:24 GMT
- Title: Exploring XAI for the Arts: Explaining Latent Space in Generative Music
- Authors: Nick Bryan-Kinns, Berker Banar, Corey Ford, Courtney N. Reed, Yixiao Zhang, Simon Colton, Jack Armitage
- Abstract summary: We show how a latent variable model for music generation can be made more explainable.
We use latent space regularisation to force some specific dimensions of the latent space to map to meaningful musical attributes.
We also provide a visualisation of the musical attributes in the latent space to help people understand and predict the effect of changes to latent space dimensions.
- Score: 5.91328657300926
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Explainable AI has the potential to support more interactive and fluid
co-creative AI systems which can creatively collaborate with people. To do
this, creative AI models need to be amenable to debugging by offering
eXplainable AI (XAI) features which are inspectable, understandable, and
modifiable. However, currently there is very little XAI for the arts. In this
work, we demonstrate how a latent variable model for music generation can be
made more explainable; specifically we extend MeasureVAE which generates
measures of music. We increase the explainability of the model by: i) using
latent space regularisation to force some specific dimensions of the latent
space to map to meaningful musical attributes, ii) providing a user interface
feedback loop to allow people to adjust dimensions of the latent space and
observe the results of these changes in real-time, iii) providing a
visualisation of the musical attributes in the latent space to help people
understand and predict the effect of changes to latent space dimensions. We
suggest that in doing so we bridge the gap between the latent space and the
generated musical outcomes in a meaningful way which makes the model and its
outputs more explainable and more debuggable.
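The abstract does not spell out the regularisation term, so the following is only a minimal sketch of one common formulation (attribute-based regularisation, in which the ordering of a chosen latent dimension across a batch is pushed to match the ordering of a musical attribute). The function name, the `delta` scale, and the example attribute (note density) are assumptions made for illustration, not details taken from the paper.

```python
import math


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def attribute_regularisation_loss(latent_values, attribute_values, delta=1.0):
    """Illustrative attribute-regularisation penalty (an assumed formulation).

    latent_values: value of ONE regularised latent dimension per batch item.
    attribute_values: the musical attribute (e.g. note density) per batch item.
    The penalty is the mean absolute error between tanh(delta * latent
    difference) and sign(attribute difference) over all ordered pairs i != j,
    so it is small when the latent dimension orders the items the same way
    the attribute does.
    """
    n = len(latent_values)
    if n != len(attribute_values):
        raise ValueError("latent_values and attribute_values must have equal length")
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dz = latent_values[i] - latent_values[j]
            da = attribute_values[i] - attribute_values[j]
            total += abs(math.tanh(delta * dz) - sign(da))
    return total / (n * (n - 1))


# Hypothetical batch of three measures with increasing note density.
note_density = [1, 3, 5]
aligned = attribute_regularisation_loss([-1.0, 0.0, 1.0], note_density)
reversed_order = attribute_regularisation_loss([1.0, 0.0, -1.0], note_density)
print(aligned < reversed_order)
```

In a training loop, a term of this kind would typically be added to the usual VAE objective for each regularised dimension, so that moving along that dimension changes the attribute monotonically. That monotonic link is what would make a slider over the dimension predictable for a user.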
Related papers
- MuDiT & MuSiT: Alignment with Colloquial Expression in Description-to-Song Generation [18.181382408551574]
We propose a novel task of Colloquial Description-to-Song Generation.
It focuses on aligning the generated content with colloquial human expressions.
This task is aimed at bridging the gap between colloquial language understanding and auditory expression within an AI model.
arXiv Detail & Related papers (2024-07-03T15:12:36Z)
- Hawk: Learning to Understand Open-World Video Anomalies [76.9631436818573]
Video Anomaly Detection (VAD) systems can autonomously monitor and identify disturbances, reducing the need for manual labor and associated costs.
We introduce Hawk, a novel framework that leverages interactive large Visual Language Models (VLM) to interpret video anomalies precisely.
We have annotated over 8,000 anomaly videos with language descriptions, enabling effective training across diverse open-world scenarios, and also created 8,000 question-answering pairs for users' open-world questions.
arXiv Detail & Related papers (2024-05-27T07:08:58Z)
- Exploring Variational Auto-Encoder Architectures, Configurations, and Datasets for Generative Music Explainable AI [7.391173255888337]
Generative AI models for music and the arts are increasingly complex and hard to understand.
One approach to making generative AI models more understandable is to impose a small number of semantically meaningful attributes on generative AI models.
This paper contributes a systematic examination of the impact that different combinations of Variational Auto-Encoder models (MeasureVAE and AdversarialVAE) have on music generation performance.
arXiv Detail & Related papers (2023-11-14T17:27:30Z)
- Beyond Reality: The Pivotal Role of Generative AI in the Metaverse [98.1561456565877]
This paper offers a comprehensive exploration of how generative AI technologies are shaping the Metaverse.
We delve into the applications of text generation models like ChatGPT and GPT-3, which are enhancing conversational interfaces with AI-generated characters.
We also examine the potential of 3D model generation technologies like Point-E and Lumirithmic in creating realistic virtual objects.
arXiv Detail & Related papers (2023-07-28T05:44:20Z)
- Visual Affordance Prediction for Guiding Robot Exploration [56.17795036091848]
We develop an approach for learning visual affordances for guiding robot exploration.
We use a Transformer-based model to learn a conditional distribution in the latent embedding space of a VQ-VAE.
We show how the trained affordance model can be used for guiding exploration by acting as a goal-sampling distribution, during visual goal-conditioned policy learning in robotic manipulation.
arXiv Detail & Related papers (2023-05-28T17:53:09Z)
- ArK: Augmented Reality with Knowledge Interactive Emergent Ability [115.72679420999535]
We develop an infinite agent that learns to transfer knowledge memory from general foundation models to novel domains.
The heart of our approach is an emerging mechanism, dubbed Augmented Reality with Knowledge Inference Interaction (ArK).
We show that our ArK approach, combined with large foundation models, significantly improves the quality of generated 2D/3D scenes.
arXiv Detail & Related papers (2023-05-01T17:57:01Z)
- Spatial Steerability of GANs via Self-Supervision from Discriminator [123.27117057804732]
We propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space.
Specifically, we design randomly sampled Gaussian heatmaps to be encoded into the intermediate layers of generative models as spatial inductive bias.
During inference, users can interact with the spatial heatmaps in an intuitive manner, enabling them to edit the output image by adjusting the scene layout, moving, or removing objects.
arXiv Detail & Related papers (2023-01-20T07:36:29Z)
- Flat latent manifolds for music improvisation between human and machine [9.571383193449648]
We consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal improvisation is to lead to new experiences.
In the learned model, we generate novel musical sequences by quantification in latent space.
We provide empirical evidence for our method via a set of experiments on music and we deploy our model for an interactive jam session with a professional drummer.
arXiv Detail & Related papers (2022-02-23T09:00:17Z)
- AI Song Contest: Human-AI Co-Creation in Songwriting [8.399688944263843]
We present findings on what 13 musician/developer teams, a total of 61 users, needed when co-creating a song with AI.
We show how they leveraged and repurposed existing characteristics of AI to overcome some of these challenges.
Findings reflect a need to design machine learning-powered music interfaces that are more decomposable, steerable, interpretable, and adaptive.
arXiv Detail & Related papers (2020-10-12T01:27:41Z)
- Explainable Active Learning (XAL): An Empirical Study of How Local Explanations Impact Annotator Experience [76.9910678786031]
We propose a novel paradigm of explainable active learning (XAL), by introducing techniques from the recently surging field of explainable AI (XAI) into an Active Learning setting.
Our study shows benefits of AI explanation as interfaces for machine teaching (supporting trust calibration and enabling rich forms of teaching feedback) and potential drawbacks (an anchoring effect with the model judgment and cognitive workload).
arXiv Detail & Related papers (2020-01-24T22:52:18Z)
- Learning Style-Aware Symbolic Music Representations by Adversarial Autoencoders [9.923470453197657]
We focus on leveraging adversarial regularization as a flexible and natural means to imbue variational autoencoders with context information.
We introduce the first Music Adversarial Autoencoder (MusAE).
Our model has a higher reconstruction accuracy than state-of-the-art models based on standard variational autoencoders.
arXiv Detail & Related papers (2020-01-15T18:07:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.