Learning Unsupervised Hierarchies of Audio Concepts
- URL: http://arxiv.org/abs/2207.11231v1
- Date: Thu, 21 Jul 2022 16:34:31 GMT
- Title: Learning Unsupervised Hierarchies of Audio Concepts
- Authors: Darius Afchar, Romain Hennequin and Vincent Guigue
- Abstract summary: In computer vision, concept learning was proposed to adjust explanations to the right abstraction level.
In this paper, we adapt concept learning to the realm of music, with its particularities.
We propose a method to learn numerous music concepts from audio and then automatically hierarchise them to expose their mutual relationships.
- Score: 13.400413055847084
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Music signals are difficult to interpret from their low-level features,
perhaps even more than images: e.g. highlighting part of a spectrogram or an
image is often insufficient to convey high-level ideas that are genuinely
relevant to humans. In computer vision, concept learning was therein proposed
to adjust explanations to the right abstraction level (e.g. detect clinical
concepts from radiographs). These methods have yet to be used for MIR.
In this paper, we adapt concept learning to the realm of music, with its
particularities. For instance, music concepts are typically non-independent and
of mixed nature (e.g. genre, instruments, mood), unlike previous work that
assumed disentangled concepts. We propose a method to learn numerous music
concepts from audio and then automatically hierarchise them to expose their
mutual relationships. We conduct experiments on datasets of playlists from a
music streaming service, serving as a few annotated examples for diverse
concepts. Evaluations show that the mined hierarchies are aligned with both
ground-truth hierarchies of concepts -- when available -- and with proxy
sources of concept similarity in the general case.
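The pipeline the abstract describes, learning a direction per concept from a handful of playlist examples and then hierarchising the directions by similarity, can be illustrated with a small CAV-style sketch. Everything below (embedding dimension, genre names, synthetic data) is an illustrative stand-in, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32  # embedding dimension (illustrative)

# Synthetic stand-ins for audio-embedding means: "metal" is a perturbed
# copy of "rock", while "jazz" is an independent direction.
rock_dir = rng.normal(size=d)
means = {
    "rock": rock_dir,
    "metal": rock_dir + 0.3 * rng.normal(size=d),
    "jazz": rng.normal(size=d),
}

def concept_vector(mean_dir, n=100, noise=0.5):
    """Learn a concept direction as the normalised difference between the
    mean embedding of concept (playlist) examples and random negatives."""
    pos = rng.normal(mean_dir, noise, size=(n, d))
    neg = rng.normal(0.0, 1.0, size=(n, d))
    v = pos.mean(axis=0) - neg.mean(axis=0)
    return v / np.linalg.norm(v)

vecs = {name: concept_vector(m) for name, m in means.items()}

def cos(a, b):
    """Cosine similarity between two learned concept directions."""
    return float(vecs[a] @ vecs[b])

# Agglomerative clustering of these similarities would merge rock and
# metal first, exposing a mined hierarchy of related concepts.
print(cos("rock", "metal"), cos("rock", "jazz"))
```

Related concepts end up with nearby directions, which is exactly what a mined hierarchy groups together.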
Related papers
- Explainable Concept Generation through Vision-Language Preference Learning [7.736445799116692]
Concept-based explanations have become a popular choice for explaining deep neural networks post-hoc.
We devise a reinforcement learning-based preference optimization algorithm that fine-tunes the vision-language generative model.
In addition to showing the efficacy and reliability of our method, we show how our method can be used as a diagnostic tool for analyzing neural networks.
arXiv Detail & Related papers (2024-08-24T02:26:42Z)
- Pre-trained Vision-Language Models Learn Discoverable Visual Concepts [33.302556000017844]
We ask whether pre-trained vision-language models learn discoverable visual concepts, since concepts learned "for free" would enable wide applications.
We assume that the visual concepts, if captured by pre-trained VLMs, can be extracted by their vision-language interface with text-based concept prompts.
Our proposed concept discovery and learning framework is thus designed to identify a diverse list of generic visual concepts.
arXiv Detail & Related papers (2024-04-19T06:41:32Z)
- Structuring Concept Space with the Musical Circle of Fifths by Utilizing Music Grammar Based Activations [0.0]
We explore the intriguing similarities between the structure of a discrete neural network, such as a spiking network, and the composition of a piano piece.
We propose a novel approach that leverages musical grammar to regulate activations in a spiking neural network.
We show that the map of concepts in our model is structured by the musical circle of fifths, highlighting the potential for leveraging music theory principles in deep learning algorithms.
arXiv Detail & Related papers (2024-02-22T03:28:25Z)
- Concept-Based Techniques for "Musicologist-friendly" Explanations in a Deep Music Classifier [5.442298461804281]
We focus on more human-friendly explanations based on high-level musical concepts.
Our research targets trained systems (post-hoc explanations) and explores two approaches.
We demonstrate both techniques on an existing symbolic composer classification system, showcase their potential, and highlight their intrinsic limitations.
arXiv Detail & Related papers (2022-08-26T07:45:29Z)
- Static and Dynamic Concepts for Self-supervised Video Representation Learning [70.15341866794303]
We propose a novel learning scheme for self-supervised video representation learning.
Motivated by how humans understand videos, we propose to first learn general visual concepts then attend to discriminative local areas for video understanding.
arXiv Detail & Related papers (2022-07-26T10:28:44Z)
- ConceptBeam: Concept Driven Target Speech Extraction [69.85003619274295]
We propose a novel framework for target speech extraction based on semantic information, called ConceptBeam.
In our scheme, a concept is encoded as a semantic embedding by mapping the concept specifier to a shared embedding space.
We use it to bridge modality-dependent information, i.e., the speech segments in the mixture, and the specified, modality-independent concept.
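The bridging step described here, matching audio segments against a modality-independent concept embedding, can be sketched as a similarity lookup in a shared embedding space. The linear encoders and dimensions below are hypothetical stand-ins, not ConceptBeam's actual networks:

```python
import numpy as np

rng = np.random.default_rng(1)
d_audio, d_concept, d_shared = 12, 6, 8

# Stand-in encoders: random linear maps into a shared embedding space.
W_audio = rng.normal(size=(d_audio, d_shared))
W_concept = rng.normal(size=(d_concept, d_shared))

def embed(x, W):
    """Project into the shared space and L2-normalise."""
    z = x @ W
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

# Speech segments in a mixture, plus a concept specifier (e.g. an image
# or keyword describing the target speech).
segments = rng.normal(size=(5, d_audio))
specifier = rng.normal(size=d_concept)

seg_emb = embed(segments, W_audio)
con_emb = embed(specifier, W_concept)

# Select the segment whose embedding best matches the concept embedding.
scores = seg_emb @ con_emb
target = int(np.argmax(scores))
```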
arXiv Detail & Related papers (2022-07-25T08:06:07Z)
- Visual Superordinate Abstraction for Robust Concept Learning [80.15940996821541]
Concept learning constructs visual representations that are connected to linguistic semantics.
We ascribe the bottleneck to a failure of exploring the intrinsic semantic hierarchy of visual concepts.
We propose a visual superordinate abstraction framework for explicitly modeling semantic-aware visual subspaces.
arXiv Detail & Related papers (2022-05-28T14:27:38Z)
- Visual Concepts Tokenization [65.61987357146997]
We propose an unsupervised transformer-based Visual Concepts Tokenization framework, dubbed VCT, to perceive an image into a set of disentangled visual concept tokens.
To obtain these concept tokens, we only use cross-attention to extract visual information from the image tokens layer by layer without self-attention between concept tokens.
We further propose a Concept Disentangling Loss to facilitate that different concept tokens represent independent visual concepts.
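The extraction mechanism described above, concept queries cross-attending over image tokens with no self-attention among the concept tokens, can be sketched in a few lines. Shapes and names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_img, n_concepts = 8, 16, 4

image_tokens = rng.normal(size=(n_img, d))
concept_queries = rng.normal(size=(n_concepts, d))  # learnable in practice

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(queries, tokens):
    """Each concept query attends over the image tokens only; concept
    tokens never attend to each other, as in the VCT description."""
    attn = softmax(queries @ tokens.T / np.sqrt(d))  # (n_concepts, n_img)
    return attn @ tokens                             # (n_concepts, d)

concept_tokens = cross_attention(concept_queries, image_tokens)
```

Stacking this layer by layer would refine the concept tokens while keeping them structurally independent of one another.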
arXiv Detail & Related papers (2022-05-20T11:25:31Z)
- Contrastive Learning with Positive-Negative Frame Mask for Music Representation [91.44187939465948]
This paper proposes a novel Positive-nEgative frame mask for Music Representation based on the contrastive learning framework, abbreviated as PEMR.
We devise a novel contrastive learning objective that accommodates both self-augmented positives and negatives sampled from the same music.
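The idea of contrasting a clip against masked views of itself can be illustrated with a plain InfoNCE loss. The frame features and masking scheme below are synthetic and only loosely follow PEMR:

```python
import numpy as np

rng = np.random.default_rng(3)

def info_nce(anchor, positive, negatives, temp=0.1):
    """InfoNCE: pull the anchor toward its positive view, push it away
    from the negatives (all vectors L2-normalised first)."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(positive), norm(negatives)
    logits = np.concatenate([[a @ p], n @ a]) / temp
    logits -= logits.max()  # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

# One clip as frame-level features; a random frame mask yields a
# positive view (kept frames) and a same-clip negative (dropped frames).
frames = rng.normal(size=(20, 8))
mask = rng.random(20) > 0.5
anchor = frames.mean(axis=0)
positive = frames[mask].mean(axis=0)
negatives = np.stack([frames[~mask].mean(axis=0)])

loss = info_nce(anchor, positive, negatives)
```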
arXiv Detail & Related papers (2022-03-17T07:11:42Z)
- Towards Visual Semantics [17.1623244298824]
We study how humans build mental representations, i.e., concepts, of what they visually perceive.
In this paper we provide a theory and an algorithm that learns substance concepts corresponding to what we call classification concepts.
The experiments, though preliminary, show that the algorithm manages to acquire the notions of Genus and Differentia with reasonable accuracy.
arXiv Detail & Related papers (2021-04-26T07:28:02Z)
- Visual Concept-Metaconcept Learning [101.62725114966211]
We propose the visual concept-metaconcept learner (VCML) for joint learning of concepts and metaconcepts from images and associated question-answer pairs.
Knowing that red and green describe the same property of objects, we generalize to the fact that cube and sphere also describe the same property of objects.
arXiv Detail & Related papers (2020-02-04T18:42:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.