Music Boundary Detection using Convolutional Neural Networks: A
comparative analysis of combined input features
- URL: http://arxiv.org/abs/2008.07527v2
- Date: Wed, 1 Dec 2021 15:01:19 GMT
- Title: Music Boundary Detection using Convolutional Neural Networks: A
comparative analysis of combined input features
- Authors: Carlos Hernandez-Olivan, Jose R. Beltran, David Diaz-Guerra
- Abstract summary: The analysis of the structure of musical pieces is a task that remains a challenge for Artificial Intelligence.
We establish a general method of pre-processing these inputs by comparing the inputs calculated from different pooling strategies.
We also determine the most effective combination of inputs to be delivered to the CNN in order to extract the structural boundaries of the music pieces as efficiently as possible.
- Score: 2.123556187010023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The analysis of the structure of musical pieces is a task that remains a
challenge for Artificial Intelligence, especially in the field of Deep
Learning. It requires prior identification of structural boundaries of the
music pieces. This structural boundary analysis has recently been studied with
unsupervised methods and \textit{end-to-end} techniques such as Convolutional
Neural Networks (CNN) using Mel-Scaled Log-magnitude Spectrogram features
(MLS), Self-Similarity Matrices (SSM) or Self-Similarity Lag Matrices (SSLM) as
inputs and trained with human annotations. The published studies, whether
unsupervised or \textit{end-to-end}, perform this pre-processing in different
ways, using different distance metrics and audio characteristics, so a
generalized pre-processing method for computing the model inputs is still
missing. The objective of this work is to establish a general method
of pre-processing these inputs by comparing the inputs calculated from
different pooling strategies, distance metrics and audio characteristics, also
taking into account the computing time needed to obtain them. We also
determine the most effective combination of inputs to be delivered to the CNN
so that the structural boundaries of the music pieces are extracted as
efficiently as possible. With an adequate combination of input matrices and
pooling strategies, we obtain an $F_1$ score of 0.411, which outperforms the
current best result obtained under the same conditions.
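As a point of reference, the sketch below illustrates the kind of pre-processing the abstract refers to: computing an MLS input and an SSLM input from an audio file, with temporal max-pooling as the pooling strategy and the Euclidean distance as the distance metric. It is a minimal Python sketch assuming the librosa library; the frame sizes, pooling factor, lag range and distance-to-similarity kernel are illustrative choices, not the settings evaluated in the paper.

# Minimal sketch (not the authors' exact pipeline): compute a Mel-Scaled
# Log-magnitude Spectrogram (MLS) and a Self-Similarity Lag Matrix (SSLM).
import numpy as np
import librosa

def mls_and_sslm(audio_path, pooling_factor=6, lag_seconds=14.0):
    y, sr = librosa.load(audio_path, sr=None, mono=True)

    # Mel-scaled log-magnitude spectrogram (MLS): one of the CNN input features.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=2048,
                                         hop_length=1024, n_mels=80)
    mls = librosa.power_to_db(mel, ref=np.max)

    # Temporal max-pooling: one possible pooling strategy (the paper compares several).
    n_frames = (mls.shape[1] // pooling_factor) * pooling_factor
    pooled = mls[:, :n_frames].reshape(mls.shape[0], -1, pooling_factor).max(axis=2)

    # Self-Similarity Lag Matrix (SSLM): distance between each pooled frame
    # and the frames up to max_lag frames in the past.
    frame_rate = sr / (1024 * pooling_factor)      # pooled frames per second
    max_lag = int(lag_seconds * frame_rate)
    n = pooled.shape[1]
    sslm = np.zeros((max_lag, n))
    for i in range(n):
        for lag in range(1, max_lag + 1):
            j = i - lag
            if j >= 0:
                # Euclidean distance; the paper compares different distance metrics.
                sslm[lag - 1, i] = np.linalg.norm(pooled[:, i] - pooled[:, j])

    # Map distances to similarities with an exponential kernel (an assumption;
    # the paper's exact normalization may differ).
    eps = np.percentile(sslm[sslm > 0], 10) + 1e-9
    sslm = np.exp(-sslm / eps)
    return mls, pooled, sslm

In a boundary-detection pipeline, the pooled MLS and the SSLM (and, optionally, an SSM) would be stacked as input channels or parallel inputs to the CNN, which is trained against human boundary annotations.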
Related papers
- PREMAP: A Unifying PREiMage APproximation Framework for Neural Networks [30.701422594374456]
We present a framework for preimage abstraction that produces under- and over-approximations of any polyhedral output set.
We evaluate our method on a range of tasks, demonstrating significant improvement in efficiency and scalability to high-input-dimensional image classification tasks.
arXiv Detail & Related papers (2024-08-17T17:24:47Z)
- Automatic Input Feature Relevance via Spectral Neural Networks [0.9236074230806581]
We propose a novel method to estimate the relative importance of the input components for a Deep Neural Network.
This is achieved by leveraging on a spectral re-parametrization of the optimization process.
The technique is successfully challenged against both synthetic and real data.
arXiv Detail & Related papers (2024-06-03T10:39:12Z)
- Discrete Neural Algorithmic Reasoning [18.497863598167257]
We propose to force neural reasoners to maintain the execution trajectory as a combination of finite predefined states.
When trained with supervision on the algorithm's state transitions, such models are able to perfectly align with the original algorithm.
arXiv Detail & Related papers (2024-02-18T16:03:04Z)
- Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs [75.40636935415601]
Deep learning often faces the challenge of efficiently processing dynamic inputs, such as sensor data or user inputs.
We take an incremental computing approach, looking to reuse calculations as the inputs change.
We apply this approach to the transformers architecture, creating an efficient incremental inference algorithm with complexity proportional to the fraction of modified inputs.
arXiv Detail & Related papers (2023-07-27T16:30:27Z)
- Task-Oriented Sensing, Computation, and Communication Integration for Multi-Device Edge AI [108.08079323459822]
This paper studies a new multi-device edge artificial intelligence (AI) system, which jointly exploits AI model split inference and integrated sensing and communication (ISAC).
We measure the inference accuracy by adopting an approximate but tractable metric, namely discriminant gain.
arXiv Detail & Related papers (2022-07-03T06:57:07Z)
- Fast accuracy estimation of deep learning based multi-class musical source separation [79.10962538141445]
We propose a method to evaluate the separability of instruments in any dataset without training and tuning a neural network.
Based on the oracle principle with an ideal ratio mask, our approach is an excellent proxy for estimating the separation performance of state-of-the-art deep learning approaches.
arXiv Detail & Related papers (2020-10-19T13:05:08Z)
- Data-Driven Symbol Detection via Model-Based Machine Learning [117.58188185409904]
We review a data-driven framework for symbol detection design that combines machine learning (ML) and model-based algorithms.
In this hybrid approach, well-known channel-model-based algorithms are augmented with ML-based algorithms to remove their channel-model-dependence.
Our results demonstrate that these techniques can yield near-optimal performance of model-based algorithms without knowing the exact channel input-output statistical relationship.
arXiv Detail & Related papers (2020-02-14T06:58:27Z)
- Depthwise Non-local Module for Fast Salient Object Detection Using a Single Thread [136.2224792151324]
We propose a new deep learning algorithm for fast salient object detection.
The proposed algorithm achieves competitive accuracy and high inference efficiency simultaneously with a single CPU thread.
arXiv Detail & Related papers (2020-01-22T15:23:48Z)
- Synthetic Datasets for Neural Program Synthesis [66.20924952964117]
We propose a new methodology for controlling and evaluating the bias of synthetic data distributions over both programs and specifications.
We demonstrate, using the Karel DSL and a small Calculator DSL, that training deep networks on these distributions leads to improved cross-distribution generalization performance.
arXiv Detail & Related papers (2019-12-27T21:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented here and is not responsible for any consequences of its use.